Prediction method and system
An advertising prediction dataset for inventory and sales is constructed and ingested into a real-time analytics database cluster, and machine learning models are used for real-time prediction. This addresses the inaccuracy of predictions for online advertising campaigns, reduces energy consumption and carbon emissions, and improves prediction speed and accuracy.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SKYSCANNER TECHNOLOGY LTD
- Filing Date
- 2024-04-23
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods and systems for predicting web-based advertising campaigns are inaccurate, leading to internet advertising consuming large amounts of energy and generating a significant carbon footprint.
An advertising prediction inventory dataset is constructed and ingested into a real-time analytics database cluster, and machine learning models such as Prophet are used for real-time prediction. Resource consumption is also monitored and operational alerts are generated, which improves prediction accuracy and reliability.
It reduces the energy consumption and bandwidth requirements of ad displays, improves prediction speed and accuracy, and lowers the energy consumption and carbon emissions of online advertising.
Abstract
Description
Technical Field
[0001] The invention relates to prediction methods and systems, particularly for web-based content, and more particularly for web-based advertising campaigns.
Background Technology
[0002] Previous forecasting methods and systems, especially those for web-based advertising campaigns, suffer from inaccuracies. Improvements in forecasting methods and systems are needed, particularly for web-based advertising campaigns. This could improve the efficiency of the internet.
[0003] US2016253709A1 discloses technology for predicting advertising campaigns. A personalized communication system can receive requests for advertising campaigns on a social network. These requests can have member attributes and time ranges. The personalized communication system can access member data and behavioral data from the social network. Furthermore, the personalized communication system can determine a target audience based on member data and member attributes. Additionally, the personalized communication system can calculate the number of unique visitors from the target audience on the social network based on member attributes, time ranges, and frequency limits. Subsequently, the personalized communication system can predict the number of messages for a first advertising campaign based on the calculated number of unique visitors, behavioral data, and time ranges.
[0004] EP1061710 (A2) and EP1061710 (B1) disclose a system for providing access to network objects that matches predicted demand for the network objects with available capacity on a network server. The system implements a method for dynamically adjusting demand and capacity based on certain criteria. The system provides a method for dynamically shaping demand for objects based on criteria such as arrival time, geolocation of access, and cost requirements. Specifically, it discloses characterizing future demand for such objects based on the aggregation and prediction of past demand.
[0005] Parssinen et al., Environmental Impact Assessment Review, Vol. 73, November 2018, pp. 177-200, concluded that online advertising consumed 20-282 TWh of energy in 2016; that the internet consumes a large amount of energy and generates CO2 emissions with global impact; that reducing online advertising traffic would improve the energy efficiency of the internet; and that ineffective online advertising consumes a large amount of energy and generates a large carbon footprint.
Summary of the Invention
[0006] According to a first aspect of the present invention, a computer-implemented method is provided for predicting the number of clicks and impressions that a line item will receive within a defined time period, the method comprising the following steps:
(i) a prediction service receives a request from a portal to predict the number of clicks and impressions that a line item will receive within a defined time period, the request including targeting criteria and placement data;
(ii) the prediction service translates the targeting criteria and placement data into index query syntax;
(iii) the prediction service uses the index query syntax in a request for inventory sent to a service;
(iv) the service sends a request to a database system for a time series of historical searches corresponding to the index query syntax;
(v) in response to the request, the service receives the time series of historical searches from the database system;
(vi) the service provides inventory corresponding to the defined time period to the prediction service, based on the received time series of historical searches;
(vii) the prediction service uses the returned inventory in a request to a real-time service, which predicts the number of clicks and impressions that the line item will receive within the defined time period, wherein the prediction service receives the response from the real-time service in real time, for example in less than a second;
(viii) the prediction service derives the click-through rate (CTR) of the line item from the predicted numbers of clicks and impressions that the line item will receive within the defined time period;
(ix) the prediction service returns to the portal the predicted numbers of clicks and impressions that the line item will receive within the defined time period, together with the CTR for the defined time period.
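By way of illustration only, steps (viii) and (ix) can be sketched as follows. The sketch assumes the real-time service has already returned daily predicted impressions and clicks for the defined time period. The `Forecast` class and `build_forecast` function are hypothetical names that do not appear in the application, and returning a CTR of zero when no impressions are predicted is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Forecast:
    """Totals returned to the portal for one line item (hypothetical shape)."""
    impressions: float
    clicks: float
    ctr: float


def build_forecast(daily_impressions: Sequence[float],
                   daily_clicks: Sequence[float]) -> Forecast:
    """Sum the predicted daily series and derive CTR = clicks / impressions."""
    if len(daily_impressions) != len(daily_clicks):
        raise ValueError("both series must cover the same days")
    impressions = sum(daily_impressions)
    clicks = sum(daily_clicks)
    # Assumption: report a CTR of 0.0 when no impressions are predicted.
    ctr = clicks / impressions if impressions > 0 else 0.0
    return Forecast(impressions=impressions, clicks=clicks, ctr=ctr)
```

For example, `build_forecast([1000, 1500], [10, 15])` yields 2500 impressions, 25 clicks and a CTR of 0.01.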
[0007] An advantage is improved prediction accuracy: fewer displays of line items are needed to compensate for poor prediction accuracy, which reduces the energy used to send and receive displays and the bandwidth required to send displays. A further advantage is that the prediction service receives responses from a real-time service, which improves prediction speed. A further advantage is that the (e.g., sponsor) portal can write the advertising campaign with the prediction to the (e.g., sponsor) data layer more quickly.
[0008] This method could be one where the historical search involves flights, hotel bookings, or car rentals.
[0009] This method can be a method in which the time series of historical searches is a time series of historical searches at predetermined intervals.
[0010] This method can be a method where the time series of historical searches is a daily time series of historical searches.
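As an illustration of a daily time series of historical searches, the following minimal sketch counts search events per calendar day and fills days without searches with zero. The function name and the representation of search events as timestamps are assumptions made for this example.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Iterable, List, Tuple


def daily_search_series(timestamps: Iterable[datetime],
                        start: date,
                        end: date) -> List[Tuple[date, int]]:
    """Return (day, search count) pairs for every day from start to end inclusive."""
    counts = Counter(ts.date() for ts in timestamps)
    n_days = (end - start).days + 1
    series = []
    for offset in range(n_days):
        day = start + timedelta(days=offset)
        series.append((day, counts.get(day, 0)))  # zero-fill days with no searches
    return series
```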
[0011] The method may be one in which the prediction service is implemented as, or includes, an AWS Lambda function. Advantages include those described above for the first aspect of the invention.
[0012] This method can include creating an inventory projection by training and querying a machine learning (ML) model in real time. The advantages include those described above in the first aspect of the invention.
[0013] The method may be one that receives the search targeting criteria of a line item as input.
[0014] The method may be one in which the prediction of the number of clicks and impressions that a line item will receive within a defined time period includes a predicted time series.
[0015] This method could be one in which the prediction service sends an event that includes the predicted time series, and thus the event can be used for accuracy reporting.
[0016] This method could be one in which a service makes a request to a database system that allows fast queries on very large datasets (such as the Apache Druid database system).
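As an illustration of translating targeting criteria into index query syntax for such a database system, the sketch below builds a request body modelled on the general shape of an Apache Druid native timeseries query. The data source name, the dimension names and the `searches` aggregation name are hypothetical, and the exact field set should be checked against the Druid documentation for the deployed version.

```python
from typing import Any, Dict


def build_timeseries_query(data_source: str,
                           targeting: Dict[str, str],
                           start: str,
                           end: str) -> Dict[str, Any]:
    """Build a daily count query filtered by targeting criteria.

    `start` and `end` are ISO-8601 dates; the interval is written as start/end.
    """
    filters = [
        {"type": "selector", "dimension": dimension, "value": value}
        for dimension, value in sorted(targeting.items())
    ]
    query: Dict[str, Any] = {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "day",
        "intervals": [f"{start}/{end}"],
        "aggregations": [{"type": "count", "name": "searches"}],
    }
    if len(filters) == 1:
        query["filter"] = filters[0]
    elif filters:
        # Several criteria are combined with a logical AND.
        query["filter"] = {"type": "and", "fields": filters}
    return query
```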
[0017] This method can be used to query a database system containing at least 100,000 rows of data, or at least 1 million rows of data, or at least 10 million rows of data. Advantages include those described above in the first aspect of the invention. One advantage is the ability to query very large datasets in real time.
[0018] The method may be one in which the real-time service that predicts the number of clicks and impressions a line item will receive within a defined time period is, or includes, Facebook's Prophet. Advantages include those described above in the first aspect of this invention. One advantage is the ability to query very large datasets in real time.
[0019] This method can be as follows: Facebook's Prophet provides an interface to a machine learning model that can be trained and, when trained, can predict and display content in less than a second. Advantages include those described above in the first aspect of this invention. One advantage is the ability to query very large datasets in real time.
[0020] The method may be one in which the line item is an advertisement or a news item.
[0021] This method can be implemented by monitoring resource consumption and generating an operational alert (e.g., a warning message) if resource consumption reaches a predetermined threshold. The advantage is that it provides more reliable predictions.
[0022] The method may include automatically requesting additional resources in response to an operational alert being generated. The advantage is that it provides more reliable predictions.
[0023] The method may include automatically provisioning additional resources in response to an operational alert being generated. The advantage is that it provides more reliable predictions.
[0024] This method can be one in which resource consumption includes memory usage or central processing unit (CPU) usage.
[0025] This method can be used to generate operational alerts (e.g., warning messages) when the prediction service returns an error response. The advantage is that it provides more reliable predictions.
[0026] This method can be used to generate operational alerts (e.g., warning messages) when the prediction service is unavailable. The advantage is that it provides more reliable predictions.
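The threshold-based alerting described above can be sketched as follows. This is a minimal example under stated assumptions: usage and thresholds are expressed as fractions between 0 and 1, "reaches" is read as greater than or equal to, and the `Alert` class and `check_resources` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Alert:
    """An operational alert for one monitored resource (hypothetical shape)."""
    metric: str
    value: float
    threshold: float


def check_resources(usage: Dict[str, float],
                    thresholds: Dict[str, float]) -> List[Alert]:
    """Return one alert per metric whose usage has reached its threshold."""
    alerts = []
    for metric, limit in sorted(thresholds.items()):
        value = usage.get(metric)
        if value is not None and value >= limit:
            alerts.append(Alert(metric=metric, value=value, threshold=limit))
    return alerts
```

A caller could react to a non-empty result by requesting or provisioning additional resources, as in the optional features above.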
[0027] According to a second aspect of the invention, a system configured to perform any aspect of the method of the first aspect of the invention is provided.
[0028] According to a third aspect of the present invention, a computer-implemented method is provided for constructing an advertising prediction inventory dataset and ingesting it into a real-time analytics database cluster, the method comprising the following steps:
(i) an advertising (e.g., flight) search event builder in an advertising reporting directed acyclic graph (DAG) executes a builder job to build an advertising (e.g., flight) search event dataset, until the builder job is complete;
(ii) an advertising (e.g., flight) search event sensor in an advertising inventory DAG waits for the builder job to complete;
(iii) an advertising prediction inventory builder in the advertising inventory DAG constructs the prediction inventory dataset and stores it, in a format, in a container for objects (e.g., a prediction bucket);
(iv) an advertising prediction inventory sensor in an advertising prediction inventory DAG waits for the advertising prediction inventory builder in the advertising inventory DAG to complete its job;
(v) a real-time analytics database operator in the advertising prediction inventory DAG, where the real-time analytics database (e.g., Apache Druid) delivers real-time (e.g., sub-second) responses to queries on streaming and batch data, retrieves the inventory dataset stored in the container for objects (e.g., the prediction bucket), then ingests the inventory dataset stored in the format (e.g., Apache Parquet) into the real-time analytics database cluster (e.g., an Apache Druid cluster) and indexes it.
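The ordering constraints in steps (i) to (v) can be illustrated without a workflow platform by modelling each builder, sensor and operator as a node that waits for its predecessor. The sketch below uses Python's standard `graphlib` module rather than Apache Airflow; the task names are hypothetical, and a real deployment would express the same dependencies as DAG tasks and sensors.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each task maps to the set of tasks it waits for (hypothetical task names).
DEPENDENCIES: Dict[str, Set[str]] = {
    "search_event_builder": set(),
    "search_event_sensor": {"search_event_builder"},
    "prediction_inventory_builder": {"search_event_sensor"},
    "prediction_inventory_sensor": {"prediction_inventory_builder"},
    "druid_ingest_operator": {"prediction_inventory_sensor"},
}


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every task runs after the tasks it waits for."""
    return list(TopologicalSorter(dependencies).static_order())
```

Calling `execution_order(DEPENDENCIES)` lists the builder first and the ingest operator last, mirroring the sequence of steps.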
[0029] An advantage is improved prediction accuracy: fewer displays of line items are needed to compensate for poor prediction accuracy, which reduces the energy used in sending and receiving displays and the bandwidth required to send displays. A further advantage is that the prediction service can receive responses from a real-time analytics database cluster, which improves prediction speed. A further advantage is that the (e.g., sponsor) portal can write the advertising campaign with the prediction to the (e.g., sponsor) data layer more quickly.
[0030] This method can be any method including the method of the first aspect of the present invention.
[0031] The method may be a method in which the advertising search event builder is or includes an advertising flight search event builder, or an advertising hotel search event builder, or an advertising car rental search event builder.
[0032] This method can be used to query at least 100,000 rows of data, or at least 1 million rows of data, or at least 10 million rows of data, or at least 100 million rows of data in a real-time logical analysis database cluster. Advantages include those described in the third aspect of this invention. One advantage is the ability to query very large datasets in real time.
[0033] The method may be one in which the advertising prediction inventory builder is an application job of an analytics engine for large-scale data processing (e.g., Apache Spark).
[0034] The method may be one in which the container for objects is a prediction bucket.
[0035] This method could be one where the prediction bucket is an S3 bucket.
[0036] The method may be one in which the advertising prediction inventory DAG runs jobs that extract data from different events to construct a daily prediction inventory dataset. This dataset is ingested into an index of a real-time analytics database (e.g., Apache Druid) that delivers real-time (e.g., sub-second) responses to queries on streaming and batch data at scale and under load, for subsequent querying. Advantages include those of the third aspect of the invention described above. One advantage is the ability to query very large datasets in real time.
[0037] The method may be one in which the inventory dataset relates to products displayed on a web page.
[0038] This method could be used to extract a dataset of advertising (e.g., flight) search events from a pricing service (e.g., FPS) table.
[0039] According to a fourth aspect of the invention, a system is provided that is configured to perform the methods of any aspect of the third aspect of the invention.
[0040] According to a fifth aspect of the present invention, a computer-implemented method for constructing and storing an advertising prediction accuracy report is provided, the method comprising the steps of:
(i) historical predictions for advertising line items are output to a data layer;
(ii) the line item predictions are transferred from the data layer, via a web service interface, to an ad line item prediction builder in an advertising prediction accuracy directed acyclic graph (DAG), and the ad line item predictions are built;
(iii) an analytics engine for large-scale data processing (e.g., Apache Spark) in an advertising reporting DAG receives targeting criteria for campaign ad line items;
(iv) based on the number of search hits for the targeting criteria of the campaign ad line items, corresponding to a defined time period, the analytics engine for large-scale data processing (e.g., Apache Spark) in the advertising reporting DAG executes a job to build a report on the performance of the ad line items;
(v) a sensor in the advertising prediction accuracy DAG waits for the ad line item performance report builder in the advertising reporting DAG to complete its job;
(vi) an analytics engine for large-scale data processing (e.g., Apache Spark) uses the report on ad line item performance, the ad line item predictions, and the number of search hits for the targeting criteria of the campaign ad line items to execute a job in the advertising prediction accuracy DAG that builds and stores the advertising prediction accuracy report.
[0041] An advantage is improved prediction accuracy: fewer displays of line items are needed to compensate for poor prediction accuracy, which reduces the energy used in sending and receiving displays and the bandwidth required to send displays.
[0042] The method can be any method including the first or third aspect of the present invention.
[0043] This method may be a method in which the network service interface is or includes the S3 interface.
[0044] This method could be to store advertising prediction accuracy reports in a trusted table.
[0045] This method could be one where the defined time period is one day.
[0046] The method may be one in which the advertising prediction accuracy report is built and stored for an associated time period.
[0047] This method could be one where the associated time period is one day.
[0048] This method could be a way to create a prediction accuracy report by running a DAG job to extract predicted values and actual performance metrics for ad row items.
[0049] The method may be one in which the prediction accuracy report includes the percentage error between the number of search hits predicted for given targeting criteria and the actual number of search hits (e.g., for the day), as a measure of the advertising prediction accuracy of the line item.
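A percentage error of this kind can be computed as in the minimal sketch below. The application does not state whether the error is signed or absolute, so the use of the absolute difference relative to the actual value is an assumption, as is the function name.

```python
def percentage_error(predicted: float, actual: float) -> float:
    """Absolute percentage error of a prediction relative to the actual value."""
    if actual == 0:
        raise ValueError("percentage error is undefined when the actual value is zero")
    # Assumption: unsigned error, expressed as a percentage of the actual value.
    return 100.0 * abs(predicted - actual) / actual
```

For example, 120 predicted search hits against 100 actual search hits gives an error of 20%.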
[0050] This method can be as follows: For the initial release, both the previous method and the current method are used to generate their respective predictions; however, the result of the current method may not be returned to the user at this stage, which we call shadow mode; the result from the current method is recorded along with the event and added to the prediction accuracy table (e.g., ad_line_item_daily_forecast_accuracy); for a period of time, each line item stores two sets of predictions: one for the previous method and one for the current method, so that direct comparisons can be made between the two predictions.
[0051] The method may be one in which the previous method in shadow mode is invoked only when the line item is ready to be stored.
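Shadow mode as described above can be sketched as follows: both methods are run, both results are recorded for later comparison, and only one result is returned to the user. The sketch assumes that the previous method's result is the one returned during the shadow period; the function name, the record fields and the use of a list as the accuracy table are hypothetical.

```python
from typing import Callable, Dict, List


def predict_in_shadow_mode(line_item_id: str,
                           previous_method: Callable[[str], float],
                           current_method: Callable[[str], float],
                           accuracy_table: List[Dict[str, object]]) -> float:
    """Run both methods, record both predictions, return only the previous one."""
    previous = previous_method(line_item_id)
    current = current_method(line_item_id)
    # Store both predictions so they can be compared directly later.
    accuracy_table.append({
        "line_item_id": line_item_id,
        "previous_prediction": previous,
        "current_prediction": current,
    })
    # Assumption: the current method's result is withheld from the user.
    return previous
```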
[0052] According to a sixth aspect of the invention, a system configured to perform any aspect of the fifth aspect of the invention is provided.
[0053] According to a seventh aspect of the present invention, a computer-implemented method for providing an advertising campaign is provided, the method comprising the following steps:
(i) a platform for programmatically creating, scheduling and monitoring workflows receives data by extracting and transforming it from an event database, the platform including an advertising inventory directed acyclic graph (DAG) and an advertising prediction inventory DAG;
(ii) an advertising index is evaluated and stored in a data and reporting engine, the advertising index including data ingested from the advertising prediction inventory DAG in the platform;
(iii) a data streaming platform receives pushed prediction events from an advertising prediction service;
(iv) the event database is populated from the data streaming platform;
(v) a portal requests an advertising prediction from the advertising prediction service;
(vi) the advertising prediction service queries inventory by querying the advertising index stored in the data and reporting engine;
(vii) the advertising prediction service uses the response to the inventory query to provide the advertising prediction to the portal;
(viii) the portal processes the received advertising prediction to determine an advertising campaign;
(ix) the portal transmits the determined advertising campaign to a data layer;
(x) the data layer transfers the determined advertising campaign to a web service interface; and
(xi) the web service interface uses the transferred determined advertising campaign to obtain campaign content from the platform, including campaign content from the advertising inventory, in order to serve the advertising campaign.
[0054] An advantage is improved prediction accuracy: fewer displays of line items are needed in the advertising campaign to compensate for poor prediction accuracy, which reduces the energy used in sending and receiving displays and the bandwidth required to send displays. A further advantage is that the prediction service can receive responses to inventory queries in real time, which improves prediction speed. A further advantage is that the (e.g., sponsor) portal can write the advertising campaign with the prediction to the (e.g., sponsor) data layer more quickly. A further advantage is improved advertising campaigns.
[0055] The method can be any of the methods including the first, third, or fifth aspects of the present invention.
[0056] The method can be as follows: the platform for programmatically creating, scheduling and monitoring workflows is or includes Apache Airflow.
[0057] This method can be one in which the advertisement is used for one or more of the following: computing devices using iOS, computing devices using Android, desktop computing devices, and computing devices using mobile networks.
[0058] The method may be one in which the advertisement is for use on one or more of the following: smartphones, tablets, laptops, desktop computers, and smart TVs.
[0059] The method may be one in which the data and reporting engine includes an ETL and an event database.
[0060] This method can be one in which ETL includes an advertising inventory DAG and an advertising prediction accuracy DAG.
[0061] This method could be one in which ETL receives data by extracting and transforming it from an event database.
[0062] The method may be one in which the event database includes one or more of the following: ad response, pricing session initiation, ad display, ad view, ad click, and ad prediction data.
[0063] The method may be one in which the ad index includes one or more, or all, of the following data: placement, tag, user ID, impressions {0,1}, views {0,1}, and clicks {0,1}.
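To illustrate the ad index fields listed above, the sketch below represents one index row with {0,1} flags for impression, view and click, and aggregates rows into totals. It assumes one row per ad delivery to a user; the class name, field names and the `summarise` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class AdIndexRow:
    placement: str
    tag: str
    user_id: str
    impression: int  # 0 or 1
    view: int        # 0 or 1
    click: int       # 0 or 1


def summarise(rows: Iterable[AdIndexRow]) -> Dict[str, int]:
    """Total the {0,1} flags and count distinct users with at least one impression."""
    rows = list(rows)
    return {
        "impressions": sum(r.impression for r in rows),
        "views": sum(r.view for r in rows),
        "clicks": sum(r.click for r in rows),
        "unique_users": len({r.user_id for r in rows if r.impression}),
    }
```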
[0064] The method may be one in which the prediction service receives the response to the inventory query in less than one second. An advantage is that the prediction service receives the response to the inventory query in real time, which improves prediction speed. A further advantage is that the (e.g., sponsor) portal can write the advertising campaign with the prediction to the (e.g., sponsor) data layer more quickly, which improves advertising campaigns.
[0065] This method may be a method in which the network service interface is or includes the S3 network service interface.
[0066] This method could be one in which the prediction service pushes forecast events to an internal data streaming platform and an event database.
[0067] The method may be one in which the prediction events are used to compare predictions with measured advertising performance data in order to evaluate prediction performance, and the measured prediction performance is used to modify future predictions to improve prediction accuracy. An advantage is improved prediction accuracy: fewer displays of line items are needed in the advertising campaign to compensate for poor prediction accuracy, which reduces the energy used in sending and receiving displays and the bandwidth required to send displays.
[0068] This method could be one in which the advertising prediction service is implemented in AWS or includes AWS methods.
[0069] This method could be implemented in Apache Druid, or it could include methods from Apache Druid.
[0070] The method may be one in which the advertising inventory data includes data extracted from a table of events recorded by the delivery service whenever it receives a request from the front end; the location of the request in the delivery service and the (e.g., flight) pricing service (FPS) session ID associated with the request are extracted; the session ID is then used to join with the (e.g., flight) search pricing tables to extract the search criteria parameters.
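The join on the pricing service session ID can be illustrated with plain dictionaries, as in the minimal sketch below. The key names (`session_id`, `location`) and the behaviour of dropping delivery events with no matching search session are assumptions; a production job would more likely perform this join in an analytics engine.

```python
from typing import Dict, Iterable, List


def join_on_session(delivery_events: Iterable[Dict[str, str]],
                    search_sessions: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Attach search criteria to each delivery event that shares a session ID."""
    sessions_by_id = {s["session_id"]: s for s in search_sessions}
    joined = []
    for event in delivery_events:
        session = sessions_by_id.get(event["session_id"])
        if session is None:
            continue  # assumption: inner join, unmatched events are dropped
        joined.append({**session, "location": event["location"]})
    return joined
```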
[0071] The method may be one in which the advertising inventory data includes data extracted from a table of searches (e.g., flight searches) performed on a specific web page; the table contains (e.g., all) the data relating to the search targeting criteria and to the suppliers checked for trips (e.g., flights); the table is used to extract the search targeting criteria associated with events in the table of events recorded by the delivery service when it receives a request from the front end (e.g., a search trip count), and to obtain (e.g., all) the suppliers that have trips for the targeting criteria (e.g., supplier search trips).
[0072] The method can be as follows: the advertising inventory data includes data extracted from tables containing advertising events (e.g., impressions, clicks, views), which are unified in a single entry and combined with search data related to requests made to the delivery service (e.g., via FPS tables).
[0073] The method may be one in which the advertising inventory data includes data extracted from tables containing geolocation information.
[0074] The method may be one in which the advertising inventory data is stored in Apache Parquet format in a container for objects, such as a prediction bucket or an S3 bucket.
[0075] The method may be one in which the advertising inventory data stored in the container for objects is consumed by the Apache Airflow Druid operator and then ingested into the Apache Druid cluster.
[0076] The method may be one in which the platform uses a reporting application to report a task failure in a DAG over a communication channel.
[0077] This method could be a way for the platform to use a reporting application to report any task failures in the DAG over a communication channel.
[0078] According to an eighth aspect of the invention, a system configured to perform any aspect of the seventh aspect of the invention is provided.
[0079] The various aspects of this invention can be combined.
Attached Figure Description
[0080] Various aspects of the invention will now be described by way of example with reference to the following figures, in which: Figure 1 shows an example of a prediction architecture.
[0081] Figure 2 shows an example of an extract, transform, load (ETL) process.
[0082] Figure 3 shows an example of the interaction between a prediction service, a sponsor portal, and a database system that allows fast queries on very large datasets (e.g., Apache Druid).
Detailed Implementation
[0083] Our goals include:
● Having a reliable advertising inventory dataset based on historical searches, such as historical travel-related searches and historical flight searches.
[0084] ● Providing time-series estimates of the number of searches (e.g., travel-related searches, such as flight searches) we may receive in the future. The time-series estimates can be provided in real time, for example in less than a second.
[0085] ● Building a model to predict the number of impressions or clicks an ad will receive on a specific web page (such as a travel-related search page).
[0086] ● Providing an improved method for measuring the accuracy of advertising predictions over a period of time.
[0087] ● Providing a framework for generating new or existing predictive metrics based on historical data.
Product Objectives
[0088] The product objective is to provide an advertising prediction service that can reliably and accurately estimate advertising performance over a period of time. The service may query historical search and delivery data matching the targeting of a line item, fit a model that predicts future sales and inventory, and then generate performance estimates. Line items are items used to present data, for example on a web page or in an application running on a computer or smartphone, where the items are associated with revenue or expenses.
Early solutions
[0089] In earlier solutions, the sponsor portal computed predictions using an audience reach dataset built by AudienceBuilder routines. The portal executed queries on the query engine and database using the campaign targeting parameters to obtain the following values:
● Supplier Audience: the total number of unique users a supplier reached within a predetermined time period (e.g., the last 30 days).
[0090] ● Search Hits: the total number of searches performed for a given search target within a predetermined time period (e.g., the last 30 days).
[0091] ● Target Audience: the total number of unique users reached through a given search target within a predetermined time period (e.g., the last 30 days).
[0092] The following predictive metrics can then be calculated from the values obtained from the query engine and database:
● Reach: the estimated total number of people (unique users) who are likely to see the ad.
[0093] ● Impressions: the estimated number of times the ad is displayed.
[0094] ● Clicks: the estimated number of times the ad is clicked, calculated using the click-through rate (CTR) per target audience. CTR is the number of clicks divided by the number of impressions.
[0095] ● Frequency: the estimated number of times each person sees the ad.
[0096] ● Audience: the percentage of people who are likely to see the campaign, out of the total number of searches (e.g., the total number of Skyscanner searches).
[0097] To generate the audience reach data, an extract, transform, load (ETL) job runs on an analytics engine for large-scale data processing (e.g., Apache Spark running on AWS). Such an engine provides an interface for programming clusters with implicit data parallelism and fault tolerance. Two analytics engine (e.g., Apache Spark) jobs may be performed as follows:
● Job 1 enriches historical search data (e.g., historical travel search data, or historical flight search data such as Flight Pricing Service (FPS)_sessions_started) for a predefined time period (e.g., the last 30 days) by joining it with other sources;
● Job 2 reads the generated data from the object storage service through a web service interface (e.g., Amazon Simple Storage Service (S3)) and batch-indexes it into the query engine and database using the analytics engine for large-scale data processing (e.g., Apache Spark with a query engine and database driver).
What doesn't work very well?
[0098] We have observed that the earlier prediction service did not provide accurate performance metrics for travel-related search websites (e.g., for flight bookings). For example, over a three-month data analysis period, only 28% of predicted clicks were within 50% of actual clicks. CTR predictions were also unsatisfactory: over the three-month data analysis period, 31% of predicted CTRs were within 20% of actual campaign CTRs.
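A statistic of the form "X% of predictions were within Y% of the actual value" can be computed as in the sketch below. The application does not give the exact definition used in the analysis, so measuring the tolerance relative to the actual value and skipping pairs whose actual value is zero are assumptions, as is the function name.

```python
from typing import Sequence


def share_within_tolerance(predicted: Sequence[float],
                           actual: Sequence[float],
                           tolerance: float) -> float:
    """Fraction of predictions whose error is at most `tolerance` times the actual value."""
    # Assumption: pairs with a zero actual value are excluded from the statistic.
    pairs = [(p, a) for p, a in zip(predicted, actual) if a != 0]
    if not pairs:
        raise ValueError("no pairs with a non-zero actual value")
    hits = sum(1 for p, a in pairs if abs(p - a) <= tolerance * abs(a))
    return hits / len(pairs)
```

For example, with predicted clicks `[100, 40, 300, 10]`, actual clicks `[120, 100, 290, 10]` and a tolerance of 0.5, three of the four predictions fall within 50% of the actual value, giving 0.75.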
Design Goals
[0099] This design replaces our previous method of estimating ad performance, with the aim of:
1) Improving the accuracy of the advertising prediction metrics of the earlier service by creating new services that:
● Build an advertising inventory for the products displayed on the web page, based on historical searches such as historical travel-related searches and historical flight searches.
[0100] ● Perform time-series forecasting of the advertising inventory for future advertising campaigns, for example by using a specialized third-party tool.
[0101] ● Calculate estimates of advertising performance metrics based on the delivery forecast.
[0102] 2) Measuring and monitoring the accuracy of advertising predictions, for example daily.
[0103] For ad performance estimation, performance estimation metrics may be displayed via a prediction widget in the sponsor portal front end when line items are created or edited.
[0104] For the prediction service, given the search targeting criteria of a line item, the sponsor portal backend calls the prediction service to obtain the values provided to the prediction widget; a campaign optimizer routine can also request the prediction service to estimate the budget consumption of the line item.
[0105] For the advertising flight search event dataset, the dataset is provided and maintained for building the prediction inventory.
[0106] Overview and Operational Theory Architecture Overview Below is an example of a predictive architecture covering ad inventory dataset generation and ad performance prediction. The focus is on online advertising products (e.g., iOS, Android, desktop, and mobile web). However, this solution can be extended to adapt to other advertising products, such as on the homepage, or for country-focused search and demo web content.
[0107] In this example of the prediction architecture, the sponsor portal requests predictions from the prediction service. The prediction service queries the ad index stored in the data and reporting engine, which also includes an ETL and an event database. The ad index may include one or more of the following: placement, tag, user ID, impressions {0,1}, views {0,1}, and clicks {0,1}. The ad index includes data obtained from the ad prediction inventory DAG in the ETL. The ETL also includes the ad inventory DAG and the ad prediction accuracy DAG. The ETL receives data by extracting and transforming from the event database, which includes one or more of the following: ad responses, pricing session initiation, ad impressions, ad views, ad clicks, and ad prediction data. The event database is populated by an internal data streaming platform. The internal data streaming platform receives pushed prediction events from the prediction service. Upon receiving the prediction, which may be received in real time (e.g., less than one second after its request), the sponsor portal writes the ad campaign with the prediction to the sponsor data layer. The sponsor data layer transfers the campaign to a web service interface (e.g., S3). Network service interfaces (e.g., S3) also obtain activities from ETL.
[0108] When the prediction service pushes prediction events to the internal data streaming platform and event database, these prediction events can be used to compare predictions with measured advertising performance data, evaluate prediction performance, and use the measured prediction performance to modify future predictions in order to improve prediction accuracy.
[0109] Figure 1 shows an example of a predictive architecture.
[0110] Predictive architecture components Data and Reporting Engine These components are responsible for extracting, building, enriching, and indexing data, and then querying this data to calculate historical inventory levels.
[0111] Inventory Data Set Data for predicting inventory and sales can be extracted from the following trusted sources, for example: AdResponse: An event logged when the delivery service receives a request from the front end. From this dataset, we extract the location of the request in the delivery service and the associated pricing service (e.g., Flight Pricing Service (FPS)) session ID. The session ID is then used to join with a search pricing table (e.g., for flights) to extract search criteria parameters.
[0112] (e.g., flights) PricingSessionStarted: Searches performed on specific web pages (e.g., flights). This trusted table contains (e.g., all) data related to search alignment criteria, as well as verified suppliers for the trip (e.g., flights). This table is used to extract alignment search criteria (e.g., search trip count) related to the response event and to obtain (e.g., all) suppliers with alignment criteria trips (e.g., supplier search trips).
[0113] Geographical data routine: Geolocation information. Uses may include, for example, storing route node IDs as location identifiers (origin and destination) in an FPS table. For instance, a sponsored portal uses sky codes and location identifiers. In the example, a mapping between location route node IDs, entity IDs, and sky codes needs to be performed for appropriate queries.
[0114] Advertising (e.g., flights) SearchEvents: Contains advertising events (e.g., impressions, clicks, views), unified in a single entry, and combined with search data (e.g., via an FPS table) that is related to requests for the delivery service. This table is used to extract performance values for the targeted advertising, such as trip and location (e.g., location-based trip performance).
[0115] The resulting dataset can be stored in containers of objects, such as prediction buckets or S3 buckets (e.g., 'S3://forecasting/'), in Apache Parquet format for consumption by Apache Airflow Druid operators, and then ingested into the Apache Druid cluster. Apache Druid is a high-performance, real-time logical analytics database capable of providing sub-second responses to queries on streaming and batch data at large scale and under high load. Apache Airflow is a community-created platform for programmatically creating, scheduling, and monitoring workflows.
[0116] Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. It provides efficient data compression and encoding schemes, enhancing the performance of batch processing complex data. Apache Parquet aims to become a universal exchange format for both batch and interactive workloads.
[0117] The following table shows an example dataset.
[0118]
[0119] In the example, for this dataset, there are typically 6 to 8 million rows per predefined time period, such as each day. In the example, the dataset has at least one million rows per predefined time period, such as each day. In the example, the dataset has at least 100,000 rows per predefined time period, such as each day. Previous inventory management datasets stored in the query engine and database averaged approximately 24 million rows per day.
[0120] Extract, transform, load (ETL) In the example, at least three DAGs run (e.g., daily) on a platform for programmatically creating, scheduling, and monitoring workflows (e.g., Apache Airflow) to extract, for example, inventory data from the previous day and ingest it into a real-time logical analytics database (e.g., Apache Druid) that delivers real-time (e.g., less than one second) responses to queries on streaming and batch data at large scale and under high load. 1) Advertising Forecasting Inventory Directed Acyclic Graph (DAG): Runs jobs to extract data from pricing service (e.g., FPS) tables and delivery service events to construct the daily forecasted inventory. It can consist of at least two operators: ● A sensor that waits for the successful completion of the advertising (e.g., flight) search event builder job in the advertising report DAG.
[0121] ● A logical analysis engine for large-scale data processing (such as Apache Spark) application jobs, building predictive inventory datasets and storing them in containers of objects (such as predictive (S3) buckets) in Apache Parquet format.
[0122] 2) Advertising Predictive Inventory Management DAG for the real-time logical analytics database (e.g., Apache Druid): A daily predictive inventory management dataset is built by running jobs that extract data from different events. This dataset is then ingested, for later use, into an index of a real-time logical analytics database (e.g., Apache Druid) that provides real-time (e.g., less than one second) responses to queries on streaming and batch data at large scale and under heavy load. In this example, it consists of at least two operators: ● A sensor that waits for the successful completion of the predicted inventory builder job (e.g., from the previous DAG); ● An operator for the real-time logical analytics database (e.g., an Apache Druid operator) that takes the inventory dataset stored in containers of objects (e.g., forecast (e.g., S3) buckets) in a format such as Apache Parquet and ingests it into the real-time logical analytics database cluster.
[0123] 3) Advertising Prediction Accuracy Report DAG: Runs jobs to extract predicted and actual performance metrics for ad line items, thereby creating a prediction accuracy report. In the example, it consists of at least three operators: ● A logical analytics engine (such as Apache Spark) application job that builds daily reports of actual search hits based on the alignment criteria of campaign ad line items. To support the previous system, a Python operator calls a sponsor data layer endpoint (e.g., /export-audiencereach-changes-S3), which exports the predictions for all ad line items. For the new system, the prediction service (Prophet) sends events with the predicted time series, which can then be read from a trusted table in a job (e.g., an Apache Spark job); ● A sensor that waits for the daily ad line item performance report builder in the Ad Report DAG to complete successfully; ● A logical analytics engine (e.g., Apache Spark) application job that takes the data extracted in the previous steps (the search hits for the ad line item alignment criteria, the predicted values for active ad line items, and their actual performance), builds an accuracy report, and stores it in a trusted table (e.g., ad_line_item_search_hits).
[0124] In the example of the extraction, transformation, and loading process, the Ads Reporting DAG includes an Ads (e.g., Flights) Search Event Builder and an Ads Daily Item Performance Builder. The Ads Inventory Management DAG includes an Ads (e.g., Flights) Search Event Sensor that waits for the Ads (e.g., Flights) Search Event Builder job in the Ads Reporting DAG to complete successfully. Then, the Ads Predictive Inventory Management Builder in the Ads Inventory Management DAG constructs a predictive inventory management dataset and stores it in a container of objects, such as a predictive (e.g., S3) bucket, for example in Apache Parquet format. The advertising prediction inventory sensor in the Druid DAG waits for the advertising prediction inventory builder in the advertising prediction inventory DAG to complete. Then the real-time logical analysis database operator (e.g., an Apache Druid operator) retrieves the inventory dataset stored in containers of objects (such as prediction (e.g., S3) buckets) in a format such as Apache Parquet and ingests it into the real-time logical analysis database cluster (e.g., an Apache Druid cluster), including its indexes. This cluster provides real-time (e.g., less than one second) responses to streaming and batch data queries at large scale and under high load.
[0125] In this example, the sensor in the Ad Prediction Accuracy DAG waits for the daily ad line item performance builder in the Ad Reporting DAG to complete. Then, given the alignment criteria for the campaign ad line items, a daily report of actual search hits is built in the Ad Prediction Accuracy DAG. This report, together with the output of the ad line item prediction builder in the Ad Prediction Accuracy DAG, is used to create the prediction accuracy report within the Ad Prediction Accuracy DAG. The ad line item prediction builder obtains its input from the sponsor data layer, which exports the ad line item prediction history and transfers the line item predictions to the builder via a web service interface (e.g., S3).
[0126] Figure 2 shows an example of the extraction, transformation, and loading process.
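The waiting relationships between the builders, sensors, and ingestion steps described above form a directed acyclic graph. The following is a minimal sketch of that ordering, assuming illustrative task names that do not appear in the original; Python's standard graphlib module stands in for the workflow platform (e.g., Apache Airflow), whose operators and sensors are not modelled here.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it must wait for.
# All task names are illustrative assumptions.
dependencies = {
    "ads_search_event_builder": set(),
    "ads_daily_line_item_performance_builder": set(),
    "forecast_inventory_builder": {"ads_search_event_builder"},
    "druid_ingestion": {"forecast_inventory_builder"},
    "actual_search_hits_report": {"ads_daily_line_item_performance_builder"},
    "line_item_forecast_builder": set(),
    "forecast_accuracy_report": {
        "actual_search_hits_report",
        "line_item_forecast_builder",
    },
}

# static_order() yields every task after all of the tasks it waits for.
run_order = list(TopologicalSorter(dependencies).static_order())
```

Any order produced this way runs the inventory builder before the ingestion step and the accuracy report last among its own inputs, which mirrors the role of the sensors.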
[0127] OLAP (Online Analytical Processing) Indexes on the different targeting fields of the predictive inventory and location data are created from datasets ingested into a real-time logical analytics database cluster (e.g., Apache Druid) that provides real-time (e.g., less than one second) responses to streaming and batch data queries at large scale and under high load, thus covering (e.g., all) products. Data specifications can be defined in operators within a DAG (e.g., in Apache Airflow) or in a similar framework (e.g., Apache Druid).
[0128] Specification Definition Example:
"spec": {
  "dataSchema": {
    "dataSource": "Advertising_forecasting_inventory",
    "timestampSpec": { "column": "dt", "format": "auto", "missingValue": null },
    "dimensionsSpec": {
      "dimensions": [
        { "type": "string", "name": "user_id", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "placement_id", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "market", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "locale", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "origin_country_code", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "origin_city", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "origin_airport", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "destination_country_code", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "destination_city", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "destination_airport", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "cabin_class", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "trip_type", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "route_type", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "string", "name": "departure_dt", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": true },
        { "type": "long", "name": "booking_horizon", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": false },
        { "type": "long", "name": "adult_passengers", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": false },
        { "type": "long", "name": "child_passengers", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": false },
        { "type": "long", "name": "infant_passengers", "multiValueHandling": "SORTED_ARRAY", "createBitmapIndex": false },
        { "type": "string", "name": "supplier_ids", "multiValueHandling": "SORTED_SET", "createBitmapIndex": true }
      ],
      "dimensionExclusions": [ "dt", "__time", "count", "click_count", "view_count", "impression_count" ]
    },
    "metricsSpec": [
      { "type": "count", "name": "count" },
      { "type": "longSum", "name": "click_count", "fieldName": "click_count", "expression": null },
      { "type": "longSum", "name": "impression_count", "fieldName": "impression_count", "expression": null },
      { "type": "longSum", "name": "view_count", "fieldName": "view_count", "expression": null }
    ],
    "granularitySpec": { "type": "uniform", "segmentGranularity": "DAY", "queryGranularity": "DAY", "rollup": true }
  }
}
The schema of the data ingested into the real-time logical analytics database cluster (e.g., Apache Druid), which can provide real-time (e.g., less than one second) answers to streaming and batch data queries at large scale and under high load, is similar to that of the predictive inventory dataset, but the following details may differ:
● fps_session_id is excluded and rollup is performed. Real-time logical analytics database clusters (such as Apache Druid) can summarize raw data during ingestion using a process called "rollup". Rollup is a first-level aggregation operation performed on a selected set of columns that reduces the size of the stored data;
● destination_city, origin_city, destination_airport, and origin_airport are represented by the location entity ID (string).
[0129] In previous solutions, a query engine and database were used to store and query inventory data. The new solution replaces this with a real-time logical analytics database (e.g., Apache Druid) that can respond to queries on streaming and batch data in real time (e.g., in less than one second) at large scale and under high load, after verifying that the same queries can be executed on it.
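The ingestion-time rollup described for the inventory index can be illustrated in plain Python. This is only a sketch of the idea, not of how Apache Druid implements it: rows that share the same dimension values are merged, the per-session identifier is dropped, and the metric columns are summed. The sample rows are hypothetical.

```python
from collections import defaultdict


def rollup(rows, dimensions, metrics):
    """Group rows by the given dimensions, summing metrics and counting rows."""
    grouped = defaultdict(lambda: {"count": 0, **{m: 0 for m in metrics}})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        grouped[key]["count"] += 1
        for m in metrics:
            grouped[key][m] += row[m]
    return [dict(zip(dimensions, key), **values) for key, values in grouped.items()]


# Hypothetical raw rows, one per search session.
rows = [
    {"fps_session_id": "a", "dt": "2023-08-18", "market": "US", "impression_count": 1, "click_count": 0},
    {"fps_session_id": "b", "dt": "2023-08-18", "market": "US", "impression_count": 1, "click_count": 1},
    {"fps_session_id": "c", "dt": "2023-08-18", "market": "UK", "impression_count": 0, "click_count": 0},
]

# fps_session_id is not a grouping dimension, so it disappears from the output.
rolled = rollup(rows, ["dt", "market"], ["impression_count", "click_count"])
```

Three raw rows collapse into two stored rows here; with millions of daily searches the reduction depends on how many distinct dimension combinations occur.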
[0130] Prediction service (which we can call "Prophet") In the example, the prediction service (which we can call "Prophet") is implemented as an AWS Lambda function, to gain the benefits of a serverless service, including not having to maintain the service infrastructure. AWS Lambda is a serverless, event-driven compute service that allows code to run for various types of applications or backend services without provisioning or managing servers. Lambda can be triggered from over 200 AWS services and Software as a Service (SaaS) applications.
[0131] A key objective of the prediction service is to predict the clicks and impressions a line item will receive, given its search alignment criteria. It is an alternative to the Audience Reach service, which provides a historical search dataset, such as searches on skyscanner.net. The new service aims to create predictions based on more accurate data and to eliminate some assumptions in the dataset's query logic.
[0132] In the example, the new service works by obtaining daily time series of historical searches, impressions, and clicks, using this to obtain a more accurate CTR, and creating an inventory projection by training and querying a machine learning (ML) model in real time. Since search, also known as inventory, represents the upper limit of possible clicks and impressions (without duplicates), it is the starting point for the prediction algorithm. If our inventory estimate is correct, we are more likely to make accurate predictions.
[0133] The logic implemented by this infrastructure component can take the search alignment criteria of a line item and generate predictive metrics. This may involve the following steps:
● Receive the search alignment criteria for the line item as input;
● Convert the different target dimensions and positioning into index query syntax;
● Request historical information from the indexing infrastructure. This typically involves querying based on the target and then obtaining a predicted inventory level, as inventory represents the maximum number of impressions and clicks a line item can receive;
● Use the click-through rate (CTR) to calculate the final expected counts of predicted clicks and impressions returned in the response. CTR is the expected ratio of clicks to impressions and is estimated from the time-series responses (e.g., from Apache Druid);
● Send events with the predicted time series so that they can be used for accuracy reporting.
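The conversion of targeting criteria into index query syntax can be sketched as follows. The sketch builds a filter as a plain dictionary in the shape of Apache Druid's native "in" and "and" filters; the function name and the mapping from request fields to index dimensions are assumptions made for illustration, and the original does not state how the service performs this mapping.

```python
# Hypothetical mapping from request targeting fields to index dimensions.
FIELD_TO_DIMENSION = {
    "markets": "market",
    "locales": "locale",
    "searchTypes": "trip_type",
    "cabinClasses": "cabin_class",
}


def targeting_to_filter(targeting, field_to_dimension=FIELD_TO_DIMENSION):
    """Build a Druid-style native filter dict from non-empty targeting lists."""
    fields = []
    for field, dimension in field_to_dimension.items():
        values = targeting.get(field) or []
        if values:  # An empty list means "no restriction" on that dimension.
            fields.append({"type": "in", "dimension": dimension, "values": list(values)})
    if not fields:
        return None  # No restriction: query the whole inventory.
    if len(fields) == 1:
        return fields[0]
    return {"type": "and", "fields": fields}


example_filter = targeting_to_filter(
    {"markets": ["US"], "locales": [], "cabinClasses": ["Economy", "Business"]}
)
```

A client library such as PyDruid could build equivalent filter objects; the plain-dictionary form is used here only to keep the sketch free of external dependencies.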
[0134] In an example of the prediction service's operation, the request processor of the prediction service receives a request for a prediction from the sponsor portal. The prediction forecasts the number of clicks and impressions a given line item will receive, given its search alignment and positioning data. The request processor translates the alignment and positioning data into index query syntax and uses it to request inventory from a service that queries a database system that allows fast querying of very large datasets, such as Apache Druid. That service receives (e.g., daily) time series of historical searches from the database system and returns the requested inventory to the request processor. The request processor then uses the returned inventory in a request to a real-time service that predicts the number of clicks and impressions the line item will receive, where the real-time service provides the prediction in real time, for example, in less than one second. An example of such a real-time service is Facebook's Prophet. The request processor receives the requested prediction, uses the predicted numbers of clicks and impressions to derive the line item's CTR, and sends the predicted clicks and impressions, along with the CTR, back to the sponsor portal.
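The relationship between inventory, CTR, and the returned click and impression counts can be sketched with simple arithmetic. The sketch assumes that CTR is estimated as total historical clicks divided by total historical impressions and that predicted clicks are the predicted impressions multiplied by that CTR; the function names and the sample numbers are hypothetical, and the forecasting model itself is not shown.

```python
def historical_ctr(daily_rows):
    """Estimate CTR as total clicks divided by total impressions."""
    impressions = sum(r["impression_count"] for r in daily_rows)
    clicks = sum(r["click_count"] for r in daily_rows)
    return clicks / impressions if impressions else 0.0


def forecast_totals(predicted_daily_impressions, ctr):
    """Combine a predicted impression series with a CTR into response totals."""
    total_impressions = sum(predicted_daily_impressions)
    return {
        "impressions": round(total_impressions),
        "clicks": round(total_impressions * ctr),
        "ctr": ctr,
    }


# Hypothetical historical time series returned by the index.
history = [
    {"impression_count": 1000, "click_count": 50},
    {"impression_count": 1500, "click_count": 80},
    {"impression_count": 2500, "click_count": 120},
]
ctr = historical_ctr(history)
totals = forecast_totals([500, 600, 500], ctr)
```

With these sample values the estimated CTR is 5%, so 1,600 predicted impressions correspond to 80 predicted clicks.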
[0135] Figure 3 illustrates an example of the interaction between a prediction service, a sponsor portal, and a database system that allows for fast querying of very large datasets, such as Apache Druid.
[0136] Application Programming Interface (API) The prediction service (such as Prophet) can have (e.g., a single) API endpoint responsible for generating predictions. This endpoint can be designed to be invoked from the sponsor portal to obtain predictions for line items.
[0137] Example method: POST
Endpoint example: /api/forecast
Example request payload:
{
  "id": "CAM-f799ffcf-cdfa-424c-b784-083148bbdca9",
  "placementIds": ["desktop.flights.dayview/inline", "mobile.flights.dayview/inline"],
  "delivery": {
    "mode": "STANDARD",
    "startDate": "2023-08-18T19:30:00.000Z",
    "endDate": "2023-08-18T23:59:59.999Z"
  },
  "budget": {"amount": 12700, "type": "CPC", "duration": "LIFETIME", "cost": 2},
  "targeting": {
    "markets": ["US"],
    "locales": [],
    "searchTypes": ["round", "one_way", "multi_city"],
    "supplier": "uair",
    "routeType": "custom",
    "routes": [
      {"origin": [{"city": "27546320"}], "destination": [{"city": "27537542"}, {"city": "33835676"}]}
    ],
    "excludedLocations": {"origin": [{"city": "1234"}], "destination": []},
    "cabinClasses": ["Economy", "PremiumEconomy", "Business", "First"],
    "minDays": 1,
    "maxDays": 14
  }
}
Example response:
Phase 1 - MVP example response payload:
{
  "id": "43bb4355-c96c-4514-adcf-c690dec3919f",
  "forecast": {
    "impressions": { "total": { "lower": 1200, "upper": 1800, "predicted": 1550 } },
    "clicks": { "total": { "lower": 80, "upper": 110, "predicted": 92 } }
  }
}
Phase 2 - Range example response payload:
{
  "id": "43bb4355-c96c-4514-adcf-c690dec3919f",
  "forecast": {
    "impressions": { "total": { "lower": 1200, "upper": 1800, "predicted": 1550 } },
    "clicks": { "total": { "lower": 80, "upper": 110, "predicted": 92 } },
    "scope": { "total": { "lower": 5400, "upper": 8700, "predicted": 7899 } },
    "frequency": { "total": { "lower": 3.3, "upper": 6.2, "predicted": 4.5 } }
  }
}
Sponsor Portal In previous approaches, the sponsor portal queries the audience reach dataset from the query engine and database via a reach service (e.g., in the back end) to obtain search hits, vendor audiences, and target audiences for a given search alignment. Each time a line item is created or updated, the results from the reach service are sent to the sponsor data layer for storage. The values obtained from the reach service are used to calculate ad performance estimates and are displayed in a prediction widget (e.g., in the front end).
[0138] In website architecture, the terms front end (sometimes written frontend or front-end) and back end (sometimes written backend or back-end) refer to the separation of concerns between the front end (e.g., the presentation layer) and the back end (e.g., the data access layer). The terms front end and back end can be used for software and also for hardware.
[0139] For the new architecture, assuming a different logic is needed to generate performance and supply estimates, this logic can be isolated in its own service, namely the prediction service. The sponsored portal can invoke the prediction service to obtain advertising performance estimates.
[0140] Languages and frameworks The prediction service can be built using Python, with the service structure generated from a project template using the cookiecutter command-line tool. Cookiecutter is a cross-platform command-line utility that creates projects from cookiecutters (project templates), such as Python package projects and C projects.
[0141] Prophet Prophet is a Facebook library that provides an interface to a machine learning model that can be trained and used to predict dynamic displays. Given a search alignment as input, this library can be used from the prediction service to generate time series forecasts. The model can take historical inventory data and produce time series forecasts as its output.
[0142] Druid Apache Druid is a database system that allows for fast querying of very large datasets. Unlike tools like Databricks, it allows for running analytical queries that require fast results. In previous approaches, we used query engines and database clusters to store forecasted inventory, and in the examples of this new approach, Apache Druid can be used for the following reason: we can use a managed Apache Druid cluster within a web service account (such as an AWS account).
[0143] The forecasting service can use PyDruid as a client library to integrate with Apache Druid, ensuring that queries to Druid are formatted correctly. PyDruid allows Python users to query Druid in an idiomatic way and to export the results into a useful format. In this step, the line item targeting object can be transformed into a series of filter objects, which are then used to query Druid for inventory time series data.
[0144] Security and privacy This service must meet security and privacy standards covering: ● Data protection and privacy; ● Privacy impact assessment, data recording and inventory management; ● Data subject rights and tracking technology; ● New tracking technology process. Personal Data Protection A unique user ID can be used to identify and track anonymous and verified users on the Skyscanner platform. To calculate how many distinct users a line item can reach, this value can be included in the predicted inventory dataset. The user ID is considered private data because it can be used to identify travelers; therefore, during data transformation, the value is hashed using a SHA-2 function and stored only in hashed form. SHA-2 (Secure Hash Algorithm 2) is a set of cryptographic hash functions designed by the U.S. National Security Agency (NSA) and first released in 2001.
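As a minimal sketch of the hashing step, the following uses SHA-256, one member of the SHA-2 family, from Python's standard hashlib module. The original does not say which SHA-2 variant is used or whether a salt or key is applied, so the variant, the function name, and the sample IDs are assumptions.

```python
import hashlib


def hash_user_id(user_id: str) -> str:
    """Return the SHA-256 hex digest of a user ID (SHA-256 is an assumed variant)."""
    return hashlib.sha256(user_id.encode("utf-8")).hexdigest()


# Hypothetical user IDs seen in three searches; two belong to the same user.
raw_ids = ["user-123", "user-456", "user-123"]
hashed_ids = [hash_user_id(u) for u in raw_ids]

# Distinct users can still be counted on the hashed values.
distinct_users = len(set(hashed_ids))
```

Because the same input always yields the same digest, the number of distinct users a line item can reach can be calculated without storing the raw user ID.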
[0145] Monitoring, metrics, logging, and alerts Metrics or logs may be included in each step of the process.
[0146] Data and reporting engine: ETL Alerts: A reporting application (e.g., a DAGger application) can be set up in a communication channel (e.g., a Slack channel), and the platform for programmatically creating, scheduling, and monitoring workflows (e.g., Apache Airflow) can be configured to use the reporting application to report any task failure in a DAG to that channel. Slack organizes conversations in a workspace into dedicated spaces called channels. Slack is provided by Slack Technologies, LLC.
[0147] Logs: Application logs set in the code using a logger module can be displayed in the task log section of the DAG UI of the platform for programmatically creating, scheduling, and monitoring workflows (e.g., Apache Airflow). For example, all applications of the logical analytics engine for large-scale data processing (e.g., Apache Spark) can have their application logs, set in the code using a (e.g., Python) logger module, displayed in that task log section. These logs can be used to record intermediate data frames and the output counts of the final data frames written. Furthermore, logs about the execution of workflows on the platform (e.g., Apache Airflow) can be displayed in the DAG UI.
[0148] Metrics: In the accuracy reports (e.g., daily reports) generated by the logical analytics engine for large-scale data processing (e.g., an Apache Spark application), we use the percentage error between predicted and actual values (e.g., for the current day) to measure the accuracy of ad predictions for line items. This can be done for inventory, impressions, and CTR. The percentage error is calculated using the following expression: ((Actual - Predicted) / Actual) × 100. These values can be stored in tables used in the ETL. After generating the tables, we can read the data, perform some aggregations, and then build metrics to push to the internal data streaming platform.
[0149] In dashboards of an analytics and interactive visualization web application (such as Grafana, a multi-platform open-source analytics and interactive visualization web application that provides charts, graphs, and alerts when connected to supported data sources), we group these percentage errors in different ways, allowing us to examine overall accuracy through various metrics. By examining the error distribution and the mean absolute percentage error across all line items, we can understand how well the prediction algorithm is performing.
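The two accuracy measures mentioned above can be sketched as follows, reading the percentage error as ((Actual - Predicted) / Actual) × 100 and the mean absolute percentage error as the average of its absolute values. The function names and sample values are hypothetical; items with an actual value of zero are skipped here, which is a choice of this sketch and is not stated in the original.

```python
def percentage_error(actual, predicted):
    """Signed percentage error: positive when the prediction was too low."""
    return (actual - predicted) / actual * 100


def mean_absolute_percentage_error(pairs):
    """Average absolute percentage error over (actual, predicted) pairs."""
    errors = [abs(percentage_error(a, p)) for a, p in pairs if a != 0]
    return sum(errors) / len(errors) if errors else None


# Hypothetical (actual, predicted) impression counts for three line items.
pairs = [(100, 80), (200, 250), (50, 50)]
per_item = [percentage_error(a, p) for a, p in pairs]
mape = mean_absolute_percentage_error(pairs)
```

For these sample values the signed errors are 20%, -25%, and 0%, giving a mean absolute percentage error of 15%.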
[0150] Prediction service Alerts: Operational alerts can be generated for instance resource consumption (e.g., memory usage, CPU usage), i.e., the resource consumption of running the prediction service. When resource consumption reaches a predetermined threshold, a warning message can be displayed in a communication alert channel (e.g., a Slack alert channel). Operational graphs for monitoring memory and CPU usage can be displayed, e.g., using New Relic software. (New Relic is a web tracking and analytics company. Its cloud-based software allows websites and mobile applications to track user interactions and allows service operators to track software and hardware performance.) Additional resources can be automatically requested in response to an operational alert regarding the resource consumption of running the prediction service. Additional resources can be automatically provided in response to such an alert.
[0151] An alert can be issued when the prediction service returns an error response (such as 5XX or 4XX).
[0152] A warning can be given when the service is unavailable.
[0153] log: In the example, logs programmatically set by the logger module are pushed, so they can be historically queried through the user interface (UI). In the example, logs programmatically set by Python's logger module are pushed to New Relic, so they can be historically queried through the New Relic UI.
[0154] After a prediction model (such as Prophet) is trained, it can be tested. In the example, intermediate predictions are made for past dates, and the predicted output is compared to actual historical data for those past dates. The percentage error in the test is evaluated to assess how it changes over time; when the error is too high, an alert can be sent: sending logs is one way to do this.
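The test on past dates can be sketched as a simple backtest: the most recent days are held out, the model is fitted on the earlier days, and the forecast is compared with the held-out values. In this sketch the model is passed in as a function, and a placeholder that repeats the last observed week stands in for a trained model such as Prophet; the function names, the threshold, and the sample series are all hypothetical.

```python
def backtest(history, forecaster, holdout_days, max_error_pct):
    """Hold out the last days, forecast them, and flag an alert on high error."""
    train, held_out = history[:-holdout_days], history[-holdout_days:]
    predicted = forecaster(train, holdout_days)
    errors = [abs(a - p) / a * 100 for a, p in zip(held_out, predicted) if a != 0]
    mape = sum(errors) / len(errors) if errors else 0.0
    return {"mape": mape, "alert": mape > max_error_pct}


def last_week_forecaster(train, horizon):
    """Placeholder model: repeat the last 7 observed values (this is not Prophet)."""
    return [train[-7 + (i % 7)] for i in range(horizon)]


week = [100, 110, 120, 130, 140, 90, 80]
stable = backtest(week * 3, last_week_forecaster, holdout_days=7, max_error_pct=20)

# Inventory doubles in the held-out week, so the placeholder model is 50% off.
shifted_history = week * 2 + [v * 2 for v in week]
shifted = backtest(shifted_history, last_week_forecaster, holdout_days=7, max_error_pct=20)
```

When the alert flag is set, the service could emit a log entry or an alert, as described above.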
[0155] Real-time database designed to support analytics applications, such as Apache Druid (e.g., imply.io). (Imply Data is a software company. It develops and provides commercial support for the open-source Apache Druid, a real-time database designed to support analytics applications.) Alerts: Appropriate alerts will be set up on the real-time database (e.g., Imply Druid) that ingests these events, so that ingestion problems are reported. In the example, these alerts are set up using Clarity, Imply's system for monitoring managed Druid clusters. Example alerts include: Unparseable events: Some records cannot be parsed and ingested, for example in Druid.
[0156] Discarded events: Some records are discarded, for example, by Druid.
[0157] Fewer than 1000 records processed: Fewer than 1000 records received.
[0158] Logs: In the Apache Druid (e.g., imply.io) UI, users or processes can inspect the operation logs of ingestion tasks.
[0159] Metrics: A real-time database designed to support analytics applications (e.g., SQLAlchemy Druid) ingests the inventory data. An API (e.g., of Druid) is called to retrieve the results of the ingestion tasks, including the number of rows ingested, which can be pushed as a metric. A chart can be created for monitoring purposes. SQLAlchemy (the SQLAlchemy SQL Toolkit and Object Relational Mapper) is a comprehensive set of tools for working with databases and Python. SQL is Structured Query Language.
[0160] Deployment: This new architecture is deployed while the previous forecasting solution is retained. This means that, for the initial version, whenever a user requests a forecast, both the old and the new systems are invoked, but the results from the new system may not be relayed back to the user at this stage. We call this shadow mode. The results from the new system are logged along with the event and added to the forecast accuracy table (e.g., advertising_line_item_daily_forecast_accuracy). For a period of time, each line item has two sets of metrics: one for the old forecast and one for the new forecast. This allows us to compare the two forecasts directly. It also gives us time to improve the accuracy of the new system as needed before switching (e.g., completely) from the old solution to the new solution. Once the model accuracy has been established and the new forecasting service is stable, we can switch from the old system to the new system (e.g., for flight products).
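Shadow mode as described in paragraph [0160] can be sketched as a thin wrapper around the two systems. The names `forecast_in_shadow_mode`, `old_forecast`, `new_forecast`, and `record` are hypothetical, and swallowing errors from the new system so that they never reach the user is an assumption of this sketch rather than something stated in the description.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("forecast_shadow")

Forecast = Dict[str, float]


def forecast_in_shadow_mode(
    request: dict,
    old_forecast: Callable[[dict], Forecast],
    new_forecast: Callable[[dict], Forecast],
    record: Callable[[dict, Forecast, Forecast], None],
) -> Forecast:
    """Invoke both systems, record both results, return only the old one."""
    old_result = old_forecast(request)
    try:
        new_result = new_forecast(request)
    except Exception:
        # Assumption: a failure of the shadowed system must not affect the user.
        logger.exception("new forecasting system failed in shadow mode")
        new_result = {}
    # Both results are stored so the two forecasts can be compared later.
    record(request, old_result, new_result)
    return old_result
```

Switching over later would then amount to returning `new_result` instead of `old_result` once its measured accuracy is acceptable.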
[0161] Request strategy: The old prediction system typically runs with response times of 5 to 15 seconds. By contrast, the response time for calculating the estimated total number of potential ad viewers (unique users) is usually less than one second. Calling the old prediction system and the calculation of the estimated total number of potential ad viewers (unique users) at the same time is therefore not straightforward. Ideally, the old prediction system in shadow mode should be called only when a line item is ready to be stored. In the example, the calculation of the estimated total number of potential ad viewers (unique users) is called every time a user edits a line item. This is not suitable for the old prediction system, and the cost of a high-performance, real-time logical analytics database that delivers sub-second responses to queries on streaming and batch data at scale and under load (such as an Apache Druid cluster) also needs to be considered. Furthermore, we do not want to make a new request to the old prediction system before the answer to a previous request has been returned. One option is to change the UI to include a call-to-action (CTA) button that the user selects to deliberately generate a prediction. This has a downside: it requires user action to obtain an updated prediction.
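One constraint in paragraph [0161], not sending a new request to the slow prediction system until the previous answer has returned, can be sketched with a small gate object. The class name `InFlightGate` and its methods are hypothetical; the description does not say how, or whether, this constraint is enforced in code.

```python
import threading


class InFlightGate:
    """Allow at most one outstanding request to a slow service."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight = False

    def try_start(self) -> bool:
        """Return True if a new request may be sent now."""
        with self._lock:
            if self._in_flight:
                return False
            self._in_flight = True
            return True

    def finish(self) -> None:
        """Mark the outstanding request as answered."""
        with self._lock:
            self._in_flight = False
```

A caller would check `try_start()` before each request and call `finish()` when the response, or an error, arrives; edits made while a request is outstanding are simply not forwarded.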
[0162] Measuring accuracy: In the example, the prediction accuracy results varied significantly, indicating that the model requires further tuning. Examining the metrics for the old and new systems side by side (e.g., in Grafana) gives a good understanding not only of absolute accuracy but also of the performance of the new system compared with the old one (which may itself have been highly unstable). No single "accuracy" metric can be applied to the entire system to represent the sum of its parts. We typically examine separately the accuracy of the inventory projection and of the impression and click-through rate predictions of the prediction service (e.g., Prophet) to determine which aspects of the service need adjustment, in pursuit of the goal of a more reliable advertising prediction service.
[0163] Although this disclosure emphasizes applications involving flights, other applications, such as hotel or car rental, may also be provided.
[0164] Facebook Prophet: Facebook Prophet is a procedure for forecasting time series data based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
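For orientation, the additive model mentioned above is commonly written in the following form. The symbols are the ones usually used when Prophet is presented and do not appear elsewhere in this disclosure: g(t) is the non-linear trend, s(t) the periodic (yearly, weekly, daily) seasonality, h(t) the holiday effects, and ε_t an error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Because the components are summed, each can be fitted and inspected separately, which helps when judging whether trend or seasonality dominates a given inventory series.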
[0165] Amazon S3: Amazon S3, or Amazon Simple Storage Service, is a service provided by Amazon Web Services (AWS) that offers object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its e-commerce network. Amazon S3 can store any type of object, allowing for uses such as storage for internet applications, backup, disaster recovery, data archiving, data lakes for analytics, and hybrid cloud storage. AWS launched Amazon S3 in the United States on March 14, 2006, followed by a launch in Europe in November 2007.
[0166] Note: It should be understood that the arrangements cited above are merely illustrative of the application of the principles of the present invention. Many modifications and alternative arrangements can be devised without departing from the spirit and scope of the invention. Although the invention has been shown in the accompanying drawings and described above with particularity and detail in connection with examples presently considered to be the most practical and preferred embodiments of the invention, it will be apparent to those skilled in the art that various modifications can be made without departing from the principles and concepts of the invention set forth herein.
Claims
1. A computer-implemented method for predicting the number of clicks and the number of impressions that a line item will receive within a defined time period, the method comprising the following steps:
(i) a prediction service receives a request from a portal to predict the number of clicks and the number of impressions that the line item will receive within the defined time period, the request including targeting criteria and location data;
(ii) the prediction service translates the targeting criteria and the location data into index query syntax;
(iii) the prediction service uses the index query syntax to request inventory data from a service;
(iv) the service sends a request to a database system, the request being for a time series of historical searches corresponding to the index query syntax;
(v) in response to the request, the service receives the time series of historical searches from the database system;
(vi) the service provides to the prediction service inventory data corresponding to the defined time period, based on the received time series of historical searches;
(vii) the prediction service uses the returned inventory data in a request to a real-time service, the real-time service predicting the number of clicks and the number of impressions that the line item will receive within the defined time period, wherein the prediction service receives a response from the real-time service in real time, for example, in less than one second;
(viii) the prediction service uses the received predictions of the number of clicks and the number of impressions that the line item will receive within the defined time period to derive a click-through rate (CTR) of the line item; and
(ix) the prediction service sends back to the portal the prediction of the number of clicks and the number of impressions that the line item will receive within the defined time period, and the click-through rate (CTR) for the defined time period.
2. The method of claim 1, wherein the historical searches relate to flights, hotel bookings, or car rentals.
3. The method according to any one of the preceding claims, wherein the time series of historical searches is a time series of historical searches at predetermined intervals.
4. The method of claim 3, wherein the time series of historical searches is a daily time series of historical searches.
5. The method according to any one of the preceding claims, wherein the prediction service is implemented as or includes AWS Lambda functionality.
6. The method according to any one of the preceding claims, comprising creating an inventory projection by training and querying a machine learning (ML) model in real time.
7. The method according to any one of the preceding claims, comprising receiving the search targeting criteria of the line item as input.
8. The method according to any one of the preceding claims, wherein the prediction of the number of clicks and the number of impressions that the line item will receive within the defined time period includes a predicted time series.
9. The method of claim 8, wherein the prediction service sends events comprising a time series of the predictions, whereby the events can be used for accuracy reporting.
10. The method according to any one of the preceding claims, wherein the service makes a request to a database system that allows for fast querying of very large datasets, such as the Apache Druid database system.
11. The method according to any one of the preceding claims, wherein the database system queries at least 100,000 rows of data, or at least 1 million rows of data, or at least 10 million rows of data, or at least 100 million rows of data.
12. The method according to any one of the preceding claims, wherein the real-time service that predicts the number of clicks and the number of impressions that the line item will receive within the defined time period is or includes Facebook Prophet.
13. The method of claim 12, wherein Facebook Prophet provides an interface to a machine learning model, which can be trained and, when trained, can predict impressions in less than one second.
14. The method according to any one of the preceding claims, wherein the line item is an advertisement or news item.
15. The method according to any one of the preceding claims, wherein resource consumption is monitored, and if resource consumption reaches a predetermined threshold, an operational alert (e.g., a warning message) is generated.
16. The method of claim 15, wherein additional resources are automatically requested in response to the operational alert being generated.
17. The method of claim 15 or 16, wherein additional resources are automatically provided in response to the operational alert being generated.
18. The method according to any one of claims 15-17, wherein the resource consumption includes memory usage or central processing unit (CPU) usage.
19. The method according to any one of the preceding claims, wherein an operational alert (e.g., a warning message) is generated when the prediction service returns an error response.
20. The method according to any one of the preceding claims, wherein an operational alert (e.g., a warning message) is generated when the prediction service is unavailable.
21. A system configured to perform the method of any one of claims 1-20.
22. A computer-implemented method for constructing an advertising prediction inventory dataset and ingesting it in a real-time logical analysis database cluster, the method comprising the following steps:
(i) an advertising (e.g., flight) search event builder performs a builder job in an advertising reporting directed acyclic graph (DAG) to build an advertising (e.g., flight) search event dataset, until the builder job is complete;
(ii) an advertising (e.g., flight) search event sensor in an advertising inventory DAG waits for the builder job to complete;
(iii) an advertising prediction inventory builder in the advertising inventory DAG constructs the prediction inventory dataset and stores the prediction inventory dataset in a format in a container for objects (e.g., a prediction bucket);
(iv) an advertising prediction inventory sensor in an advertising prediction inventory DAG waits for the advertising prediction inventory builder in the advertising inventory DAG to complete its job; and
(v) in the advertising prediction inventory DAG, an operator of a real-time logical analysis database that delivers real-time (e.g., sub-second) responses to queries on streaming and batch data (e.g., Apache Druid) retrieves the inventory dataset stored in the container for objects (e.g., the prediction bucket) and then ingests the inventory dataset stored in the format (e.g., Apache Parquet), including indexing it, into the real-time logical analysis database cluster (e.g., an Apache Druid cluster).
23. The method according to claim 22, wherein the method comprises the method according to any one of claims 1-20.
24. The method of claim 22 or 23, wherein the advertising search event builder is or includes an advertising flight search event builder, an advertising hotel search event builder, or an advertising car rental search event builder.
25. The method according to any one of claims 22-24, wherein the real-time logical analysis database cluster queries at least 100,000 rows of data, or at least 1 million rows of data, or at least 10 million rows of data, or at least 100 million rows of data.
26. The method according to any one of claims 22-25, wherein the advertising prediction inventory builder is an application job of an analytics engine for large-scale data processing (e.g., Apache Spark).
27. The method according to any one of claims 22-26, wherein the container for the object is a prediction bucket.
28. The method of claim 27, wherein the prediction bucket is an S3 bucket.
29. The method of any one of claims 22-28, wherein the advertising prediction inventory DAG (e.g., for Apache Druid) runs a job that extracts data from different events to construct a daily prediction inventory dataset to be ingested into an index of the real-time logical analysis database, which delivers real-time (e.g., sub-second) responses to queries on streaming data and batch data at scale and under heavy load (as, for example, in Apache Druid), for subsequent querying.
30. The method according to any one of claims 22-29, wherein the inventory dataset is used for products presented on a webpage.
31. The method according to any one of claims 22-30, wherein the advertising (e.g., flight) search event dataset is extracted from a pricing service (e.g., FPS) table.
32. A system configured to perform the method of any one of claims 22-31.
33. A computer-implemented method for constructing and storing an advertising prediction accuracy report, the method comprising the steps of:
(i) historical predictions for advertising line items are output to a data layer;
(ii) the line item predictions are transferred from the data layer, via a web service interface, to an advertising line item prediction builder in an advertising prediction accuracy directed acyclic graph (DAG), which builds the advertising line item predictions;
(iii) an analytics engine for large-scale data processing (e.g., Apache Spark) in an advertising reporting DAG receives targeting criteria for active advertising line items;
(iv) given a number of search hits for the targeting criteria of the active advertising line items corresponding to a defined time period, the analytics engine for large-scale data processing (e.g., Apache Spark) in the advertising reporting DAG performs a job to build a report on the performance of the advertising line items;
(v) a sensor in the advertising prediction accuracy DAG waits for the advertising line item performance report builder in the advertising reporting DAG to complete its job; and
(vi) an analytics engine for large-scale data processing (e.g., Apache Spark) uses the report on the performance of the advertising line items, the advertising line item predictions, and the number of search hits for the targeting criteria of the active advertising line items to perform a job in the advertising prediction accuracy DAG to build and store the advertising prediction accuracy report.
34. The method according to claim 33, wherein the method comprises the method of any one of claims 1-20 or any one of claims 22-31.
35. The method according to claim 33 or 34, wherein the web service interface is or includes an S3 interface.
36. The method according to any one of claims 33-35, wherein the advertising prediction accuracy report is stored in a trusted table.
37. The method according to any one of claims 33-36, wherein the defined time period is one day.
38. The method according to any one of claims 33-37, wherein the advertising prediction accuracy report is constructed and stored within a relevant time period.
39. The method of claim 38, wherein the relevant time period is one day.
40. The method according to any one of claims 33-39, wherein the advertising prediction accuracy DAG runs the job to extract the predicted values and actual performance metrics of the advertising line items to create the prediction accuracy report.
41. The method of any one of claims 33-40, wherein the prediction accuracy report uses a percentage error to measure the advertising prediction accuracy of a line item, the percentage error being the error between a predicted value of the number of search hits for the targeting criteria and an actual value of the number of search hits, for example on a given day.
42. The method according to any one of claims 33-41, wherein, for an initial release, both a previous method and the current method are used to generate respective predictions, but the result of the current method may not be returned to the user at this stage, which is referred to as shadow mode; the result from the current method is recorded along with the event and added to a prediction accuracy table (e.g., ad_line_item_daily_forecast_accuracy); and, for a period of time, each line item stores two sets of predictions, one for the previous method and one for the current method, so that the two predictions can be compared directly.
43. The method of claim 42, wherein the previous method in shadow mode is invoked only when the line item is ready to be stored.
44. A system configured to perform the method of any one of claims 33-43.
45. A computer-implemented method for serving an advertising campaign, the method comprising the following steps:
(i) a platform for programmatically creating, scheduling and monitoring workflows receives data by extracting and transforming it from an event database, the platform including an advertising inventory directed acyclic graph (DAG) and an advertising prediction inventory DAG;
(ii) advertising indices are evaluated and stored in a data and reporting engine, the advertising indices including data ingested from the advertising prediction inventory DAG in the platform;
(iii) a data streaming platform receives prediction events pushed from an advertising prediction service;
(iv) the event database is populated from the data streaming platform;
(v) a portal requests an advertising prediction from the advertising prediction service;
(vi) the advertising prediction service queries inventory by querying the advertising index stored in the data and reporting engine;
(vii) the advertising prediction service uses the response to the inventory query to provide the advertising prediction to the portal;
(viii) the portal processes the received advertising prediction to determine an advertising campaign;
(ix) the portal transmits the determined advertising campaign to a data layer;
(x) the data layer transfers the determined advertising campaign to a network service interface; and
(xi) the network service interface uses the transferred determined advertising campaign to obtain campaign content from the platform, including obtaining campaign content from the advertising inventory, to serve the advertising campaign.
46. The method according to claim 45, wherein the method comprises the method of any one of claims 1-20, or any one of claims 22-31, or any one of claims 33-43.
47. The method of claim 45 or 46, wherein the platform for programmatically creating, scheduling, and monitoring workflows is or includes Apache Airflow.
48. The method according to any one of claims 45-47, wherein the advertisement is used on one or more of: computing devices using iOS, computing devices using Android, desktop computing devices, and computing devices using mobile networks.
49. The method according to any one of claims 45-48, wherein the advertisement is used on one or more of: a smartphone, a tablet, a laptop, a desktop computer, and a smart TV.
50. The method according to any one of claims 45-49, wherein the data and reporting engine comprises ETL and an event database.
51. The method of claim 50, wherein the ETL comprises an advertising inventory DAG and an advertising prediction accuracy DAG.
52. The method of claim 50 or 51, wherein the ETL receives data by extracting and transforming it from the event database.
53. The method according to any one of claims 45-52, wherein the event database comprises one or more of advertising response, pricing session initiation, advertising display, advertising view, advertising click, and advertising prediction data.
54. The method according to any one of claims 45-53, wherein the advertising index includes one or more or all of the following data associated with: placement, tagging, user ID, display {0,1}, view {0,1}, and click {0,1}.
55. The method according to any one of claims 45-54, wherein the prediction service receives the response to the inventory query in less than one second.
56. The method according to any one of claims 45-55, wherein the network service interface is or includes an S3 network service interface.
57. The method according to any one of claims 45-56, wherein the prediction service pushes the prediction events to the data streaming platform, which is internal, and to the event database.
58. The method of any one of claims 45-57, wherein the prediction events are used to compare the predictions with measured advertising performance data to evaluate prediction performance, and the measured prediction performance is used to modify future predictions to improve their accuracy.
59. The method according to any one of claims 45-58, wherein the advertising prediction service is implemented in AWS Lambda, or includes AWS Lambda.
60. The method according to any one of claims 45-59, wherein the advertising prediction inventory DAG is implemented in Apache Druid, or includes Apache Druid.
61. The method of any one of claims 45-60, wherein the advertising inventory data comprises data extracted from a table including events recorded by a delivery service when the delivery service receives a request from a front end; wherein the location requested in the delivery service and the (e.g., flight) pricing service (FPS) session ID associated with the request are extracted; and wherein the session ID is then used to join with a (e.g., flight) search pricing table to extract the search criteria parameters.
62. The method of any one of claims 45-61, wherein the advertising inventory data comprises data extracted from a table, the table comprising searches (e.g., for flights) on specific web pages; the table contains (e.g., all) data related to the search targeting criteria and the suppliers examined for trips (e.g., flights); and the table is used to extract the search targeting criteria related to events in a table of events recorded by the delivery service when the delivery service receives a request from the front end (e.g., search trip count), and to obtain (e.g., all) suppliers with trips for the targeting criteria (e.g., supplier search trips).
63. The method according to any one of claims 45-62, wherein the advertising inventory data comprises data extracted from a table including advertising events (e.g., impressions, clicks, views), said data being unified in a single entry and combined with search data related to requests made to the delivery service (e.g., via an FPS table).
64. The method according to any one of claims 45-63, wherein the advertising inventory data includes data extracted from a table containing geolocation information.
65. The method according to any one of claims 45-64, wherein the advertising inventory data is stored in Apache Parquet format in an object container, such as a prediction bucket, for example an S3 bucket.
66. The method of claim 65, wherein the advertising inventory data stored in the object container is consumed by the Apache Airflow Druid operator and then ingested into the Apache Druid cluster.
67. The method according to any one of claims 45-66, wherein the platform uses a reporting application to report, in a communication channel, a failure of a task in a DAG.
68. The method according to any one of claims 45-67, wherein the platform uses a reporting application to report, in the communication channel, any failure of a task in the DAG.
69. A system configured to perform the method of any one of claims 45-68.