Intelligent exploration for digital content items

By clustering new digital content items based on compressed feature embeddings and selecting items near previously served content with known engagement, the system addresses the inefficiencies of random selection, reducing costs and improving performance prediction accuracy.

WO2026127950A1 · PCT designated stage · Publication Date: 2026-06-18 · GOOGLE LLC

Patent Information

Authority / Receiving Office: WO · WO
Patent Type: Applications
Current Assignee / Owner: GOOGLE LLC
Filing Date: 2024-12-10
Publication Date: 2026-06-18

Abstract

Systems and methods for identifying digital content items that are similar to previously served digital content items. Due to the similarity of digital content items, digital content items may be predicted to perform similarly to the previously served digital content items. The similarities may be determined based on embedding distances between digital content items. Embedding values associated with features of the digital content items may be determined and used to cluster digital content items based on similarities. The features include text, audio, images, video, motion, etc. In response to a publisher's request for digital content, a previously served digital content item responsive to the request may be identified and a cluster with the previously served digital content at the center is also identified. Digital content from the cluster, other than the previously served digital content item, may be selected for auction and potentially served in response to the publisher's request.

Description

GOOGLE-4295

INTELLIGENT EXPLORATION FOR DIGITAL CONTENT ITEMS

BACKGROUND

[0001] Digital content items to be served in response to a publisher’s request for digital content are typically selected based on past performance with respect to user engagement with the digital content items. However, when new digital content items are created, the new digital content items have no past performance and, hence, no performance data. To obtain performance data, the new digital content items are randomly selected to be input into the auction and potentially served to a user computing device in response to the publisher’s request. Such randomization comes with costs: a cost of seeding data for future modeling to identify and select digital content items that result in better user engagement, a cost in quality of digital content items potentially being served, and a cost of computational resources for exploring how new digital content items will perform.

BRIEF SUMMARY

[0002] The technology is generally directed to identifying digital content items that are similar to previously served digital content items. Due to the similarity of digital content items, digital content items may be predicted to perform similarly to the previously served digital content items. The similarities may be determined based on embedding distances between digital content items. For example, embedding values associated with features of the digital content items may be determined and used to cluster digital content items based on similarities. The features include text, audio, images, video, motion, etc. In response to a publisher’s request for digital content, a previously served digital content item responsive to the request may be identified and a cluster with the previously served digital content at the center is also identified. Digital content from the cluster, other than the previously served digital content item, may be selected for auction and potentially served in response to the publisher’s request.

[0003] Implementations of the present technology can each include, but are not limited to, the following. The features may be alone or in combination with one or more other features described herein.

[0004] One implementation of the technology is generally directed to a method comprising generating, by one or more processors based on feature embeddings of each of a plurality of digital content items, clusters of digital content items, wherein a center of a respective cluster is a previously served digital content item, receiving, by the one or more processors, from a publisher, a request for at least one digital content item, identifying, by the one or more processors, at least one previously served digital content item of the plurality of digital content items responsive to the request, identifying, by the one or more processors, at least one cluster of digital content items associated with the at least one previously served digital content item, and selecting, by the one or more processors, from the identified at least one cluster, a digital content item other than the identified at least one previously served digital content item.

[0005] The previously served digital content item may correspond to a digital content item having a threshold amount of ground truth engagement data. The ground truth engagement data may comprise at least one of a user selection of the digital content item, a threshold period of time the digital content item was viewed, a click through rate associated with the digital content item, or a conversion rate associated with the digital content item. The digital content item other than the identified at least one previously served digital content item may correspond to a digital content item having less than the threshold amount of ground truth engagement data.

[0006] Generating the feature embeddings may further comprise identifying, by the one or more processors, one or more features of each of the plurality of digital content items, the one or more features comprising at least one of image, text, video, motion, or audio, determining, by the one or more processors, for each feature of each digital content item, a feature embedding value, and compressing, by the one or more processors, for each digital content item, the feature embedding values into a compressed embedding representation. Compressing the feature embedding values may further comprise encoding, by the one or more processors, the feature embedding values, concatenating, by the one or more processors, the encoded feature embedding values, compressing, by the one or more processors, the concatenated encoded feature embedding values, and decoding, by the one or more processors, the compressed concatenated encoded feature embedding values.

[0007] The method may further comprise providing, by the one or more processors, the selected digital content item to auction. When the selected digital content item wins the auction, the method may further comprise serving, by the one or more processors, the selected digital content item to one or more user computing devices, and receiving, by the one or more processors, ground truth engagement data associated with the selected digital content item.

[0008] The at least one previously served digital content item may be at a center of the identified at least one cluster of digital content items. The digital content item other than the identified at least one previously served digital content item may be within a threshold embedding distance from the center of the cluster.

[0009] Other implementations include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] Figure 1 is a block diagram of an example system, according to aspects of the disclosure.

[0011] Figure 2 is a flow diagram of generating feature embeddings, according to aspects of the disclosure.

[0012] Figure 3 is a flow diagram for identifying new digital content items, which may be performed using the system of Figure 1, according to aspects of the disclosure.

[0013] Figure 4 is another example flow diagram for identifying new digital content items, which may be performed using the system of Figure 1, according to aspects of the disclosure.

[0014] Figure 5 is a block diagram of an example computing environment, according to aspects of the disclosure.

DETAILED DESCRIPTION

[0015] Aspects of the disclosure relate to a system for identifying new digital content items that are predicted to perform comparably to similar content items that have been previously served to user computing devices. New digital content items include digital content items that have little to no ground truth data. For example, new digital content items may be newly generated digital content, digital content that was submitted to auction but has not won at auction, digital content that does not have enough ground truth data to predict how it will perform once served to user computing devices, or the like. In contrast, previously served digital content items include digital content items that have been previously served to user computing devices such that the previously served digital content items have generated enough ground truth data based on user interaction with the digital content. The ground truth data may include, for example, user engagement and / or interactions with the previously served digital content items, such as whether the system receives a user input corresponding to the selection of the digital content item, a threshold period of time the digital content item is displayed and, therefore, viewed, a click through rate associated with the digital content item, a conversion rate associated with the digital content item, etc. (collectively “ground truth engagement data”). The ground truth engagement data may be obtained after a digital content item has won at auction and has been served.

[0016] The new digital content may be identified based on similarities between the new digital content and previously served digital content. The similarities are determined based on embeddings for different features of the digital content items. For example, the features of the digital content items, whether new or previously served, can be identified. The features include, for example, image, text, audio, video, motion, etc. In some examples, the features may be identified by an embeddings generator. In another example, the digital content item may be generated by a system component; as such, the features of the digital content may be known. For example, if an artificial intelligence (AI) model was used to generate the digital content item, the system would know the inputs provided to the model and, therefore, the different features associated with the digital content item.

[0017] Each feature can have different dimensions, e.g., a different number of embedding factors or values. Concatenating the raw embedding values for each feature can unfairly emphasize larger features, potentially skewing the similarity comparison. Further, concatenating the raw embedding values would result in a large concatenated embedding value, making it computationally expensive for clustering and similarity determinations. To reduce the computational resources required to cluster the digital content and to determine the similarity between the new digital content and previously served digital content, the embeddings may be compressed, or bottlenecked. For example, an autoencoder may receive, as input, the embeddings for each feature of the digital content. The autoencoder can combine the embeddings into a concatenated embedding and then compress the concatenated embedding into a compressed representation. The compressed representation can be expanded into the different features, or inputs, corresponding to a compressed representation of each input feature.

[0018] The compressed feature representations can be used to identify new digital content that is similar to digital content that has been previously served and has enough ground truth engagement data. In some examples, the similarity may be based on the presentation of the new and previously served digital content items (“presentational similarity”). Presentational similarity can include, for example, a similarity among any combination of the look, mood, color scheme, products, text, target audience, overall appearance, audio, or the like. By identifying new digital content that is presentationally similar to previously served digital content, a campaign management platform can predict that the new digital content is likely to perform similarly to the previously served digital content. The prediction regarding the performance of the new digital content is based on ground truth engagement data associated with the previously served digital content to which the new digital content is similar.

[0019] According to some examples, the new digital content that is similar to the previously served digital content is identified based on the embeddings associated with the features of the respective new and previously served digital content. The embeddings provide an indication of the presentation, appearance, look, feel, mood, etc. of the digital content. The embeddings may be used to cluster the digital content. The clusters may be for digital content that is presentationally similar. For example, a predetermined embedding distance from the previously served digital content may be used to identify new digital content that is presentationally similar to the previously served digital content. The new digital content that is within a given cluster, e.g., within the predetermined embedding distance of the previously served digital content, is identified as presentationally similar to the previously served digital content (“similar digital content”).
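The distance-based grouping described above can be sketched as follows. This is a minimal illustration only: it assumes Euclidean distance over the compressed embeddings, and the function names, item identifiers, and threshold value are hypothetical rather than taken from the disclosure.

```python
import math


def embedding_distance(a, b):
    """Euclidean distance between two equal-length embeddings (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_clusters(served, new, max_distance):
    """Group new items around previously served items used as cluster centers.

    served and new map item ids to embeddings. A new item joins every cluster
    whose center lies within max_distance of it.
    """
    clusters = {}
    for center_id, center_vec in served.items():
        members = []
        for item_id, vec in new.items():
            d = embedding_distance(center_vec, vec)
            if d <= max_distance:
                members.append((item_id, d))
        members.sort(key=lambda pair: pair[1])  # closest to the center first
        clusters[center_id] = members
    return clusters


# Hypothetical 2-dimensional embeddings, for illustration only.
served_items = {"served_a": [0.0, 0.0], "served_b": [10.0, 10.0]}
new_items = {"new_1": [0.3, 0.4], "new_2": [9.0, 10.0], "new_3": [5.0, 5.0]}
clusters = build_clusters(served_items, new_items, max_distance=1.5)
```

In this toy data, new_1 falls in the cluster centered on served_a, new_2 in the cluster centered on served_b, and new_3 is too far from either center to be treated as similar digital content.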

[0020] In response to a request for digital content to be provided for output via a publisher’s website or mobile application, the campaign management platform identifies digital content for auction. In some examples, the campaign management platform identifies digital content items responsive to the request. The identified digital content items can include previously served digital content items and / or new digital content items. By identifying previously served and / or new digital content items, the systems and methods described herein solve the bootstrapping problem for newly created digital content items. In examples where a previously served digital content item is selected for auction, rather than providing the previously served digital content to auction, the campaign management platform identifies a cluster of digital content items with the identified previously served digital content item at the center of the cluster. The campaign management platform identifies, or selects, a similar digital content item, e.g., the new digital content that is similar to previously served digital content. The similar digital content item is then sent to auction. The similar digital content item is selected for auction based on its presentational similarity to the given previously served digital content item and its predicted performance at auction and once served.
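The substitution step described above can be sketched as follows. This is a hedged illustration, not the disclosed implementation: it assumes clusters have already been computed, and the function name, the data layout, and the choice of the member nearest the center are all hypothetical (the text only requires a member within the threshold distance).

```python
def select_for_auction(responsive_served_ids, clusters, max_distance):
    """Pick a similar new item to send to auction in place of a served item.

    responsive_served_ids: previously served items responsive to the request.
    clusters: maps a served item id (the cluster center) to a list of
        (new_item_id, distance_to_center) pairs.
    Returns (center_id, new_item_id), or None if no cluster has a member
    within max_distance.
    """
    for center_id in responsive_served_ids:
        candidates = [
            (dist, item_id)
            for item_id, dist in clusters.get(center_id, [])
            if item_id != center_id and dist <= max_distance
        ]
        if candidates:
            # Assumption: prefer the member nearest to the center.
            _, chosen = min(candidates)
            return center_id, chosen
    return None


# Hypothetical precomputed clusters.
example_clusters = {
    "served_a": [("new_1", 0.5), ("new_4", 0.2)],
    "served_b": [],
}
choice = select_for_auction(["served_b", "served_a"], example_clusters, max_distance=1.0)
```

Here served_b has no similar items, so the sketch falls through to served_a and returns new_4, which would be provided to auction instead of served_a.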

[0021] New digital content items have little to no history regarding the performance of the digital content items when served. The error rate of the models used to predict the performance of digital content items and, therefore, the models for selecting the digital content items for auction, is high for new digital content items. To decrease the error rate of the models, the systems and methods described herein use intelligent clustering of digital content items based on feature embeddings and a selection of a new digital content item based on a threshold distance from the center of a given cluster. The cost to explore, e.g., select the new digital content items for auction, is high, as the error for the model’s prediction on how the new digital content item will perform at auction and, in some examples, once served, is high. To reduce the cost of exploration, the predicted performance of new digital content items is based on previously served digital content items. The previously served digital content items are used as a center of a given cluster such that a new digital content item within a threshold distance of the center is selected to be provided for auction. For example, the predicted performance of the new digital content item corresponds to the ground truth engagement data of the previously served digital content item, such that the costs associated with exploration of providing the new digital content item to auction are low. In particular, the exploration costs of new digital content items are low as the campaign management platform can predict and, therefore, expect, that the new digital content item will perform similarly to the previously served digital content item based on its presentational similarity. Accordingly, new ground truth engagement data can be expected from the new digital content item, thereby building the training data for selecting digital content items for auction, for serving digital content items, and the like. The new ground truth engagement data obtained from the new digital content items may be used to update the model, such that the new digital content items may transition to previously served digital content items once a threshold amount of ground truth data is obtained. In this regard, the clusters of digital content items may be automatically updated based on the change from new to previously served digital content items. Further, the ground truth data associated with the new digital content items turned previously served content items may be used to update the model to identify digital content items responsive to a request.
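The transition from new to previously served can be sketched as follows. This is an assumption-laden illustration: the disclosure does not say how the threshold amount of ground truth data is measured, so the sketch counts served impressions, and the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ContentItem:
    item_id: str
    impressions: int = 0
    clicks: int = 0

    def record(self, clicked):
        """Record one served impression and whether it was clicked."""
        self.impressions += 1
        if clicked:
            self.clicks += 1


def is_previously_served(item, min_impressions):
    """Assumption: an item has enough ground truth data after min_impressions."""
    return item.impressions >= min_impressions


def partition(items, min_impressions):
    """Split items into (previously_served, new) lists."""
    served = [i for i in items if is_previously_served(i, min_impressions)]
    new = [i for i in items if not is_previously_served(i, min_impressions)]
    return served, new


# A new item wins three auctions and is served three times.
item = ContentItem("new_1")
before = is_previously_served(item, min_impressions=3)
for clicked in (True, False, True):
    item.record(clicked)
after = is_previously_served(item, min_impressions=3)
```

Once an item crosses the threshold it would become eligible to act as a cluster center, which is the point at which the clusters would need to be recomputed.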

[0022] Previously, new digital content items that were recently generated, created, or provided as part of a digital content campaign did not have any or enough ground truth engagement data associated with the digital content to predict how the new digital content would perform at auction and / or once served to user computing devices. Such engagement data is used by a digital content server to predict how the digital content will perform when provided for output via a publisher’s mobile application or website, e.g., whether the digital content will engage the user, provide a return to the digital content creator, prevent a loss of online traffic to the mobile application or website, etc. Testing the performance of each digital content is expensive, both computationally and monetarily. In particular, to test new digital content items, the new digital content items are randomly selected to be provided to the digital content server as part of an auction. If the new digital content wins and is served, any user engagement with the digital content may be used as training data for future digital content items. However, such random selection comes at the cost of potentially losing user engagement data, e.g., training data, should the new digital content not be selected to be served. Further, running shadow campaigns for new digital content items to explore the new digital content items makes it complex to obtain ground truth data for the new digital content items without impacting the performance of campaigns.

[0023] As an example, if there are four digital content items, of which one has been previously served while the other three digital content items are new digital content items, all four digital content items may go into exploration such that they all have the potential to be selected for auction in response to a publisher’s request. In such an example, each digital content item has a 25% chance of being selected for auction. However, as three of the four digital content items are new digital content items, the cost of losing user engagement data is high as there is a three-in-four chance (i.e., a 75% chance) that a new digital content item is selected for auction. If the selected new digital content item does not win at auction, no new digital content item is served to a user computing device in response to the publisher’s request and, therefore, no training data is generated that can be used to improve the models for selecting digital content items for auction and / or auction bidding models.
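The arithmetic in this example can be checked with a few lines of Python. The variable names are illustrative, and uniform random selection among the four items is assumed, as the example implies.

```python
from fractions import Fraction

total_items = 4
new_items = 3

# Chance that any one particular item is selected for auction.
p_each = Fraction(1, total_items)
# Chance that the selected item is one of the new items.
p_new_selected = Fraction(new_items, total_items)

print(float(p_each), float(p_new_selected))  # prints 0.25 0.75
```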

[0024] To reduce the costs associated with exploring the performance of new digital content items, embeddings associated with the new digital content items may be determined. The embeddings for the digital content items may be determined feature-by-feature. The features can include, for example, music, images, and text. As the embeddings for each feature can have thousands of float values, the computational costs of using the raw embeddings are high. The number of float values corresponds to a length of the vector, e.g., embedding. To reduce the computational costs and increase computational efficiency, the embeddings for each feature may be compressed into a representation with fewer float values. By compressing the embeddings in this way, the computational efficiency is increased as there are fewer computations to perform as compared to performing computations based on all the float values associated with the raw embeddings.

[0025] The compressed embedding value or representation can be used to cluster digital content items. The clusters may be indicative of presentational similarities between digital assets. Presentational similarities may include similar music, context, text, keywords, color schemes, feelings, moods, etc. When selecting digital content items for auction, a previously served digital content item responsive to the publisher’s request may be identified. The previously served digital content item can be used as the center of a cluster. New and / or previously served digital content items within a threshold distance of the center of the cluster can be identified. The new and / or previously served digital content items within the threshold distance may be digital content items that are similar to the given previously served digital content item such that the digital content items within the cluster can be expected to perform similarly to the given previously served digital content item. By identifying similar digital content items to the given previously served digital content item, computational resources are decreased as the cost of providing new digital content items is decreased. In particular, by identifying digital content items that are similar to the previously served digital content item, the similar digital content items can be expected to perform comparably to the previously served digital content item. Therefore, the probability, or likelihood, that training data is generated is greater as compared to randomly selecting a new digital content item to be provided to auction.

[0026] Further, as digital content items can be generated quickly and in large volumes, such as by using generative artificial intelligence (AI) models, the cost of exploring the performance of the new digital content items based on similarity to previously served digital content items is decreased as compared to other methods of exploring new digital content items, e.g., randomly selecting new digital content items for auction. In particular, as the new digital content items can be efficiently clustered with previously served digital content items based on the compressed embedding value, the costs associated with identifying new digital content items for auction are decreased. For example, as ground truth engagement data is known for the previously served digital content item, the system can expect that the similar digital content items will perform similarly and, therefore, sending the similar digital content item to auction rather than the previously served digital content item will not result in a loss of training data.

Example Systems

[0027] Figure 1 is a block diagram of an example system 100 including a campaign management platform 150 in communication with an embeddings generation system 110, according to aspects of the disclosure. In some examples, the campaign management platform 150 and the embeddings generation system 110 can be part of a larger system, while in other examples, the embeddings generation system 110 and the campaign management platform 150 are implemented on separate devices in one or more physical locations.

[0028] The campaign management platform 150 and the embeddings generation system 110 can be in communication over a network. The campaign management platform 150 may be configured to manage the serving of content to user computing devices, such as user computing devices 180A-C, and provide a user interface for doing so. For example, the user interface can be configured as a web interface, an API, a standalone software application, etc., for organizing and causing digital content to be served to different user computing devices in accordance with different targeting parameters.

[0029] Content delivery may be organized as one or more campaigns, each campaign logically associated with some subject digital content. Campaigns may be further subdivided into groups, representing potential variations on the type of content to be served. Groups may be further subdivided into line items, representing even more specificity in the digital content to be served, the time at which to serve the content, and / or the computing devices that are a target of the content. The time at which to serve the content corresponds to the flight for the content. Digital content, the period of time at which the digital content is to be served to different user computing devices, and / or targeting parameters for selecting which user computing devices to serve the content to may be selected at either the campaign, group, or line item level.
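The campaign, group, and line item hierarchy can be pictured as nested records. The sketch below is illustrative only: the class and field names are hypothetical, and the rule that a more specific level overrides a less specific one is an assumption, since the text says only that parameters may be set at any of the three levels.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LineItem:
    name: str
    targeting: Optional[dict] = None


@dataclass
class Group:
    name: str
    targeting: Optional[dict] = None
    line_items: list = field(default_factory=list)


@dataclass
class Campaign:
    name: str
    targeting: Optional[dict] = None
    groups: list = field(default_factory=list)


def effective_targeting(campaign, group, line_item):
    """Merge targeting parameters; more specific levels override (assumed rule)."""
    merged = {}
    for level in (campaign, group, line_item):
        if level.targeting:
            merged.update(level.targeting)
    return merged


# Hypothetical campaign with one group and one line item.
campaign = Campaign("spring_launch", targeting={"region": "US", "device": "any"})
group = Group("video_variants", targeting={"device": "mobile"})
line_item = LineItem("evening_flight")
campaign.groups.append(group)
group.line_items.append(line_item)
resolved = effective_targeting(campaign, group, line_item)
```

Here the line item sets no targeting of its own, so it inherits the region from the campaign and the device from the group.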

[0030] After the campaign management platform 150 receives the content, e.g., digital content items 170A-C, a flight for the content, and targeting parameters for the computing devices to serve the content, the campaign management platform 150 is configured to serve the content to the user computing devices. In some examples a separate component, e.g., a separate engine running on the same device or different devices than the platform, may cause the content to be served, e.g., by directly serving the content, or sending a request or command to another system configured to serve the content. The flight may be as short as the time it takes to send the content to the user computing devices. In other examples, the flight may be any length of time, such as hours, days, weeks, and so on. Serving the content can include sending the content over a network to be displayed or outputted by the devices, or causing content stored on the user computing devices to be displayed or otherwise outputted.

[0031] The embeddings generation system 110 includes embeddings generator 120, digital content repository 140, embeddings fetcher 105, and embeddings repository 115, which can be implemented, in different examples, on one or more computing devices in one or more physical locations. The embeddings generator 120 is configured to receive the digital content items 170A-C. The digital content items 170A-C can be images, text, video, audio, or the like. In some examples, the digital content items 170A-C can be informative information, entertainment, advertisements, etc.

[0032] Figure 2 is an example embeddings generator 120. The embeddings generator 120 is configured to identify and then compress the input feature embeddings, e.g., the embeddings associated with the features of the digital content. Compressing the feature embeddings results in a smaller dimensional space, as compared to non-compressed embeddings. For example, digital content can include different features, such as text, images, video, audio, motion, etc. Each feature is associated with embeddings that represent the features (“feature embeddings”). According to some examples, the feature embeddings may at least partially encode some semantic meaning for the features of the digital content. In some examples, the feature embeddings may be a representation of the style, personality, and / or characteristics of the feature(s) of the digital content.

[0033] Referring back to Figure 1, the feature embeddings may be identified via embeddings fetcher 105. For example, the embeddings fetcher 105 may access an embeddings repository 115. The embeddings repository 115 may store image embeddings to be used to represent the features of the digital content items 170A-C. According to some examples, the embeddings fetcher 105 may query the embeddings repository 115 to identify one or more relevant feature embeddings of the digital content items 170A-C. Relevant feature embeddings may be, for example, embeddings corresponding to the text, image, video, audio, etc., within the digital content items 170A-C. According to some examples, the relevant feature embeddings may be embeddings within a threshold distance of the features. For example, for the text feature, the words within the digital content item may correspond to a given feature embedding. Relevant feature embeddings may be embeddings within a threshold distance of an embedding associated with the words within the digital content item.

[0034] In some examples, rather than using the embeddings fetcher 105 as an intermediary, the embeddings generator 120 may be configured to query the embeddings repository 115 to identify relevant feature embeddings. In some examples, embeddings fetcher 105 uses clustering techniques to find the nearest matches.

[0035] As another example, the embeddings fetcher 105 can generate embeddings instead of retrieving embeddings from a pre-populated repository. For example, the embeddings fetcher 105 can implement an AI model trained to receive the digital content items 170A-C and / or features of the digital content items 170A-C and generate feature embeddings from the input. In some examples, the embeddings fetcher 105 may be trained to output feature embeddings directly from the digital content items 170A-C and / or features of the digital content items 170A-C, e.g., by being trained end-to-end on inputs labeled with corresponding feature embeddings.

[0036] Referring back to Figure 2, the feature embeddings for each feature, e.g., Features 1-3, may have different dimensions. As an example, text embeddings may have a float value of 1280, image embeddings may have a float value of 1024, and audio embeddings may have a float value of 2048. Concatenating the feature embeddings can unfairly emphasize the larger features, e.g., audio embeddings. Emphasizing a given feature over another may lead to improper clustering of the digital content. For example, rather than clustering the new and previously served digital content based on the whole presentation of the digital content, an emphasis on a given feature can cause the clusters to be based on a given feature. The presentation of the digital content as a whole may include, for example, overall style (modern, moody, classic, stream-lined, wordy, colorful, abstract, or the like), visual similarities (presence of certain colors, shapes, objects, features, characteristics, text, or the like), text (content, product, descriptions, font, target audience, or the like), etc. Further, concatenating the original, or raw, feature embeddings would result in a very large embedding, which would result in more computational resources being required to cluster the digital content and identify similar digital content.
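A quick calculation with the example sizes above shows how uneven the raw concatenation would be. This sketch only counts coordinates; how much each feature actually influences a distance also depends on the scale of its values, which is not modeled here.

```python
# Example embedding sizes quoted in the paragraph above.
feature_dims = {"text": 1280, "image": 1024, "audio": 2048}

concatenated_dim = sum(feature_dims.values())
share = {name: dim / concatenated_dim for name, dim in feature_dims.items()}

print(concatenated_dim)  # prints 4352
for name, fraction in share.items():
    print(f"{name}: {fraction:.1%}")
```

With these sizes the audio embedding alone contributes roughly 47% of the 4352 concatenated values, compared with about 29% for text and 24% for image, which is the imbalance that the per-feature reduction to a common size is meant to avoid.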

[0037] According to some examples, the embeddings generator 120 may include one or more towers. Each tower is for a respective feature. For each tower, the embeddings generator 120 may include a flattening layer and / or one or more dense layers before the feature embeddings for each tower are concatenated. In some examples, such as for Feature 3, the tower can include a masking layer, or step. The masking step may occur when the feature embedding, or representation, is of different lengths across different digital content items. The digital content items may be, for example, video or image items. In examples where the feature embeddings for the digital content items are of different lengths, the individual embeddings may be padded such that all of the embeddings are of the same length. During the training stage, the padding values may be masked, e.g., masked from training loss computation, as padded values may be used for training the models but may not represent the digital content item attributes.
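The padding and masking step can be sketched as follows. This is a minimal illustration under assumptions: the pad value of zero, the use of mean squared error as the training loss, and the function names are all hypothetical and not specified in the text.

```python
def pad_with_mask(sequence, target_length, pad_value=0.0):
    """Pad a sequence to target_length and return (padded, mask).

    The mask holds 1.0 for real values and 0.0 for padding.
    """
    if len(sequence) > target_length:
        raise ValueError("sequence is longer than target_length")
    pad = target_length - len(sequence)
    padded = list(sequence) + [pad_value] * pad
    mask = [1.0] * len(sequence) + [0.0] * pad
    return padded, mask


def masked_mse(target, prediction, mask):
    """Mean squared error computed over unmasked positions only."""
    kept = sum(mask)
    if kept == 0:
        return 0.0
    total = sum(m * (t - p) ** 2 for t, p, m in zip(target, prediction, mask))
    return total / kept


# A length-2 embedding padded to length 4; errors at padded positions are ignored.
padded, mask = pad_with_mask([1.0, 2.0], target_length=4)
loss = masked_mse(padded, [1.0, 4.0, 9.0, 9.0], mask)
```

Only the two real positions contribute to the loss, so the large errors at the padded positions do not affect training.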

[0038] To reduce the size of the feature embeddings for the digital content, the embeddings generator 120 flattens the input data into a 1D vector. From the flattened layer, the embeddings generator 120 extracts features and reduces the dimensionality of the feature embeddings. As an example, the dense layers for each feature may reduce the embeddings to 512 and then 256 units. The reduced feature embeddings, e.g., the 256 units, may be concatenated such that the reduced feature embeddings are combined into a concatenated embedding. A series of dense layers may further compress the concatenated embedding into a smaller dimensional latent representation. In some examples, the compressed concatenated embedding is a 64-dimensional latent representation. This may, in some examples, be referred to as a bottleneck 220 of the embeddings generator 120. The dense layers may include Rectified Linear Unit (ReLU) activation. ReLU activations are nonlinear activations applied within a neural network computation structure. In some examples, rather than ReLU activations, the dense layers may include Exponential Linear Unit (ELU) activations.

[0039] After the bottleneck 220, the embeddings generator 120 may have a mirrored structure 226 of the compression layers 224. In some examples, the mirrored structure 226 may include ELU activation to expand the latent representation. For example, the embeddings generator 120 may include a decoder configured to expand the 64-dimensional representation to a representation having 256 units, 512 units, 1024 units, etc. The final layer, e.g., dense layer 222, may include sigmoid activation. The sigmoid activation outputs a decoded vector with a size matching the combined size of all three inputs. For example, the decoded vector output by the sigmoid activation has a size corresponding to the combined size of the inputs of the embeddings generator 120. The decoded vector may be split into different outputs, e.g., Decoded 1, Decoded 2, Decoded 3, corresponding to the original input shapes of Features 1, Features 2, Features 3. The output of the embeddings generator 120 results in an embedding representation of the new and / or previously served digital content that can be used to quickly and efficiently cluster digital content and identify similar digital content.
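The shape flow through the towers, the bottleneck, and the mirrored decoder can be traced with a small, untrained forward pass. The sketch below is illustrative only: weights are random, biases are omitted, a single dense layer stands in for the series of compression layers, and `build_model` and `forward` are hypothetical names. The default layer sizes follow the figures quoted in the text (512 and 256 units per tower, a 64-dimensional bottleneck, and a 256/512/1024 decoder).

```python
import math
import random


def relu(x):
    return max(0.0, x)


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(vec, weights, activation):
    """One dense layer without bias: activation(W @ vec)."""
    return [activation(sum(w * x for w, x in zip(row, vec))) for row in weights]


def init(n_out, n_in, rng):
    """Random (untrained) weight matrix of shape n_out x n_in."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]


def build_model(input_dims, tower_units=(512, 256), bottleneck=64,
                decoder_units=(256, 512, 1024), seed=0):
    rng = random.Random(seed)
    towers = []
    for dim in input_dims:
        sizes = (dim,) + tuple(tower_units)
        towers.append([init(o, i, rng) for i, o in zip(sizes, sizes[1:])])
    concat_dim = tower_units[-1] * len(input_dims)
    dec_sizes = (bottleneck,) + tuple(decoder_units)
    return {
        "input_dims": list(input_dims),
        "towers": towers,
        "encoder": init(bottleneck, concat_dim, rng),
        "decoder": [init(o, i, rng) for i, o in zip(dec_sizes, dec_sizes[1:])],
        "output": init(sum(input_dims), dec_sizes[-1], rng),
    }


def forward(model, features):
    """Return (latent, decoded_outputs) for one item's flattened features."""
    reduced = []
    for layers, vec in zip(model["towers"], features):
        for weights in layers:
            vec = dense(vec, weights, relu)
        reduced.extend(vec)  # concatenate the reduced tower outputs
    latent = dense(reduced, model["encoder"], relu)  # bottleneck
    hidden = latent
    for weights in model["decoder"]:
        hidden = dense(hidden, weights, elu)  # mirrored expansion
    decoded = dense(hidden, model["output"], sigmoid)
    outputs, start = [], 0
    for dim in model["input_dims"]:  # split back into per-feature outputs
        outputs.append(decoded[start:start + dim])
        start += dim
    return latent, outputs
```

As a usage example, `build_model((1280, 1024, 2048))` would reproduce the sizes quoted in the text, although a pure-Python forward pass at that scale is slow. For a quick check, scaled-down sizes such as `build_model((20, 16, 32), tower_units=(8, 4), bottleneck=2, decoder_units=(4, 8, 16))` exercise the same structure.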

[0040] Referring back to Figure 1, the output of the embeddings generator 120 may be associated with the digital content items 170A-C and stored in digital content repository 140. In some examples, the embeddings generation system 110 may cluster the digital content items within the digital content repository 140 based on similarity between digital content items. For example, based on the embeddings for each feature of the digital content, the embeddings generation system 110 may cluster the new and previously served digital content such that digital content items that are similar in their presentation are clustered together.

[0041] The clusters may be centered around previously served digital content. For example, a previously served digital content item may be at the center of a given cluster and any digital content items, whether new or previously served, within a threshold embeddings distance of the previously served digital content item may be included within the cluster. The clusters may be defined based on a threshold, or predetermined, embedding distance with the previously served digital content item at the center. The threshold embedding distance may be with respect to multiple dimensions, or features, associated with the digital content items. For example, each dimension may correspond to a given feature and the respective feature embedding values. In such an example, the more features, e.g., text, image, music, audio, motion, etc., in a given digital content item, the more dimensions that are used to cluster the digital content items.
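Cluster membership around a previously served item can be sketched as a simple distance test. The example below is a minimal illustration, not the disclosed implementation: it assumes Euclidean distance (the text does not name a metric), uses toy two-dimensional embeddings, and `build_cluster` and the item identifiers are hypothetical.

```python
import math


def build_cluster(center_id, embeddings, threshold):
    """Return ids of items within `threshold` of the center item.

    `embeddings` maps item id -> compressed embedding (list of floats).
    The center itself is excluded from the returned members.
    """
    center = embeddings[center_id]
    return [
        item_id
        for item_id, vec in embeddings.items()
        if item_id != center_id and math.dist(center, vec) <= threshold
    ]


embeddings = {
    "served_a": [0.0, 0.0],  # previously served item at the cluster center
    "new_1": [0.1, 0.0],
    "new_2": [0.0, 0.3],
    "new_3": [2.0, 2.0],     # too far away to join the cluster
}
members = build_cluster("served_a", embeddings, threshold=0.5)
```

Here `new_1` and `new_2` fall inside the threshold distance and join the cluster, while `new_3` does not.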

[0042] According to some examples, the threshold embedding distance may be determined based on the distribution of the distance between feature embedding values of the digital content items. In some examples, a percentile, median value, average, or the like may be used as the threshold embedding distance. In another example, an unsupervised cluster algorithm may be used to identify embedding values that are close in space and to draw boundaries corresponding to the threshold distance. In yet another example, hierarchical clustering may be used to cluster the digital content items.

Example Methods

[0043] Figure 3 is a flow diagram of an example process for identifying new digital content that is similar to previously served digital content, according to aspects of the disclosure. The following operations do not have to be performed in the precise order described below. Rather, various operations can be handled in a different order or simultaneously, and operations may be added or omitted.

[0044] In block 334, a digital content creator 332 generates new digital content items. The digital content creator 332 may be, for example, a merchant, advertiser, publisher, or the like. In some examples, the digital content creator 332 may be a model, such as a generative AI model, trained to generate digital content items based on prompts provided by the merchant, advertiser, publisher, etc. The generated digital content items are transmitted to the embeddings generation system 110.

[0045] In block 336, the embeddings generation system 110 generates feature embeddings for the digital content items. Each digital content item received by the embeddings generation system 110 is composed of different features, such as text, images, video, audio, motion, etc. Embeddings for each feature are then generated, or determined, by the embeddings generation system 110. For example, the feature embeddings may be identified via an embeddings fetcher, such as embeddings fetcher 105. The embeddings fetcher 105 accesses, or queries, an embeddings repository 115 to identify relevant feature embeddings associated with the digital content item. In some examples, rather than using an embeddings fetcher as an intermediary, the embeddings generator 120 may query the embeddings repository 115. In another example, the embeddings fetcher 105 and / or the embeddings generation system 110 uses clustering techniques associated with feature embeddings to find the nearest matches and, therefore, feature embeddings for the digital content item.

[0046] The feature embeddings for the digital content item are provided as input into the embeddings generator 120 to reduce the size of the feature embeddings and, therefore, the overall processing power needed to cluster the digital content items and identify digital content items in response to a publisher’s request. For example, the embeddings generator 120 can flatten the feature embeddings, e.g., flatten layer in Figure 2. Flattening the feature embeddings may include, for example, flattening the feature embeddings into a 1-dimensional vector. From the flattened layer, the embeddings generator 120 extracts features and reduces the dimensionality of the feature embeddings. The reduced feature embeddings are concatenated, or combined, into a concatenated embedding. The concatenated embedding is further reduced, or compressed, into a smaller dimensional latent representation, e.g., a bottleneck of the embeddings generator 120. The bottleneck may be expanded such that an output of the embeddings generator is a vector with a size matching the combined size of the feature embeddings provided as input. The vector may be a decoded vector. The output vector is split into different outputs, corresponding to a decoded representation of each feature of the digital content item. The output of the embeddings generator 120 provides for a smaller, but still accurate, representation of each feature of the digital content item. The smaller size of the embedding representation allows for the embeddings generation system 110 to efficiently cluster digital content items and store the representations using less memory. Further, the smaller size of the embedding representation and resulting clusters allows for the campaign management platform 150 to identify relevant digital content items, including new digital content items that are similar to previously served digital content items, efficiently, e.g., in real time, in response to a publisher’s request.

[0047] In block 338, the smaller embedding representations of the features output by the embeddings generator 120 are used to cluster digital content items. The clusters may include new digital content items and previously served digital content items. For example, a center of a cluster may be a previously served digital content item while the remainder of the cluster comprises new and / or previously served digital content items. A cluster may include digital content items that are similar in presentation to one another. The similarity is based on a threshold embedding distance for the features within the digital content items.
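One of the options described for choosing the threshold embedding distance is a percentile of the distance distribution. The sketch below illustrates that option only; it assumes Euclidean distance over all item pairs and a nearest-rank percentile definition, neither of which is specified in the text, and the function names are hypothetical.

```python
import itertools
import math


def pairwise_distances(vectors):
    """Euclidean distance between every unordered pair of embeddings."""
    return [math.dist(a, b) for a, b in itertools.combinations(vectors, 2)]


def percentile(values, pct):
    """Nearest-rank percentile for 0 < pct <= 100 (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


# Toy one-dimensional embeddings; real embeddings would have more dimensions.
vectors = [[0.0], [1.0], [3.0], [6.0]]
distances = pairwise_distances(vectors)
threshold = percentile(distances, 50)
```

A lower percentile produces tighter clusters with fewer exploration candidates, while a higher percentile admits items that are less similar to the center; the appropriate value is a tuning decision left open by the text.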

[0048] While block 338, e.g., the clustering of digital content items, is shown in Figure 3 as occurring before a request for digital content items is received from the publisher 330, the clustering may occur after the request for digital content items is received and / or after the campaign management platform 150 identifies digital content items responsive to the request.

[0049] In block 340, a publisher 330 may transmit a request for digital content items to the campaign management platform 150. In response to the request, in block 342, the campaign management platform 150 identifies responsive digital content items. Responsive digital content items may include, for example, digital content items that fulfill the request of the publisher 330, have a predicted performance value above a threshold, are likely to be engaged with by the user viewing the publisher’s website and / or mobile application, etc.

[0050] According to some examples, the responsive digital content items include previously served digital content items. The identified previously served digital content items are used by the campaign management platform 150 to identify new digital content items that are similar to the identified previously served digital content item. For example, a cluster of digital content items in which the identified previously served digital content item is at the center may be identified. In some examples, the cluster may be generated in response to identifying the responsive previously served digital content item. From the cluster, a new digital content item that is similar to the previously served digital content item may be selected by the campaign management platform and provided to the auction, e.g., in block 344.

[0051] By selecting a new digital content item from the cluster based on the previously served digital content item, the campaign management platform 150 can predict the performance, such as the predicted ground truth user engagement data, associated with the new digital content item. This allows for the campaign management platform 150 to intelligently, e.g., based on presentational similarities, explore the actual performance of new digital content items without incurring the costs of random selection. For example, rather than randomly selecting a new digital content item to be provided to the auction, the campaign management platform 150 can identify a presentationally similar digital content item, e.g., a new digital content item, that is expected to perform similarly to a previously served digital content item that would have been selected but for there being new digital content items. The exploration costs associated with selecting the new digital content item are lowered as the likelihood of success at auction can be predicted based on the ground truth engagement data associated with the presentationally similar previously served digital content item. Further, the exploration costs are lowered by lowering the computational resources required for identifying the new digital content item. In particular, by compressing the embedding size and generating clusters based on presentational similarity, the memory required to store the embeddings and group similar digital content is reduced.

[0052] In block 344, the new digital content item that is similar to the identified previously served digital content item is provided to the auction. The outcome of the auction determines which digital content item to serve to the publisher 330, in block 346. In block 348, the publisher 330 publishes the digital content item via the website and / or mobile application of the publisher. Publishing the digital content item includes, for example, providing the digital content item for output via a user interface such that a user can view, interact, and / or engage with the digital content item. Any user engagement with the digital content item is transmitted to campaign management platform 150 as ground truth engagement data, in block 350. The ground truth engagement data, whether it is for a new digital content item or a previously served digital content item, is used as training data for models trained to predict the performance of a digital content item at auction and / or once served. Accordingly, in examples where the new digital content item wins the auction, in block 344, the exploration of the new digital content item is successful in that the new digital content item generated ground truth engagement data.

[0053] Figure 4 is a flow diagram for an example method 400 of identifying new digital content items that are predicted to perform similarly to content items that have been previously served to user computing devices. The following operations do not have to be performed in the precise order described below. Rather, various operations can be handled in a different order or simultaneously, and operations may be added or omitted.

[0054] In block 410, clusters of digital content items are generated based on the feature embeddings of each of a plurality of digital content items. For example, a plurality of digital content items are received. The digital content items can be images, text, video, audio, or the like. In some examples, the digital content items can be informational content, entertainment, advertisements, etc. Feature embeddings associated with the plurality of digital content items are generated. Generating the feature embeddings may include, for example, identifying one or more features of each of the plurality of digital content items. The features may include image, text, audio, video, motion, etc. For each feature, a feature embedding value is determined. The feature embedding values may be compressed into a compressed embedding representation.

[0055] Compressing the feature embedding values includes, for example, encoding the feature embedding values. For example, the raw feature embedding for each feature may be large. The size of the feature embedding may differ from feature to feature, such that, without processing the feature embedding values, one feature may be considered more heavily when determining similar digital content items. By flattening the feature embedding values, features of the digital content can be extracted such that the dense layers for each feature can reduce the feature embedding value for each feature. The reduced feature embedding values may be concatenated and the concatenated encoded feature embedding values may be reduced further, e.g., through additional dense layers, until reaching a bottleneck. After the bottleneck, the compressed feature embedding values may be decoded and split into compressed embedding representations for each feature.

[0056] A center of a respective cluster is a previously served digital content item. The previously served digital content item corresponds to a digital content item having a threshold amount of ground truth engagement data. Ground truth engagement data includes, for example, a user selection of the digital content item, a threshold period of time the digital content item was viewed, a click through rate associated with the digital content item, or a conversion rate associated with the digital content item. According to some examples, ground truth engagement data corresponds to the performance of a digital content item at auction and / or once served to a publisher for display via the website or mobile application of the publisher. In another example, ground truth engagement data is used to predict the performance of a digital content item.

[0057] In block 420, a request for at least one digital content item is received from a publisher.

[0058] In block 430, at least one previously served digital content item of the plurality of digital content items responsive to the request is identified.

[0059] In block 440, at least one cluster of digital content items associated with the at least one previously served digital content item is identified. The at least one previously served digital content item is at a center of the identified at least one cluster.

[0060] In block 450, a digital content item other than the identified at least one previously served digital content item is selected. The digital content item other than the identified at least one previously served digital content item corresponds to a digital content item having less than the threshold amount of ground truth engagement data. For example, the other digital content item may be a newly generated digital content item, a digital content item that has not been put to auction, a digital content item that has not won at auction, or the like.
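The selection in blocks 440 and 450 can be sketched as a filter over the cluster around the responsive previously served item. The example below is a sketch under stated assumptions rather than the disclosed method: engagement is reduced to a hypothetical event count, distance is Euclidean, and choosing the nearest eligible item is an illustrative tie-breaking rule that the text does not prescribe. `ContentItem` and `select_for_exploration` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class ContentItem:
    item_id: str
    embedding: list
    engagement_events: int  # hypothetical count of ground truth engagement records


def select_for_exploration(center, candidates, distance_threshold, engagement_threshold):
    """Pick an in-cluster item with less than the engagement threshold, or None.

    Assumption: among eligible items, the one closest to the center is chosen.
    """
    eligible = [
        item
        for item in candidates
        if item.item_id != center.item_id
        and item.engagement_events < engagement_threshold
        and math.dist(center.embedding, item.embedding) <= distance_threshold
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda item: math.dist(center.embedding, item.embedding))


center = ContentItem("served", [0.0, 0.0], 500)
items = [
    center,
    ContentItem("new_far", [5.0, 5.0], 0),        # outside the cluster
    ContentItem("new_near", [0.2, 0.1], 0),       # eligible and closest
    ContentItem("served_near", [0.05, 0.0], 900), # close, but already has data
    ContentItem("new_mid", [0.4, 0.0], 3),        # eligible, farther away
]
chosen = select_for_exploration(center, items, distance_threshold=1.0, engagement_threshold=100)
```

In this toy data, `new_near` is chosen: it lies inside the cluster and has less than the assumed threshold amount of engagement data, so it would be the item provided to auction.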

[0061] According to some examples, the selected digital content item is provided to auction. In such an example, the selected digital content item has the opportunity to be selected as the responsive digital content item to the publisher’s request. The outcome of the auction may be determined based on digital content campaign information associated with the digital content. For example, the digital content campaign information may include bidding information, a target audience, etc. In examples where the selected digital content item wins the auction, the selected digital content item may be served to one or more user computing devices in response to the publisher’s request. Once the digital content item is served, ground truth data associated with the selected digital content item may be received.

Example Computing Environment

[0062] Figure 5 is a block diagram of an example computing environment 500 in which the features described above may be implemented. The example should not be considered as limiting the scope of the disclosure or the usefulness of the features described herein. In this example, computing environment 500 may include device(s) 505, server computing device 530, storage system 540, and network 550.

[0063] Aspects of the disclosure can be implemented in a computing system that includes a back-end component, e.g., as a data server, a middleware component, e.g., an application server, or a front-end component, e.g., user computing device 505 having a user interface, a web browser, or an app, or any combination thereof. The components of the system can be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

[0064] The computing environment 500 can include clients, e.g., user computing device 505 and servers, e.g., server computing device 530. A client and server can be remote from each other and interact through a communication network. The relationship of client and server arises by virtue of the computer programs running on the respective computers and having a client-server relationship to each other. For example, a server can transmit data, e.g., an HTML page, to a client device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device. Data generated at the client device, e.g., a result of the user interaction, can be received at the server from the client device.

[0065] Each device 505 may be a personal computing device intended for use by a respective user. The device 505 may include one or more processors 535, memory 545, data 565 and instructions 555. Each device 505 may also include an output 575 and user input 585. By way of example only, devices 505 may be mobile phones or devices such as a wireless-enabled PDA, smartphones, a tablet PC, desktop computing device, a wearable computing device (e.g., a smartwatch, AR / VR headset, smart helmet, etc.), a netbook that is capable of obtaining information via the Internet or other networks, or a smart home device, such as a home assistant, smart thermostat, smart doorbell, smart light, etc.

[0066] Memory 545 of device 505 may store information that is accessible by processor 535. Memory 545 may also include data that can be retrieved, manipulated or stored by the processor 535. The memory 545 may be of any non-transitory type capable of storing information accessible by the processor 535, including a non-transitory computer-readable medium, or other medium that stores data that may be read with the aid of an electronic device, such as a hard-drive, memory card, read-only memory (“ROM”), random access memory (“RAM”), optical disks, as well as other write-capable and read-only memories. Memory 545 may store information that is accessible by the processors 535, including instructions 555 that may be executed by processors 535, and data 565.

[0067] Data 565 may be retrieved, stored or modified by processors 535 in accordance with instructions 555. For instance, although the present disclosure is not limited by a particular data structure, the data 565 may be stored in computer registers, in a relational database as a table having a plurality of different fields and records, XML documents, or flat files. The data 565 may also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII or Unicode. By further way of example only, the data 565 may comprise information sufficient to identify the relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories (including other network locations) or information that is used by a function to calculate the relevant data.

[0068] The instructions 555 can be any set of instructions to be executed directly, such as machine code, or indirectly, such as scripts, by the processor 535. In that regard, the terms “instructions,” “application,” “steps,” and “programs” can be used interchangeably herein. The instructions can be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.

[0069] The one or more processors 535 may include any conventional processors, such as a commercially available CPU or microprocessor. Alternatively, the processor can be a dedicated component such as an ASIC or other hardware-based processor. Although not necessary, computing devices 505 may include specialized hardware components to perform specific computing functions faster or more efficiently.

[0070] Although Figure 5 functionally illustrates the processor, memory, and other elements of devices 505 as being within the same respective blocks, components described in this specification, including the processors and the memories can include multiple processors and memories that can operate in different physical locations and not within the same computing device. For example, some of the instructions and the data can be stored on a removable SD card and others within a read-only computer chip. Some or all of the instructions and data can be stored in a location physically remote from, yet still accessible by, the processors. Similarly, the processors can include a collection of processors that can perform concurrent and / or sequential operation. The computing devices can each include one or more internal clocks providing timing information, which can be used for time measurement for operations and programs run by the computing devices.

[0071] Output 575 may be a display, such as a monitor having a screen, a touch-screen, a projector, or a television. The display 575 of the one or more computing devices 505 may electronically display information to a user via a graphical user interface (“GUI”) or other types of user interfaces. For example, display 575 may electronically display digital content items.

[0072] The user input 585 may be a mouse, keyboard, touch-screen, microphone, or any other type of input.

[0073] The devices 505 can be at various nodes of a network 550 and capable of directly and indirectly communicating with other nodes of network 550. Although one device is depicted in Figure 5, it should be appreciated that a typical system can include one or more devices, with each device being at a different node of network 550. The network 550 and intervening nodes described herein can be interconnected using various protocols and systems, such that the network can be part of the Internet, World Wide Web, specific intranets, wide area networks, or local networks. The network 550 can utilize standard communications protocols, such as WiFi, Bluetooth, 4G, 5G, etc., as well as protocols that are proprietary to one or more companies. Although certain advantages are obtained when information is transmitted or received as noted above, other aspects of the subject matter described herein are not limited to any particular manner of transmission.

[0074] In one example, computing environment 500 may include one or more server computing devices 530 having a plurality of computing devices, e.g., a load balanced server farm, that exchange information with different nodes of a network for the purpose of receiving, processing and transmitting the data to and from other computing devices. For instance, one or more server computing devices 530 may be a web server that is capable of communicating with the one or more client computing devices 505 via the network 550. In addition, server computing device 530 may use network 550 to transmit and present information to a user of one of the other computing devices 505.

[0075] Server computing device 530 may include one or more processors, memory, instructions, data, etc. These components operate in the same or similar fashion as those described above with respect to computing device 505. The server computing device 530 may include embeddings generation system 110 and campaign management platform 150, as described above with respect to Figure 1. According to some examples, the server computing device 530 may be connected over the network to a data center 510 housing any number of hardware accelerators. The data center 510 can be one of multiple data centers or other facilities in which various types of computing devices, such as hardware accelerators, are located. Computing resources housed in the data center can be specified for repeated results monitoring, including identifying repeated query results, or the like.

[0076] The devices 505, 530 can be capable of direct and indirect communication over the network 550. The devices 505, 530 can set up listening sockets that may accept an initiating connection for sending and receiving information. The network 550 itself can include various configurations and protocols including the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, and private networks using communication protocols proprietary to one or more companies. The network 550 can support a variety of short- and long-range connections. The short- and long-range connections may be made over different bandwidths, such as 2.402 GHz to 2.480 GHz (commonly associated with the Bluetooth® standard), 2.4 GHz and 5 GHz (commonly associated with the Wi-Fi® communication protocol); or with a variety of communication standards, such as the LTE® standard for wireless broadband communication. The network 550, in addition or alternatively, can also support wired connections between the devices 505, 530, including over various types of Ethernet connection.

[0077] The server computing device 530 can be configured to receive queries from a publisher, based on inputs received by the client computing device 505, on computing resources in the data center 510. The queries from the publisher may be a request for digital content items. For example, the environment can be part of a computing platform configured to provide a variety of services to users, through various user interfaces and / or application programming interfaces (APIs) exposing the platform services. The variety of services can include identifying content responsive to the query, or the like. As an example, storage system 540 may be configured to store embeddings repository 115. In some examples, the storage system 540 may be configured to store a repository of digital content. The digital content repository 140 may include the digital content items and the associated feature embeddings. In another example, storage system 540 may be configured to store clusters of digital content items, where the clusters are based on the feature embeddings associated with the digital content items. The storage system(s) 540 can be a combination of volatile and non-volatile memory and can be at the same or different physical locations than the computing devices 505, 530. For example, the storage system(s) 540 can include any type of non-transitory computer readable medium capable of storing information, such as a hard-drive, solid state drive, tape drive, optical storage, memory card, ROM, RAM, DVD, CD-ROM, write-capable, and read-only memories.

[0078] As other examples of potential services provided by a platform implementing the environment, the server computing device can maintain a variety of models in accordance with different constraints available at the data center. For example, the server computing device can maintain different families for deploying models on various types of TPUs and / or GPUs housed in the data center or otherwise available for processing.

[0079] Aspects of this disclosure can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, and / or in computer hardware, such as the structure disclosed herein, their structural equivalents, or combinations thereof. Aspects of this disclosure can further be implemented as one or more computer programs, such as one or more modules of computer program instructions encoded on a tangible non-transitory computer storage medium for execution by, or to control the operation of, one or more data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or combinations thereof. The computer program instructions can be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

[0080] The term “configured” is used herein in connection with systems and computer program components. For a system of one or more computers to be configured to perform particular operations or actions means that the system has installed on it software, firmware, hardware, or a combination thereof that causes the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by one or more data processing apparatus, cause the apparatus to perform the operations or actions.

[0081] The term “data processing apparatus” refers to data processing hardware and encompasses various apparatus, devices, and machines for processing data, including programmable processors, a computer, or combinations thereof. The data processing apparatus can include special purpose logic circuitry, such as a field programmable gate array (FPGA) or an application specific integrated circuit (ASIC). The data processing apparatus can include code that creates an execution environment for computer programs, such as code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or combinations thereof.

[0082] The data processing apparatus can include special-purpose hardware accelerator units for implementing machine learning models to process common and compute-intensive parts of machine learning training or production, such as inference or workloads. Machine learning models can be implemented and deployed using one or more machine learning frameworks.

[0083] The term “computer program” refers to a program, software, a software application, an app, a module, a software module, a script, or code. The computer program can be written in any form of programming language, including compiled, interpreted, declarative, or procedural languages, or combinations thereof. The computer program can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. The computer program can correspond to a file in a file system and can be stored in a portion of a file that holds other programs or data, such as one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, such as files that store one or more modules, sub programs, or portions of code. The computer program can be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

[0084] The term “database” refers to any collection of data. The data can be unstructured or structured in any manner. The data can be stored on one or more storage devices in one or more locations. For example, an index database can include multiple collections of data, each of which may be organized and accessed differently.

[0085] The term “engine” refers to a software-based system, subsystem, or process that is programmed to perform one or more specific functions. The engine can be implemented as one or more software modules or components, or can be installed on one or more computers in one or more locations. A particular engine can have one or more computers dedicated thereto, or multiple engines can be installed and running on the same computer or computers.

[0086] The processes and logic flows described herein can be performed by one or more computers executing one or more computer programs to perform functions by operating on input data and generating output data. The processes and logic flows can also be performed by special purpose logic circuitry, or by a combination of special purpose logic circuitry and one or more computers.

[0087] A computer or special purpose logic circuitry executing the one or more computer programs can include a central processing unit, including general or special purpose microprocessors, for performing or executing instructions and one or more memory devices for storing the instructions and data. The central processing unit can receive instructions and data from the one or more memory devices, such as read only memory, random access memory, or combinations thereof, and can perform or execute the instructions. The computer or special purpose logic circuitry can also include, or be operatively coupled to receive data from or transfer data to, one or more storage devices for storing data, such as magnetic disks, magneto optical disks, or optical disks. The computer or special purpose logic circuitry can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS), or a portable storage device, e.g., a universal serial bus (USB) flash drive, as examples.

[0088] Computer readable media suitable for storing the one or more computer programs can include any form of volatile or non-volatile memory, media, or memory devices. Examples include semiconductor memory devices, e.g., EPROM, EEPROM, or flash memory devices, magnetic disks, e.g., internal hard disks or removable disks, magneto optical disks, CD-ROM disks, DVD-ROM disks, or combinations thereof.

[0089] Aspects of the disclosure can be implemented in a computing system that includes a back end component, e.g., as a data server, a middleware component, e.g., an application server, or a front end component, e.g., a client computer having a graphical user interface, a web browser, or an app, or any combination thereof. The components of the system can be interconnected by any form or medium of digital data communication, such as a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

[0090] The computing system can include clients and servers. A client and server can be remote from each other and interact through a communication network. The relationship of client and server arises by virtue of the computer programs running on the respective computers and having a client-server relationship to each other. For example, a server can transmit data, e.g., an HTML page, to a client device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the client device. Data generated at the client device, e.g., a result of the user interaction, can be received at the server from the client device.

[0091] Unless otherwise stated, the foregoing alternative examples are not mutually exclusive, but may be implemented in various combinations to achieve unique advantages. As these and other variations and combinations of the features discussed above can be utilized without departing from the subject matter defined by the claims, the foregoing description of the examples should be taken by way of illustration rather than by way of limitation of the subject matter defined by the claims. In addition, the provision of the examples described herein, as well as clauses phrased as “such as,” “including” and the like, should not be interpreted as limiting the subject matter of the claims to the specific examples; rather, the examples are intended to illustrate only one of many possible implementations. Further, the same reference numbers in different drawings can identify the same or similar elements.

Claims

1. A method, comprising: generating, by one or more processors based on feature embeddings of each of a plurality of digital content items, clusters of digital content items, wherein a center of a respective cluster is a previously served digital content item; receiving, by the one or more processors, from a publisher, a request for at least one digital content item; identifying, by the one or more processors, at least one previously served digital content item of the plurality of digital content items responsive to the request; identifying, by the one or more processors, at least one cluster of digital content items associated with the at least one previously served digital content item; and selecting, by the one or more processors, from the identified at least one cluster, a digital content item other than the identified at least one previously served digital content item.

2. The method of claim 1, wherein the previously served digital content item corresponds to a digital content item having a threshold amount of ground truth engagement data.

3. The method of claim 2, wherein the ground truth engagement data comprises at least one of a user selection of the digital content item, a threshold period of time the digital content item was viewed, a click through rate associated with the digital content item, or a conversion rate associated with the digital content item.

4. The method of claim 2 or 3, wherein the digital content item other than the identified at least one previously served digital content item corresponds to a digital content item having less than the threshold amount of ground truth engagement data.

5. The method of any of the preceding claims, wherein generating the feature embeddings comprises: identifying, by the one or more processors, one or more features of each of the plurality of digital content items, the one or more features comprising at least one of image, text, video, motion, or audio; determining, by the one or more processors, for each feature of each digital content item, a feature embedding value; and compressing, by the one or more processors, for each digital content item, the feature embedding values into a compressed embedding representation.

6. The method of claim 5, wherein compressing the feature embedding values further comprises: encoding, by the one or more processors, the feature embedding values; concatenating, by the one or more processors, the encoded feature embedding values; compressing, by the one or more processors, the concatenated encoded feature embedding values; and decoding, by the one or more processors, the compressed concatenated encoded feature embedding values.

7. The method of any of the preceding claims, further comprising providing, by the one or more processors, the selected digital content item to auction.

8. The method of claim 7, wherein when the selected digital content item wins the auction, the method further comprises: serving, by the one or more processors, the selected digital content item to one or more user computing devices; and receiving, by the one or more processors, ground truth engagement data associated with the selected digital content item.

9. The method of any of the preceding claims, wherein the at least one previously served digital content item is at a center of the identified at least one cluster of digital content items.

10. The method of any of the preceding claims, wherein the digital content item other than the identified at least one previously served digital content item is within a threshold embedding distance from the center of the cluster.

11. A system, comprising: one or more processors, the one or more processors configured to: generate, based on feature embeddings of each of a plurality of digital content items, clusters of digital content items, wherein a center of a respective cluster is a previously served digital content item; receive, from a publisher, a request for at least one digital content item; identify at least one previously served digital content item of the plurality of digital content items responsive to the request; identify at least one cluster of digital content items associated with the at least one previously served digital content item; and select, from the identified at least one cluster, a digital content item other than the identified at least one previously served digital content item.

12. The system of claim 11, wherein the previously served digital content item corresponds to a digital content item having a threshold amount of ground truth engagement data.

13. The system of claim 12, wherein the ground truth engagement data comprises at least one of a user selection of the digital content item, a threshold period of time the digital content item was viewed, a click through rate associated with the digital content item, or a conversion rate associated with the digital content item.

14. The system of claim 12 or 13, wherein the digital content item other than the identified at least one previously served digital content item corresponds to a digital content item having less than the threshold amount of ground truth engagement data.

15. The system of any of claims 11 to 14, wherein when generating the feature embeddings, the one or more processors are further configured to: identify one or more features of each of the plurality of digital content items, the one or more features comprising at least one of image, text, video, motion, or audio; determine, for each feature of each digital content item, a feature embedding value; and compress, for each digital content item, the feature embedding values into a compressed embedding representation.

16. The system of claim 15, wherein when compressing the feature embedding values the one or more processors are further configured to: encode the feature embedding values; concatenate the encoded feature embedding values; compress the concatenated encoded feature embedding values; and decode the compressed concatenated encoded feature embedding values.

17. The system of any of claims 11 to 16, wherein the one or more processors are further configured to provide the selected digital content item to auction.

18. The system of claim 17, wherein when the selected digital content item wins the auction, the one or more processors are further configured to: serve the selected digital content item to one or more user computing devices; and receive ground truth engagement data associated with the selected digital content item.

19. The system of any of claims 11 to 18, wherein the at least one previously served digital content item is at a center of the identified at least one cluster of digital content items.

20. The system of any of claims 11 to 19, wherein the digital content item other than the identified at least one previously served digital content item is within a threshold embedding distance from the center of the cluster.

21. One or more non-transitory computer-readable media for storing instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating, based on feature embeddings of each of a plurality of digital content items, clusters of digital content items, wherein a center of a respective cluster is a previously served digital content item; receiving, from a publisher, a request for at least one digital content item; identifying at least one previously served digital content item of the plurality of digital content items responsive to the request; identifying at least one cluster of digital content items associated with the at least one previously served digital content item; and selecting, from the identified at least one cluster, a digital content item other than the identified at least one previously served digital content item.

22. The one or more non-transitory computer-readable media of claim 21, wherein the previously served digital content item corresponds to a digital content item having a threshold amount of ground truth engagement data.

23. The one or more non-transitory computer-readable media of claim 22, wherein the ground truth engagement data comprises at least one of a user selection of the digital content item, a threshold period of time the digital content item was viewed, a click through rate associated with the digital content item, or a conversion rate associated with the digital content item.

24. The one or more non-transitory computer-readable media of claim 22 or 23, wherein the digital content item other than the identified at least one previously served digital content item corresponds to a digital content item having less than the threshold amount of ground truth engagement data.

25. The one or more non-transitory computer-readable media of any of claims 21 to 24, wherein when generating the feature embeddings, the operations further comprise: identifying one or more features of each of the plurality of digital content items, the one or more features comprising at least one of image, text, video, motion, or audio; determining, for each feature of each digital content item, a feature embedding value; and compressing, for each digital content item, the feature embedding values into a compressed embedding representation.

26. The one or more non-transitory computer-readable media of claim 25, wherein when compressing the feature embedding values the operations further comprise: encoding the feature embedding values; concatenating the encoded feature embedding values; compressing the concatenated encoded feature embedding values; and decoding the compressed concatenated encoded feature embedding values.

27. The one or more non-transitory computer-readable media of any of claims 21 to 26, wherein the operations further comprise providing the selected digital content item to auction.

28. The one or more non-transitory computer-readable media of claim 27, wherein when the selected digital content item wins the auction, the operations further comprise: serving the selected digital content item to one or more user computing devices; and receiving ground truth engagement data associated with the selected digital content item.

29. The one or more non-transitory computer-readable media of any of claims 21 to 28, wherein the at least one previously served digital content item is at a center of the identified at least one cluster of digital content items.

30. The one or more non-transitory computer-readable media of any of claims 21 to 29, wherein the digital content item other than the identified at least one previously served digital content item is within a threshold embedding distance from the center of the cluster.

31. One or more computer program products including instructions that, when executed by one or more processors, cause the one or more processors to perform operations comprising: generating, based on feature embeddings of each of a plurality of digital content items, clusters of digital content items, wherein a center of a respective cluster is a previously served digital content item; receiving, from a publisher, a request for at least one digital content item; identifying at least one previously served digital content item of the plurality of digital content items responsive to the request; identifying at least one cluster of digital content items associated with the at least one previously served digital content item; and selecting, from the identified at least one cluster, a digital content item other than the identified at least one previously served digital content item.

32. The one or more computer program products of claim 31, wherein the previously served digital content item corresponds to a digital content item having a threshold amount of ground truth engagement data.

33. The one or more computer program products of claim 32, wherein the ground truth engagement data comprises at least one of a user selection of the digital content item, a threshold period of time the digital content item was viewed, a click through rate associated with the digital content item, or a conversion rate associated with the digital content item.

34. The one or more computer program products of claim 32 or 33, wherein the digital content item other than the identified at least one previously served digital content item corresponds to a digital content item having less than the threshold amount of ground truth engagement data.

35. The one or more computer program products of any of claims 31 to 33, wherein when generating the feature embeddings, the operations further comprise: identifying one or more features of each of the plurality of digital content items, the one or more features comprising at least one of image, text, video, motion, or audio; determining, for each feature of each digital content item, a feature embedding value; and compressing, for each digital content item, the feature embedding values into a compressed embedding representation.

36. The one or more computer program products of claim 35, wherein when compressing the feature embedding values the operations further comprise: encoding the feature embedding values; concatenating the encoded feature embedding values; compressing the concatenated encoded feature embedding values; and decoding the compressed concatenated encoded feature embedding values.

37. The one or more computer program products of any of claims 31 to 36, wherein the operations further comprise providing the selected digital content item to auction.

38. The one or more computer program products of claim 37, wherein when the selected digital content item wins the auction, the operations further comprise: serving the selected digital content item to one or more user computing devices; and receiving ground truth engagement data associated with the selected digital content item.

39. The one or more computer program products of any of claims 31 to 38, wherein the at least one previously served digital content item is at a center of the identified at least one cluster of digital content items.

40. The one or more computer program products of any of claims 31 to 39, wherein the digital content item other than the identified at least one previously served digital content item is within a threshold embedding distance from the center of the cluster.
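
By way of non-limiting illustration only, the selection recited in the independent claims, together with the engagement threshold and the threshold embedding distance of the dependent claims, might be sketched as follows. The sketch is not the implementation of the disclosure. It assumes that each digital content item already has a compressed embedding, that Euclidean distance serves as the embedding distance, and that each new item is assigned to its nearest previously served item. The names `Item`, `MIN_EVENTS`, `MAX_DISTANCE`, `build_clusters`, and `select_candidates`, and all numeric values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    embedding: tuple        # compressed embedding, assumed precomputed
    engagement_events: int  # amount of ground truth engagement data


MIN_EVENTS = 100    # hypothetical threshold amount of engagement data
MAX_DISTANCE = 0.5  # hypothetical threshold embedding distance


def build_clusters(items, min_events=MIN_EVENTS, max_distance=MAX_DISTANCE):
    """Group new items around previously served items (cluster centers).

    Returns a dict mapping each center's id to a list of
    (distance, item_id) pairs sorted by increasing distance.
    """
    centers = [it for it in items if it.engagement_events >= min_events]
    clusters = {c.item_id: [] for c in centers}
    if not centers:
        return clusters
    for it in items:
        if it.engagement_events >= min_events:
            continue  # a previously served item is a center, not a member
        # Assumption: assign each new item to its nearest center.
        nearest = min(centers, key=lambda c: math.dist(c.embedding, it.embedding))
        distance = math.dist(nearest.embedding, it.embedding)
        if distance <= max_distance:
            clusters[nearest.item_id].append((distance, it.item_id))
    for members in clusters.values():
        members.sort()
    return clusters


def select_candidates(clusters, served_item_id, limit=1):
    """Pick up to `limit` new items closest to the given served item."""
    return [item_id for _, item_id in clusters.get(served_item_id, [])[:limit]]


# Illustrative data only; none of these values come from the disclosure.
EXAMPLE_ITEMS = [
    Item("served_a", (0.0, 0.0), 500),
    Item("served_b", (5.0, 5.0), 800),
    Item("new_1", (0.1, 0.2), 0),
    Item("new_2", (0.3, 0.0), 3),
    Item("new_3", (5.2, 5.1), 0),
    Item("new_4", (2.5, 2.5), 0),  # too far from either center
]

if __name__ == "__main__":
    example_clusters = build_clusters(EXAMPLE_ITEMS)
    # Suppose "served_a" was identified as responsive to a publisher's request.
    print(select_candidates(example_clusters, "served_a", limit=2))
```

In this sketch, an item farther than `MAX_DISTANCE` from every previously served item belongs to no cluster and is never returned. The candidates returned by `select_candidates` would then be provided to the auction, which is outside the scope of the sketch.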