A multi-dimensional user profile automated segmentation and precise targeting system and method
By integrating multi-source data and using dynamic audience clustering algorithms, we have achieved automated segmentation and precise targeting of user profiles, solving the problems of low user profile update frequency and poor targeting effect in existing advertising systems, and improving the accuracy and efficiency of ad delivery.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- 北京娱广科技有限公司
- Filing Date
- 2025-08-06
- Publication Date
- 2026-06-30
AI Technical Summary
Existing advertising systems lack the ability to respond to changes in users' real-time behavior and interests, resulting in poor targeted advertising performance. This is especially true in new content scenarios such as short videos and live streaming, where user profiles are updated infrequently and data dimensions are limited, making it difficult to achieve precise targeting.
A multi-source data acquisition system is constructed, user profile tags are generated through a multi-label neural network, user feature extraction and audience segmentation are achieved by combining dynamic audience clustering algorithms, multi-dimensional data fusion methods are used for automated segmentation and precise targeted delivery, and a feedback closed-loop mechanism is constructed for model optimization.
It significantly improves the richness of user profiles and the accuracy of targeted advertising, increases click-through rate by more than 25%, supports minute-level updates for user group segmentation, and has the ability to continuously learn and respond in real time.
Figure CN120951008B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of digital marketing and user modeling, and in particular to a precise advertising targeting method that automatically generates user profiles and segments the audience by aggregating multi-source data (including user behavior, social media interaction, live streaming feedback, etc.), belonging to the field of intelligent computer data processing and advertising.
Background Technology
[0002] With the continuous growth of the digital advertising market, targeted advertising has become a key means to improve conversion rates and reduce customer acquisition costs. Traditional advertising systems generally rely on preset user tags and rules, lacking the ability to respond to changes in real-time user behavior and interests. In addition, existing systems often build user profiles based on only a single data source (such as click behavior or historical consumption), resulting in limited data dimensions and low profile update frequency, leading to poor targeted advertising performance.
[0003] Especially in new content scenarios such as short videos and live streaming, user interaction behaviors (such as bullet comments, comments, dwell time, and tipping) contain rich information on interests and preferences. If these can be effectively integrated and analyzed, the dynamism and accuracy of the profiling system will be greatly improved.
[0004] Therefore, there is an urgent need for an automated segmentation and precise targeting method for user profiles based on multi-source data fusion, which can realize an intelligent user identification mechanism with real-time updates, automatic clustering, and efficient matching.
Summary of the Invention
[0005] The purpose of this invention is to provide a method for automated segmentation and precise targeting of multi-dimensional user profiles. By constructing a multi-source data collection system, a tag generation module, and a dynamic crowd clustering algorithm, it automatically completes user feature extraction and crowd segmentation, thereby driving the advertising system to achieve efficient and precise delivery.
[0006] According to a first aspect of the embodiments of this specification, a multi-dimensional user profile automated segmentation and precise targeting system is provided, characterized in that it includes:
[0007] A multi-source data acquisition module is used to acquire data from user terminals or platform systems, including text data, behavioral data, and image data.
[0008] The feature extraction and label generation module includes:
[0009] The text processing submodule uses a pre-trained language model to perform vector encoding on text data, generating sentence-level semantic vectors.
[0010] The behavior modeling submodule performs multidimensional modeling of continuous user behavior sequences and extracts behavior preference vectors.
[0011] The visual encoding submodule uses a visual neural network model to extract image feature vectors.
[0012] The output feature vectors of the three sub-modules are concatenated into a unified high-dimensional vector, which is then input into a multi-label neural network to generate user profile label vectors.
[0013] The user profile database is used to store the data collected by the multi-source data acquisition module and the user profile tag vectors output by the tag generation module. The user profile database provides structured user profile data and user historical behavior data to subsequent modules.
[0014] The audience clustering and targeting module is used to divide users into multiple strategic audience packages based on the user profile tag vector and the user's advertising return on investment (ROI) performance, and to annotate each package with audience attribute tags.
[0015] The ad scheduling module calls the matching audience package according to the ad's set delivery goals, and combines the ad resource position configuration and delivery sorting rules to complete the real-time ad scheduling;
[0016] The feedback loop module is used to collect response data after user exposure, update the user's advertising ROI metric and profile tag vector confidence, and feed the results back to the user profile database to support automatic model retraining and profile optimization.
[0017] Preferably, the feature extraction and label generation module uses the BERT-Base-Chinese model to generate a 768-dimensional text semantic vector, extracts behavioral features from the behavioral data with a two-layer multilayer perceptron (MLP), generates visual features from the image data with a visual neural network model, and finally concatenates them into a 1312-dimensional unified representation vector.
[0018] Preferably, the multi-label neural network is a three-layer fully connected structure that outputs 50-dimensional label probabilities, is activated using the Sigmoid function, and is trained using binary cross-entropy (BCE) as the loss function.
[0019] Preferably, in the crowd clustering and targeting module, the user ad ROI weight is used as a distance function factor to construct a weighted Euclidean distance for similarity calculation.
[0020] Preferably, after completing clustering, the crowd clustering and targeting module further includes the steps of assigning strategy labels and generating crowd packages:
[0021] The system analyzes each cluster based on the performance metrics of users in each cluster during historical ad campaigns, including click-through rate, conversion rate, and return on investment (ROI).
[0022] Based on the analysis results, the clusters are assigned strategy labels, including but not limited to "high-conversion new user package", "high-interaction deep fan package" or "potential churned and recalled audience";
[0023] The user clusters that have been assigned strategy tags are structured into strategy audience packages and stored in the user profile database for the ad scheduling module to call to achieve accurate matching of ad objectives.
[0024] Another aspect of the present invention is to provide a method for automated segmentation and precise targeting of multi-dimensional user profiles, characterized by comprising the following steps:
[0025] Collect multimodal data from users, including text data, behavioral data, and image data;
[0026] The text data is processed by a natural language processing model to extract semantic vectors.
[0027] Behavioral modeling is performed on the behavioral data to extract behavioral preference vectors;
[0028] The image data is visually encoded to extract image feature vectors;
[0029] The feature vectors of the multimodal data are fused into a unified vector, which is input into a multi-label neural network that outputs multiple user profile labels;
[0030] The clustering step involves weighting user tag vectors based on user advertising ROI performance and using a clustering algorithm combining KMeans++ and DBSCAN to automatically segment strategic audience groups.
[0031] Based on the advertiser's set goals and ad placements, the system calls up the matching audience package and schedules ad delivery.
[0032] Collect user response data after exposure, update user ad ROI and tag confidence, and determine whether to retrain the model based on feedback, forming a closed loop of collection-modeling-deployment-feedback.
[0033] Preferably, in the text data processing step, the BERT-Base-Chinese pre-trained language model is used to encode the cleaned and segmented user text data to extract 768-dimensional sentence-level semantic vectors;
[0034] Preferably, in the step of generating user profile labels through the multi-label neural network, the unified vector is input into a neural network consisting of a three-layer fully connected structure. The output layer of the neural network uses the Sigmoid activation function to output a probability vector containing 50 user profile labels, and the binary cross-entropy (BCE) is used as the loss function during the training phase.
[0035] Preferably, in the clustering step, a user similarity function is dynamically constructed based on user profile tags and ROI performance. The function introduces ROI weights as factors and uses weighted Euclidean distance to calculate the similarity between users.
[0036] Preferably, the strategic audience package generation step includes: analyzing the click-through rate, conversion rate, and ROI of users in each cluster in historical advertising campaigns; assigning strategic tags to each cluster based on the analysis results, including "high-conversion new user package", "high-interaction deep fan package", and "potential churned and recalled audience"; and writing the clusters with strategic tags into the user profile database for subsequent advertising scheduling to achieve advertising target matching.
[0037] The core inventive points of this invention include:
[0038] 1. Construct a "tag generation module" that integrates multi-source data such as user behavior, social interaction, and live streaming feedback to achieve automatic tagging and dynamic updating of user profiles;
[0039] 2. An adaptive crowd clustering algorithm is introduced, which combines real-time profile tags and interest features to achieve automated segmentation and precise targeted matching of user groups.
[0040] Through inventive points including, but not limited to, those above, the present invention forms a user identification system that is capable of continuous learning, real-time response, and automatic optimization, significantly improving the accuracy and efficiency of ad targeting.
[0041] Compared to existing targeting methods, this invention can increase user profile richness by more than 3 times, covering multiple tags such as interaction dimensions, interest tendencies, and purchase intent. Targeted ad click-through rates are increased by an average of over 25%. It boasts a high degree of automation, with user profiles and audience segmentation supporting minute-level updates; and strong scalability, supporting integration with new platform data sources and more refined targeting objectives.
Attached Figure Description
[0042] To more clearly illustrate the technical solutions in the embodiments or related technologies of this specification, the drawings used in the description of the embodiments or related technologies will be briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0043] Figure 1 is a system architecture diagram of Embodiment 1 of the present invention;
[0044] Figure 2 is a schematic diagram of the label generation module structure according to Embodiment 2 of the present invention;
[0045] Figure 3 is a schematic diagram of the label generation process in Embodiment 2 of the present invention.
Detailed Implementation
[0046] The technical solutions in the embodiments of this specification will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0047] It should be noted that the terms "comprising" and "having," and any variations thereof, in the embodiments and drawings of this specification are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the steps or units listed, but may optionally include steps or units not listed, or may optionally include other steps or units inherent to these processes, methods, products, or devices.
[0048] The technical solution of the present invention is described in detail below with reference to Figures 1-3, to demonstrate the system's functional modules and their collaborative mechanism.
[0049] Example 1:
[0050] This invention mainly includes the following functional modules: multi-source data acquisition module 10; tag generation module 20; user profile database 30; audience clustering and targeting module 40; delivery scheduling module 50; and feedback closed-loop module 60.
[0051] Specifically:
[0052] Multi-source data acquisition module 10: This module is used to collect raw user-related data from multiple channels in real time, serving as the starting point for user profile construction. Its inputs include user behavior logs (such as browsing, clicks, and dwell time) from different platforms, social interaction records (such as bullet comments, comments, and private messages), and tipping behavior and interaction frequency in live streaming scenarios. The system acquires this multimodal data through data acquisition interfaces (API or log monitoring service) and performs preprocessing operations such as unified format encapsulation, user identifier normalization, and timestamp alignment. The processed standardized data is sent to the tag generation module 20 as input for feature extraction and simultaneously stored in the user profile database 30 as historical behavior records to support subsequent model updates and anomaly detection.
[0053] Tag generation module 20: Receives multimodal user data from multi-source data acquisition module 10, mainly including text data (such as comments and bullet screens), user behavior data (such as click sequences and dwell time), and image data (such as video thumbnails or cover images). This module uses a multimodal deep representation learning method to extract and fuse features from different modalities, ultimately generating high-dimensional user tag vectors.
[0054] Specifically, text data is first converted into fixed-dimensional semantic vectors by a pre-trained BERT-Base-Chinese model (a pre-trained language model for Chinese, based on Google's BERT (Bidirectional Encoder Representations from Transformers) model). Behavioral data is encoded using a sliding time window and then input into a multilayer perceptron (MLP) network to extract user behavioral preference feature vectors. Image data is processed using visual neural networks (such as CLIP or ResNet) to extract visual features. After unified encoding, these three types of modal features are concatenated into a high-dimensional representation vector h, which is then fed into a multi-label neural network model. This network employs a multi-layer fully connected structure, with an N-dimensional output layer that uses the sigmoid activation function to predict, independently for each of the N labels, the probability that the label applies to the user. These labels cover interest domains (such as "maternal and infant care" and "gaming") and behavioral states (such as "deep interaction" and "short-term churn"). During training, binary cross-entropy (BCE) is used as the loss function, and the model supports periodic updates (e.g., training every 10 minutes) to adapt to dynamic changes in user interests and behaviors.
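As a minimal illustration of the sigmoid output layer and BCE loss described above, the following Python sketch computes per-label probabilities and the mean binary cross-entropy for a single user. The four labels, the logit values, and the function names are made up for illustration and do not come from the embodiment.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce_loss(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over N independent labels."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)

# Hypothetical labels: maternal and infant care, gaming,
# deep interaction, short-term churn.
logits = [2.0, -1.0, 0.5, -3.0]   # raw outputs of the last layer (made up)
targets = [1, 0, 1, 0]            # ground-truth multi-hot label vector (made up)

probs = [sigmoid(z) for z in logits]  # one independent probability per label
loss = bce_loss(logits, targets)
```

Because each label gets its own sigmoid, the probabilities need not sum to 1, which is what allows a user to carry several tags at once.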
[0055] The final output of this module is a label vector for each user (such as a probability vector of length N), which, along with the original input vector, is stored in the user profile database 30 to provide input data for the subsequent user clustering and strategy generation module 40. Simultaneously, this module also supports model version management and label confidence scoring mechanisms to ensure a good balance between timeliness and accuracy in the labeling system.
[0056] User Profile Database 30: This module serves as the core storage unit of the system, responsible for managing user tag information, feature vectors, and clustering strategy results. Its input primarily comes from the multi-tag features of users output by the tag generation module 20, and the strategy audience packages (such as high-conversion users and high-interaction groups) generated by the audience clustering and targeting module 40. The database employs a hybrid management approach of vectorization and tag indexing, allowing for rapid querying of each user's historical version features and interest tags. Other modules in the system can access this database: the tag generation module uses its historical tags for model training, the audience clustering module uses user vectors for dynamic clustering, the delivery scheduling module filters audiences based on strategy tags, and the feedback loop module uses it to correct advertising ROI (Return on Investment) and mark versions. The database's output structure supports the system's requirements for real-time performance and interpretability.
[0057] Audience Clustering and Targeting Module 40: This module receives user feature vectors and tag information from the user profile database 30, and combines them with the advertising ROI weight index for each user provided by the feedback loop module 60 to automatically cluster user groups and construct strategic audience packages. The core of this module lies in introducing ROI weight as a moderating factor for individual influence, dynamically adjusting the attractiveness of each user to the cluster center in the distance function, thereby improving the discriminative power and commercial value of the clustering effect.
[0058] Specifically, suppose the original feature vector of user i is x_i and its ROI weight is w_i. The weighted distance between users is defined as follows:
[0059]
[0060] This function reflects the greater influence of high-ROI users in the clustering process. The module preferentially uses the K-Means++ algorithm for cluster center initialization and dynamically adjusts the number of clusters K based on metrics such as the silhouette coefficient and intra-cluster variance to accommodate user groups of different sizes and structures. Furthermore, for cold-start user groups with low activity and sparse behavioral data, the module uses the DBSCAN algorithm (Density-Based Spatial Clustering of Applications with Noise, an unsupervised clustering algorithm suitable for discovering cluster structures of arbitrary shapes and automatically identifying outliers or anomalous data) for density clustering and anomaly boundary detection to prevent these users from interfering with the overall cluster structure.
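The exact distance formula is not reproduced in this text, so the following Python sketch shows only one plausible reading: a sample-weighted K-Means objective in which each user's squared Euclidean distance to a cluster center is scaled by that user's ROI weight. Under this assumed form, the optimal center is the ROI-weighted mean, so high-ROI users pull the center toward themselves. The function names and sample values are hypothetical.

```python
import math

def weighted_sq_distance(x, center, w):
    """Assumed form: w * ||x - center||^2 (not the source's own formula)."""
    return w * math.dist(x, center) ** 2

def weighted_centroid(points, weights):
    """ROI-weighted mean; minimizes the sum of weighted squared distances."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[d] for p, w in zip(points, weights)) / total
            for d in range(dim)]

def objective(points, weights, center):
    """Total weighted squared distance of a cluster to a candidate center."""
    return sum(weighted_sq_distance(p, center, w)
               for p, w in zip(points, weights))

points = [[0.0, 0.0], [3.0, 0.0]]
weights = [1.0, 2.0]                      # second user has the higher ROI weight
center = weighted_centroid(points, weights)  # pulled toward the high-ROI user
```

With equal weights the center would sit at the midpoint; with the weights above it moves two thirds of the way toward the higher-ROI user.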
[0061] After clustering, the system analyzes each cluster and its corresponding user set, assigning strategy tags based on the group's performance in historical advertising campaigns (such as click-through rate, conversion rate, ROI, etc.), such as "high-conversion new user package," "high-interaction deep fan package," and "potential churned and recalled audience." Optionally, each audience cluster with strategy tags will be written as a structured strategy audience package into the user profile database 30 for subsequent use by the campaign scheduling module 50, achieving precise matching between advertising goals and audiences.
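The label assignment described above can be sketched as a simple rule table over per-cluster metrics. This is only an illustration: the thresholds, the `ClusterStats` structure, the rule order, and the definition of ROI as revenue divided by spend are all assumptions, since the source names the metrics and labels but not the decision rules.

```python
from dataclasses import dataclass

@dataclass
class ClusterStats:
    """Aggregated historical campaign results for one cluster (hypothetical schema)."""
    impressions: int
    clicks: int
    conversions: int
    spend: float
    revenue: float

    @property
    def ctr(self) -> float:
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def cvr(self) -> float:
        return self.conversions / self.clicks if self.clicks else 0.0

    @property
    def roi(self) -> float:
        # Assumption: ROI taken as revenue / spend.
        return self.revenue / self.spend if self.spend else 0.0

def assign_strategy_label(s: ClusterStats) -> str:
    """Map cluster metrics to a strategy label; all thresholds are illustrative."""
    if s.cvr >= 0.05 and s.roi >= 1.5:
        return "high-conversion new user package"
    if s.ctr >= 0.08:
        return "high-interaction deep fan package"
    return "potential churned and recalled audience"

label = assign_strategy_label(ClusterStats(10000, 500, 40, 100.0, 220.0))
```

In practice the thresholds would likely be tuned per platform or replaced by a learned rule; the point here is only the shape of the mapping from metrics to labels.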
[0062] The ad delivery scheduling module 50: This module matches the targeting conditions set by the advertiser with the structured audience packages in the user profile database in real time and completes the specific ad resource scheduling. Its inputs include targeting parameters provided by the advertising platform (such as interest areas and behavioral tags), the set of users currently meeting the conditions in the user profile database, and ad placement information. The module evaluates the exposure priority of the candidate group through a ranking mechanism and, combined with logical rules such as display frequency control and cooldown time scheduling, generates the final delivery decision. The output ad scheduling results are submitted to the advertising system in real time for display. Simultaneously, this module records user response behaviors (clicks, conversions, etc.) and feeds them back to the feedback loop module 60 for subsequent performance evaluation and model updates. The ad delivery scheduling module plays a crucial central role in the system's ad execution.
[0063] Feedback Loop Module 60: This module is responsible for incorporating post-advertising performance data into the system, forming a feedback path for profile optimization and model adjustment. Its input comes from user response data collected by the campaign scheduling module 50, including core metrics such as impressions, clicks, conversions, and dwell time. The module first aggregates and analyzes user-level behavioral data, updating its ROI performance and behavioral weights, and then feeds the results back to the user profile database 30 to correct feature records. Furthermore, the system dynamically evaluates the performance of the current tag generation model and clustering model based on the feedback results. If a decline in conversion rates for a certain user group or an increase in tag prediction bias is detected, a mechanism for automatic model parameter fine-tuning or retraining is automatically triggered. This module ensures the system has self-learning and iterative evolution capabilities, constructing a complete intelligent closed-loop process from data collection to campaign optimization.
[0064] To further illustrate the technical solution of this invention, a complete embodiment is provided in conjunction with a practical application scenario. This embodiment is deployed on a medium-sized video content and e-commerce integration platform with approximately 2 million daily active users. The platform covers multiple business modules such as short videos, live streaming, and product recommendations. The user data is diverse and the behavioral chain is complete, making it suitable for the end-to-end implementation of the multi-dimensional user profile automated segmentation and precise targeting method proposed in this invention.
[0065] Step 1: First, use the multi-source data acquisition module 10 to receive data. Specifically, the platform connects to three data sources in real time:
[0066] Behavioral log data (clicks, views, dwell time, add-to-cart, etc.): approximately 3.5 million user behavior samples;
[0067] Social interaction data (comments, bullet comments, private message keywords, etc.): 1.2 million records in total;
[0068] Live stream interaction and feedback data (likes, tips, frequency of live stream interactions, etc.): approximately 600,000 users involved.
[0069] All data undergoes preprocessing, including user ID normalization, timestamp alignment, field standardization, and deduplication filtering, and is structured and encapsulated in JSON format. It is then pushed to the tag generation module 20 in real time and simultaneously stored asynchronously in the user profile database 30 as a basic profile record.
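The preprocessing steps listed above (ID normalization, timestamp alignment, field standardization, deduplication, JSON encapsulation) might look roughly like the following Python sketch. The raw field names (`uid`, `user_id`, `event`, `action`, `ts`, `timestamp`), the source names, and the millisecond heuristic are assumptions made for the example, not details from the embodiment.

```python
import json
from datetime import datetime, timezone

def to_epoch_seconds(ts) -> int:
    """Align ISO-8601 strings (with a UTC offset) and epoch values to UTC seconds."""
    if isinstance(ts, str):
        return int(datetime.fromisoformat(ts).astimezone(timezone.utc).timestamp())
    ts = int(ts)
    return ts // 1000 if ts > 10**11 else ts  # heuristic: large values are milliseconds

def normalize_event(raw: dict, source: str) -> dict:
    """Map one raw event onto a unified schema (field names are hypothetical)."""
    return {
        "user_id": str(raw.get("uid", raw.get("user_id", ""))).strip().lower(),
        "source": source,
        "event": str(raw.get("event", raw.get("action", ""))).strip().lower(),
        "ts": to_epoch_seconds(raw.get("ts", raw.get("timestamp"))),
    }

def preprocess(batch):
    """Normalize, deduplicate, and serialize events as JSON strings."""
    seen, out = set(), []
    for raw, source in batch:
        rec = normalize_event(raw, source)
        key = (rec["user_id"], rec["source"], rec["event"], rec["ts"])
        if key in seen:
            continue  # drop exact duplicates after normalization
        seen.add(key)
        out.append(json.dumps(rec, sort_keys=True, ensure_ascii=False))
    return out

batch = [
    ({"uid": " U123 ", "event": "Click", "ts": "2025-08-06T12:00:00+08:00"}, "behavior_log"),
    ({"user_id": "u123", "action": "click", "timestamp": 1754452800000}, "behavior_log"),
    ({"uid": "u123", "event": "tip", "ts": 1754452860}, "live_stream"),
]
records = preprocess(batch)
```

The first two events describe the same click in different formats, so only one of them survives deduplication.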
[0070] Step 2: The tag generation module 20 receives the above three types of modal data and uses a multimodal deep learning model to generate high-dimensional user tags. Specifically:
[0071] Text data is processed by the BERT-Base-Chinese model architecture to output a 768-dimensional semantic vector.
[0072] Step 3: The user profile database 30 stores the user tags and vectors output by the tag generation module 20, the original behavior records, and the policy tags in a unified manner. The database has the following functions:
[0073] User-level version management: Historical versions are retained with each tag update;
[0074] Supports multiple indexing mechanisms: tag reverse lookup, audience recall, and vector nearest neighbor search;
[0075] Provide a unified data interface for subsequent modules (crowd clustering, deployment scheduling);
[0076] It supports data synchronization across multiple platforms, such as the main website app, live streaming platforms, and e-commerce platforms.
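The version management and tag reverse lookup listed above can be illustrated with a small in-memory store. This is a toy sketch under stated assumptions: the `ProfileStore` class, its method names, and the 0.5 probability threshold for treating a tag as active are all hypothetical, and vector nearest neighbor search and cross-platform synchronization are left out.

```python
from collections import defaultdict

class ProfileStore:
    """Toy profile store: per-user tag history plus an inverted tag index."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold            # assumed cutoff for an active tag
        self._versions = defaultdict(list)    # user_id -> list of tag-probability dicts
        self._index = defaultdict(set)        # tag -> users whose latest version has it

    def update(self, user_id: str, tag_probs: dict) -> int:
        """Store a new version, refresh the index, and return the version number."""
        history = self._versions[user_id]
        if history:
            for tag, p in history[-1].items():
                if p >= self.threshold:
                    self._index[tag].discard(user_id)
        history.append(dict(tag_probs))
        for tag, p in tag_probs.items():
            if p >= self.threshold:
                self._index[tag].add(user_id)
        return len(history)

    def history(self, user_id: str) -> list:
        """All retained versions for a user, oldest first."""
        return list(self._versions.get(user_id, []))

    def users_with_tag(self, tag: str) -> set:
        """Reverse lookup: users whose latest profile carries the tag."""
        return set(self._index.get(tag, ()))

store = ProfileStore()
v1 = store.update("u1", {"gaming": 0.9, "maternal and infant care": 0.2})
v2 = store.update("u1", {"gaming": 0.3, "maternal and infant care": 0.8})
```

After the second update the user drops out of the "gaming" lookup but the earlier version remains available in the history.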
[0077] Step 4: The audience clustering and targeting module 40 automatically segments the target audience based on user feature vectors and label results. The specific implementation is as follows:
[0078] For each user, an ROI value is calculated based on ad conversion data from the past 7 days. This ROI value is used as a feature weight w_i ∈ [0.8, 2.0] and applied to the weighted Euclidean distance function.
[0079]
[0080] The main clustering process is performed using the K-Means++ algorithm (which is an improved version of the K-Means clustering algorithm, and its main goal is to optimize the selection of initial cluster centers), with the dynamic K value controlled between 30 and 80.
[0081] For users with low activity or sparse behavior, DBSCAN (eps=0.45, minPts=10) is used to identify anomalous boundary groups. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm used to group data points into high-density regions and identify points in low-density regions as noise (or outliers).
[0082] The clustering results are combined with metrics such as historical click-through rate and conversion rate to assign strategic labels to each cluster (e.g., labels such as "high-conversion new user package", "mother and baby intent users", "deeply immersed game users", "highly active white-collar users in the evening").
[0083] The module supports daily clustering and minute-level recall, storing structured audience profiles in a user profile database for subsequent scheduling and retrieval. Compared to the original manual segmentation strategy, this approach resulted in an average click-through rate increase of 27.6%, an ROI increase of 16%, and a CPA reduction of approximately 20%.
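To make the DBSCAN step for low-activity users more concrete, here is a compact pure-Python sketch of the standard algorithm using the parameters given above (eps=0.45, minPts=10). It uses plain Euclidean distance and a brute-force neighbor search, which is an assumption made for brevity; a production system would more likely rely on an indexed library implementation, and the sample points are made up.

```python
import math

NOISE = -1

def region_query(points, idx, eps):
    """Indices of all points within eps of points[idx] (including idx itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[idx], q) <= eps]

def dbscan(points, eps=0.45, min_pts=10):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels

# Twelve tightly packed users plus one isolated user (illustrative data).
dense = [(0.05 * a, 0.05 * b) for a in range(4) for b in range(3)]
labels = dbscan(dense + [(5.0, 5.0)])
```

The isolated user is labeled as noise and can therefore be kept out of the main K-Means++ clustering, as the step above describes.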
[0084] Step 5: The delivery scheduling module 50, after the advertiser sets the delivery target, retrieves the generated structured audience packages from the user profile database and performs real-time scheduling in conjunction with ad placements and system sorting rules. Specifically, the system first matches suitable audience packages based on the targeting conditions set by the advertiser (such as interest tags, behavioral characteristics, region, and active time); then, it filters the candidate user set based on strategy tags and performs dynamic recall by combining information such as the user's historical ad performance and activity level. During the scheduling process, the system also comprehensively considers the availability of current ad placements, frequency control strategies (such as cooldown time and exposure limits), and advertiser priority to sort and match the audience and ad content, and issues delivery instructions within milliseconds. The final delivery results (such as impressions, clicks, dwell time, and conversions) are synchronized to the feedback loop module through log recording to support profile updates and delivery strategy optimization.
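The filtering and ranking logic in this step can be sketched as follows. The example assumes a single ranking score per user, a daily exposure cap of 3, and a 600-second cooldown; these values, the `UserState` structure, and the `schedule` function are hypothetical and only illustrate how frequency control and sorting could combine.

```python
from dataclasses import dataclass

@dataclass
class UserState:
    user_id: str
    score: float                          # ranking score (hypothetical)
    impressions_today: int = 0
    last_shown_ts: float = float("-inf")  # never shown by default

def schedule(candidates, now, slots, max_daily=3, cooldown_s=600):
    """Pick up to `slots` users, honoring the exposure cap and cooldown."""
    eligible = [
        u for u in candidates
        if u.impressions_today < max_daily and now - u.last_shown_ts >= cooldown_s
    ]
    eligible.sort(key=lambda u: u.score, reverse=True)  # highest priority first
    chosen = eligible[:slots]
    for u in chosen:                                    # record the exposure
        u.impressions_today += 1
        u.last_shown_ts = now
    return [u.user_id for u in chosen]

now = 10_000.0
users = [
    UserState("a", 0.9, impressions_today=3),        # hit the daily cap
    UserState("b", 0.8, last_shown_ts=now - 100),    # still cooling down
    UserState("c", 0.7),
    UserState("d", 0.6),
    UserState("e", 0.5),
]
first_round = schedule(users, now, slots=2)
second_round = schedule(users, now + 1, slots=2)
```

Users "a" and "b" are skipped despite their higher scores, and the users served in the first round are themselves in cooldown one second later.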
[0085] Step 6: The feedback loop module 60 records user response data after ad delivery, including key metrics such as click behavior, dwell time, and conversion results after exposure, and generates ROI performance and behavioral trend assessments at the individual user level based on this data. The system dynamically updates the user's ROI weight and tag confidence based on this feedback, writing the updated data into the user profile database to ensure the timeliness and accuracy of the profile. Simultaneously, this module also possesses model self-evaluation capabilities, able to determine whether to trigger retraining or parameter fine-tuning of the tag generation model or clustering model based on feedback data, thereby achieving continuous learning and optimization of the system. Working collaboratively with modules such as data collection, profile modeling, and audience targeting, the feedback loop module constructs a closed-loop mechanism of "collection – modeling – delivery – feedback – re-modeling," significantly improving the system's stability and responsiveness.
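One way to picture the feedback update is an exponential moving average of the ROI weight, clamped to the [0.8, 2.0] range used in Step 4, together with a simple relative-drop test for triggering retraining. The smoothing factor, the 20% drop threshold, and the direct use of observed ROI as the update target are assumptions for this sketch; the source does not specify the update rule.

```python
def update_roi_weight(old_weight, observed_roi, alpha=0.3, lo=0.8, hi=2.0):
    """Blend the previous weight with the newly observed ROI, then clamp.

    alpha is a hypothetical smoothing factor; lo and hi follow the
    weight range [0.8, 2.0] given for the clustering step.
    """
    blended = (1.0 - alpha) * old_weight + alpha * observed_roi
    return min(max(blended, lo), hi)

def should_retrain(baseline_cvr, recent_cvr, max_rel_drop=0.2):
    """Trigger retraining when conversion rate falls by more than max_rel_drop."""
    if baseline_cvr <= 0:
        return False
    return (baseline_cvr - recent_cvr) / baseline_cvr > max_rel_drop

new_weight = update_roi_weight(1.0, 3.0)        # 0.7 * 1.0 + 0.3 * 3.0
retrain = should_retrain(0.05, 0.03)            # 40% relative drop
```

Clamping keeps a single unusually good or bad campaign from dominating a user's influence in the next clustering run.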
[0086]
[0087] Table 1. Comparison of the technical effects of the present invention and the prior art
[0088] Example 2 is a further refinement of Example 1. The parts that are the same as in Example 1 will be omitted, and only the parts that are different from those in Example 1 will be recorded.
[0089] As shown in Figure 2, in this embodiment, the label generation module 20 consists of the following three sub-modules:
[0090] The text processing submodule 201 uses a pre-trained language model (such as BERT-Base-Chinese) to segment and encode the cleaned text input, and outputs sentence-level semantic vectors to represent the user's interest in the language content.
[0091] The behavior modeling submodule 202 takes into account the user's behavior sequence over 7 consecutive days, including features such as dwell time, clicks, favorites, and add-to-cart, and extracts the user behavior preference vector through a multi-layer perceptron network.
[0092] The visual encoding submodule 203 inputs the video / image covers viewed by the user into the CLIP or ResNet model to generate feature vectors representing visual preferences.
[0093] The outputs of the three modalities are concatenated into a unified high-dimensional feature vector h, which is then fed into a multi-label neural network to output the user's label prediction results in dimensions such as interests, behavioral status, and purchase intent.
[0094] As shown in Figure 3, the process of generating high-dimensional user tags and feature vectors in Step 2 of Example 1 is as follows:
[0095] Collect user text, behavioral, and visual (image) data. Specifically, the raw text data comes from comments, bullet comments, private messages, etc., and is usually one or a few sentences.
[0096] First, the text is cleaned (removing special symbols, duplicate characters, etc.), and then a tokenizer is used to segment the Chinese sentence into words or subword units.
[0097] The word segmentation results will be converted into the corresponding Token ID sequence (a list of integers), for example: "Baby shampoo works very well" → [101, 741, 8227, 3719, 2300, 4500, 102] (the beginning and end are [CLS] and [SEP] respectively);
[0098] Simultaneously, an attention mask (identifying which positions are valid tokens) and token type IDs (used to distinguish sentence pairs; all 0s by default for a single sentence) are generated.
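The three model inputs described above can be assembled as in the following Python sketch, which reuses the token IDs from the example sentence. It assumes a padding ID of 0 and a fixed maximum length of 10; the `build_inputs` helper is hypothetical, and in practice a tokenizer library would normally produce these fields.

```python
CLS_ID, SEP_ID = 101, 102   # [CLS] and [SEP], as in the example above
PAD_ID = 0                  # assumed padding ID

def build_inputs(piece_ids, max_len=10):
    """Wrap subword IDs with [CLS]/[SEP], truncate, and pad to max_len."""
    ids = [CLS_ID] + piece_ids[: max_len - 2] + [SEP_ID]
    pad = max_len - len(ids)
    return {
        "input_ids": ids + [PAD_ID] * pad,
        "attention_mask": [1] * len(ids) + [0] * pad,  # 1 = valid token, 0 = padding
        "token_type_ids": [0] * max_len,               # single sentence: all zeros
    }

enc = build_inputs([741, 8227, 3719, 2300, 4500], max_len=10)
```

Padding positions carry a mask value of 0 so that the self-attention layers ignore them.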
[0099] Model input stage: The Token ID, Attention Mask, and other inputs from the previous step are fed into the BERT-Base-Chinese model. This model is a deep neural network consisting of 12 Transformer encoder layers. Each layer contains a multi-head self-attention mechanism and a feedforward network, which can capture contextual semantic dependencies. Each Token has a 768-dimensional vector representation in each layer.
[0100] Semantic vector generation methods (two methods are available):
[0101] Method 1: Extract the [CLS] vector: The output vector at the position of the leading [CLS] token is used as the semantic representation of the entire sentence;
[0102] Method 2: Pool all token outputs: Perform mean pooling or max pooling on the output vectors of all tokens to obtain the overall sentence vector;
[0103] In this embodiment, the [CLS] vector is preferably used as the sentence-level semantic representation, with a dimension of 768.
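The difference between the two methods can be illustrated with a small sketch. The hidden states below are made-up 3-dimensional values rather than real 768-dimensional BERT outputs, and excluding padded positions from the pooling via the attention mask is an assumption of this sketch, not a detail stated in the embodiment.

```python
def cls_pool(hidden_states):
    """Method 1: take the output vector at position 0, where [CLS] sits."""
    return list(hidden_states[0])


def _valid_rows(hidden_states, attention_mask):
    """Keep only the token vectors whose attention mask entry is 1."""
    rows = [row for row, keep in zip(hidden_states, attention_mask) if keep]
    if not rows:
        raise ValueError("attention_mask selects no tokens")
    return rows


def mean_pool(hidden_states, attention_mask):
    """Method 2a: average each dimension over the valid tokens."""
    rows = _valid_rows(hidden_states, attention_mask)
    return [sum(col) / len(rows) for col in zip(*rows)]


def max_pool(hidden_states, attention_mask):
    """Method 2b: take the per-dimension maximum over the valid tokens."""
    rows = _valid_rows(hidden_states, attention_mask)
    return [max(col) for col in zip(*rows)]


# Toy last-layer outputs: [CLS], two word tokens, and one padded position.
hidden = [
    [0.2, -0.1, 0.4],   # [CLS]
    [1.0, 0.0, -1.0],
    [0.0, 2.0, 1.0],
    [9.0, 9.0, 9.0],    # padding, ignored by the pooled variants
]
mask = [1, 1, 1, 0]

sentence_cls = cls_pool(hidden)
sentence_mean = mean_pool(hidden, mask)
sentence_max = max_pool(hidden, mask)
```

All three functions return a vector of the same dimension as a single token vector, so either method yields a fixed-length sentence representation regardless of sentence length.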
[0104] Output format: Each text input (such as a comment) is ultimately encoded into a fixed-dimensional vector as the semantic feature of the text; this semantic vector is then concatenated with other modalities (behavioral vectors, image vectors) to form a unified user feature vector.
[0105] In the above embodiments, Token ID: the number in the model's training vocabulary to which each word or subword obtained by text segmentation is mapped;
[0106] [CLS] Vector: The output vector at a special position that represents the meaning of the entire sentence;
[0107] 768-dimensional vector: The hidden state dimension of each layer in the BERT-Base model, that is, the representation vector dimension of each token.
[0108] Behavioral data is used to construct a continuous 7-day behavioral sequence (8-dimensional features such as dwell time, clicks, and add-to-cart), which is then input into a two-layer MLP network (64→32).
[0109] Image / video thumbnails are encoded by the CLIP model (ViT-B / 32) into 512-dimensional visual vectors;
[0110] The three types of features are concatenated into a unified vector h (1312 dimensions, i.e., 768 + 32 + 512), which is then input into a three-layer fully connected network (1312→512→128→50) to output probability predictions for 50 labels, covering multiple dimensions such as interest tendency, behavioral state, and intention prediction.
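The concatenation and the 1312→512→128→50 forward pass can be sketched as below. This is a minimal sketch under stated assumptions: the weights are random rather than trained, the three input vectors are random placeholders for the encoder outputs, the ReLU hidden activation is assumed (the embodiment does not name one), and Dropout and BatchNorm are omitted.

```python
import math
import random

TEXT_DIM, BEHAVIOR_DIM, VISUAL_DIM = 768, 32, 512   # dimensions from the text
LAYER_SIZES = [1312, 512, 128, 50]                   # 1312 -> 512 -> 128 -> 50


def init_layer(rng, n_in, n_out):
    """Random placeholder weights and zero biases; a trained model would load real ones."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases):
    """Fully connected layer: one dot product per output unit, plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_label_probs(h, layers):
    """Hidden layers use ReLU (assumed); the output layer applies Sigmoid per label."""
    x = h
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return [sigmoid(z) for z in dense(x, weights, biases)]


rng = random.Random(0)
text_vec = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]          # stands in for the [CLS] vector
behavior_vec = [rng.gauss(0, 1) for _ in range(BEHAVIOR_DIM)]  # stands in for the MLP output
visual_vec = [rng.gauss(0, 1) for _ in range(VISUAL_DIM)]      # stands in for the CLIP vector

h = text_vec + behavior_vec + visual_vec   # list concatenation gives the unified vector
layers = [init_layer(rng, a, b) for a, b in zip(LAYER_SIZES, LAYER_SIZES[1:])]
probs = predict_label_probs(h, layers)
```

Because each output passes through its own Sigmoid, `probs` holds 50 independent probabilities that do not need to sum to 1.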
[0111] The network employs Binary Cross Entropy (BCE) as the loss function to predict the probability of each of the 50 labels for every user. Training uses 2 million user samples, with the fused vector composed of text, behavioral, and image features as input. Training is performed with the AdamW optimizer for approximately 10 epochs in a 4-GPU environment. During inference, the model updates labels in response to new user behavior at minute-level frequency, refreshing the user label vector every 10 minutes to meet the accuracy and timeliness requirements of high-frequency dynamic user profiling.
[0112] The network described above uses binary cross-entropy as the loss function, primarily to meet the technical requirements of the multi-label prediction task in this invention. Among the multiple interest and behavior labels corresponding to each user, the labels are not mutually exclusive; therefore, the traditional Softmax + cross-entropy structure cannot be used. Instead, a sigmoid activation function that models each label independently is required, and binary cross-entropy is used to measure the difference between the predicted probability and the true value of each label. This loss function can improve the stability and generalization ability of label prediction while maintaining the model's expressive power.
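The reason Softmax is unsuitable for non-exclusive labels can be seen in a small numeric sketch. The three logits and the 0.5 decision threshold below are illustrative assumptions, not values from this embodiment.

```python
import math


def softmax(logits):
    """Normalizes across labels, so the outputs compete and sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Scores one label independently of all the others."""
    return 1.0 / (1.0 + math.exp(-z))


logits = [2.0, 1.5, -1.0]          # hypothetical raw scores for three labels
softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]

# With a 0.5 threshold (an assumption), Softmax can mark at most one label,
# while Sigmoid can mark several labels for the same user.
softmax_active = [p > 0.5 for p in softmax_probs]
sigmoid_active = [p > 0.5 for p in sigmoid_probs]
```

Here the first two labels both receive a high raw score. Softmax forces them to share one unit of probability, whereas the independent Sigmoid outputs allow both to be predicted at once.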
[0113] Specifically, this includes:
[0114] 1. Training Data Preparation
[0115] Each training sample is a fused feature vector $h \in \mathbb{R}^{1312}$ for one user, composed of three types of features: text, behavior, and image.
[0116] The label part is a multi-label vector of length N (e.g., 50 dimensions), where each dimension is 0 or 1, indicating whether the user possesses a certain feature.
[0117] For example, in the label vector [0, 1, 0, 0, 1, ...], each position indicates whether the user belongs to a category such as "maternal and infant interest", "high game activity", or "short-term churn".
[0118] 2. Network Structure
[0119] Input layer: User feature vectors with dimension 1312;
[0120] Hidden layer: Two fully connected layers (e.g., 1312→512→128);
[0121] Output layer: Sigmoid activation function, outputting N floating-point numbers ∈ (0,1), each number representing the predicted probability of a certain label;
[0122] Use regularization techniques such as Dropout and BatchNorm to prevent overfitting.
[0123] 3. Binary Cross Entropy (BCE) function
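[0124] For each label $j \in \{1, 2, \dots, N\}$, the true value is $y_j \in \{0, 1\}$ and the predicted value is $\hat{y}_j \in (0, 1)$. The formula for BCE is:
[0125] $\mathrm{BCE}_j = -\left[\, y_j \log \hat{y}_j + (1 - y_j) \log\left(1 - \hat{y}_j\right) \right]$;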
[0126] The total loss is the weighted average of the per-label losses over the N labels.
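The per-label loss and its weighted average can be sketched as follows. The per-label weights are not specified in this embodiment, so the sketch defaults to uniform weights; the clipping constant `eps` and the example values are likewise assumptions for illustration.

```python
import math


def bce_per_label(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy for one label: -[y*log(p) + (1-y)*log(1-p)].

    The prediction is clipped to [eps, 1-eps] to avoid log(0); eps is an assumption.
    """
    p = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def weighted_bce(y_true, y_pred, weights=None):
    """Weighted average of the per-label losses; uniform weights if none are given."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if weights is None:
        weights = [1.0] * len(y_true)
    losses = [bce_per_label(t, p) for t, p in zip(y_true, y_pred)]
    return sum(w * l for w, l in zip(weights, losses)) / sum(weights)


# Five labels of one hypothetical user: targets and predicted probabilities.
targets = [0, 1, 0, 0, 1]
predictions = [0.1, 0.8, 0.3, 0.2, 0.6]

uniform_loss = weighted_bce(targets, predictions)
# Doubling the weight of the last label increases its influence on the total.
reweighted_loss = weighted_bce(targets, predictions, weights=[1, 1, 1, 1, 2])
```

A confident correct prediction (for example 0.1 for a target of 0) contributes a small loss, while an uncertain one (0.6 for a target of 1) contributes more, which is what drives each label's probability toward its true value.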
[0127] 4. Training Configuration
[0128] Data scale: 2 million user samples;
[0129] Optimizer: AdamW;
[0130] Initial learning rate: 1e-4;
[0131] Batch size: 512;
[0132] Number of training epochs: 10 to 20;
[0133] Hardware: 4 A100 GPUs per machine, supporting multi-GPU parallel processing;
[0134] Each training session lasts approximately 60-90 minutes.
[0135] Early stopping is applied during training, and effectiveness is verified using metrics such as AUC and mAP.
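One common way to apply early stopping within the 10 to 20 epoch budget is sketched below. The patience of 3 epochs, the `min_delta` threshold, and the validation AUC values are illustrative assumptions; the embodiment does not state the stopping criterion.

```python
def early_stopping_epoch(val_scores, patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for a sequence of validation scores.

    Training stops once the score has failed to improve by more than
    min_delta for `patience` consecutive epochs. Epochs are 0-indexed.
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score + min_delta:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    # The epoch budget ran out before the patience limit was reached.
    return best_epoch, len(val_scores) - 1


# Hypothetical validation AUC after each epoch.
val_auc = [0.71, 0.74, 0.76, 0.755, 0.758, 0.75, 0.77]
best_epoch, stop_epoch = early_stopping_epoch(val_auc, patience=3)
```

With these values the best score occurs at epoch 2 and training stops at epoch 5, so the weights saved at epoch 2 would be the ones deployed.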
[0136] The deployment and online inference process (for the "online phase") includes the following steps:
[0137] 1. Model Deployment
[0138] After training, the model weights are deployed to the online service system; the online service is encapsulated as an API interface for the real-time user profile generation module to call; the inference service supports high concurrency and low latency (approximately 20-40 milliseconds per inference); the inference engine uses frameworks such as ONNX and TensorRT to optimize inference speed.
[0139] 2. Inference data update frequency
[0140] New behavioral data is received every 10 minutes from the user profile database or data flow system (such as Kafka); a new round of user feature generation and model inference is triggered; the inference results (i.e. the user's latest tag probability vector) are written back to the user profile database; more than 100 iterations can be performed every day to meet the needs of minute-level dynamic tag refresh.
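The 10-minute refresh cadence can be sketched as a simple windowing step that decides which users need re-inference in each round. The event format `(timestamp_seconds, user_id)` and the function name are hypothetical; how events are actually read from the data flow system is not shown here.

```python
from collections import defaultdict

WINDOW_SECONDS = 10 * 60   # 10-minute refresh interval from the text


def users_per_window(events, window_seconds=WINDOW_SECONDS):
    """Group (timestamp_seconds, user_id) events by refresh window index.

    Each window's user set is the batch whose features are regenerated
    and whose label probabilities are re-inferred in that round.
    """
    windows = defaultdict(set)
    for timestamp, user_id in events:
        windows[timestamp // window_seconds].add(user_id)
    return dict(windows)


# One round every 10 minutes gives this many rounds per day.
rounds_per_day = 24 * 60 * 60 // WINDOW_SECONDS

# Hypothetical behavior events, timestamps in seconds since midnight.
events = [(5, "u1"), (590, "u2"), (600, "u1"), (1250, "u3")]
batches = users_per_window(events)
```

A fixed 10-minute interval yields 144 rounds per day, which is consistent with the statement that more than 100 iterations can be performed daily.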
[0141] After adopting the above technical solution, the average number of tags per user increased from 2.1 to 6.4, and the tag richness increased by about 3 times; the coverage of interest tags increased to 92%; the module supports minute-level incremental inference, which significantly enhances the dynamism of user profiles.
[0142] Those skilled in the art will understand that the accompanying drawings are merely schematic diagrams of one embodiment, and the modules or processes shown in the drawings are not necessarily essential for implementing the present invention.
[0143] Those skilled in the art will understand that the modules in the apparatus of the embodiments can be distributed in the apparatus of the embodiments as described in the embodiments, or they can be located in one or more devices different from this embodiment with corresponding changes. The modules of the above embodiments can be combined into one module, or they can be further divided into multiple sub-modules.
[0144] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, and not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features; and these modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims
1. A multi-dimensional user profile automated segmentation and precise targeting system, characterized in that it comprises: a multi-source data acquisition module (10), used to acquire data from a user terminal or platform system, the data including text data, behavioral data, and image data; a feature extraction and label generation module (20), which includes: a text processing submodule (201), which uses a pre-trained language model to perform vector encoding on the text data and generate sentence-level semantic vectors; a behavior modeling submodule (202), which performs multidimensional modeling of continuous user behavior sequences and extracts behavior preference vectors; and a visual encoding submodule (203), which extracts image feature vectors using a visual neural network model; wherein the output feature vectors of the three submodules are concatenated into a unified high-dimensional vector, which is then input into a multi-label neural network to generate user profile label vectors; a user profile database (30), used to store the data collected by the multi-source data acquisition module (10) and the user profile label vectors output by the feature extraction and label generation module (20), and to provide structured user profile data and user historical behavior data to subsequent modules;
an audience clustering and targeting module (40), used to divide users into multiple strategy audience packages based on the user profile label vectors and the users' advertising return on investment (ROI) performance, using a clustering algorithm combining KMeans++ and DBSCAN, and to assign audience attribute labels; wherein, in the audience clustering and targeting module (40), the user advertising ROI weight is used as a distance function factor to construct a weighted Euclidean distance for similarity calculation; an advertising scheduling module (50), which calls the matching audience package according to the advertising target set by the advertiser and completes real-time advertising scheduling by combining the advertising resource position configuration and the advertising sorting rules; and a feedback loop module (60), used to collect response data after user exposure, update the user advertising ROI index and the profile label vector confidence, and feed the results back to the user profile database (30) to support automatic model retraining and profile optimization.
2. The system according to claim 1, wherein the feature extraction and label generation module (20) generates a 768-dimensional text semantic vector using the BERT-Base-Chinese model, extracts behavioral features from the behavioral data using a two-layer multilayer perceptron (MLP), generates visual features from the image data using a visual neural network model, and finally concatenates them into a 1312-dimensional unified representation vector.
3. The system according to claim 1, wherein the multi-label neural network is a three-layer fully connected structure, outputs 50-dimensional label probabilities, is activated using the Sigmoid function, and is trained using binary cross-entropy (BCE) as the loss function.
4. The multi-dimensional user profile automated segmentation and precise targeting system according to claim 1, characterized in that: after completing clustering, the audience clustering and targeting module (40) further performs the steps of assigning strategy labels and generating audience packages: the system analyzes each cluster based on the performance metrics of the users in that cluster during historical advertising campaigns, including click-through rate, conversion rate, and return on investment (ROI); based on the analysis results, the clusters are assigned strategy labels, including but not limited to "high-conversion new user package", "high-interaction deep fan package", or "potential churned and recalled audience"; and the user clusters that have been assigned strategy labels are structured into strategy audience packages and stored in the user profile database (30) for the advertising scheduling module (50) to call, in order to achieve accurate matching of advertising goals.
5. A method for automated segmentation and precise targeting of multi-dimensional user profiles, characterized in that it includes the following steps: collecting multimodal data from users, including text data, behavioral data, and image data; processing the text data with a natural language processing model to extract semantic vectors; performing behavioral modeling on the behavioral data to extract behavioral preference vectors; visually encoding the image data to extract image feature vectors; fusing the feature vectors of the multimodal data into a unified vector, which is input into a multi-label neural network that outputs multiple user profile labels; in the clustering step, dynamically constructing a user similarity function based on the user profile labels and ROI performance, wherein the function incorporates ROI weights as factors and uses a weighted Euclidean distance to calculate the similarity between users, and a clustering algorithm combining KMeans++ and DBSCAN is used to automatically segment strategic user groups; based on the goals and ad placements set by the advertiser, calling the matching audience package and scheduling ad delivery; and collecting user response data after exposure, updating the user ad ROI and label confidence, and determining whether to retrain the model based on the feedback, forming a closed loop of collection-modeling-deployment-feedback; wherein, in the text data processing step, the BERT-Base-Chinese pre-trained language model is used to encode the cleaned and segmented user text data and extract 768-dimensional sentence-level semantic vectors; in the step of generating user profile labels through the multi-label neural network, the unified vector is input into a neural network consisting of a three-layer fully connected structure, whose output layer adopts the Sigmoid activation function and outputs a probability vector containing 50 user profile labels; and during the training phase, the binary cross-entropy (BCE) loss function is used.
6. The method according to claim 5, characterized in that the strategic audience generation step includes: analyzing the click-through rate, conversion rate, and ROI of the users in each cluster during historical ad campaigns; based on the analysis results, assigning strategy tags to each cluster, including "high-conversion new user package", "high-interaction deep fan package", and "potential churned and recalled audience"; and writing the clusters with strategy tags into the user profile database for subsequent ad scheduling to achieve ad target matching.