A method for dynamically updating marketing user profiles based on multi-source data fusion
By performing time-series correction and feature classification on multi-source marketing data, isolating short-term conflicting features, and adopting an incremental update method, the method addresses the asynchronous arrival and out-of-order timing of multi-source data, improving the accuracy of marketing user profiles and system efficiency and supporting the reliability of marketing strategies and business continuity.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANGHAI WANGMAI INFORMATION TECH GRP CO LTD
- Filing Date
- 2026-05-22
- Publication Date
- 2026-06-19
AI Technical Summary
In scenarios with high concurrency access to multi-source heterogeneous marketing data, existing technologies suffer from inaccurate data updates due to asynchronous data arrival and out-of-order timing issues. Furthermore, short-term, high-frequency data fluctuations can negatively impact long-term stable features, leading to feature semantic conflicts and profile state drift, thus increasing system load and latency.
The method receives marketing event data from multiple sources, performs time-series correction and feature classification, establishes a feature time-sensitivity mapping relationship, isolates short-term conflicting features, and adopts an incremental update mechanism. Single-hop propagation control is applied only to directly related profile dimensions, which reduces erroneous updates and recalculations.
It improves the accuracy of marketing user profile updates and system processing efficiency, reduces the erroneous updates of long-term features due to short-term noise, reduces node scheduling pressure and resource consumption, and enhances the reliability of marketing strategies and business continuity.
Figure CN122240935A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of marketing data processing technology, specifically to a method for dynamically updating marketing user profiles based on multi-source data fusion.
Background Technology
[0002] With the continuous digitalization of internet marketing, membership operations, and advertising, enterprises typically need to aggregate and process data generated from web pages, mobile devices, offline stores, customer relationship management systems, and advertising platforms to create user profiles and support precision marketing. For example, the published invention patent application CN118014622B discloses an advertising push method and system based on user profiles, which mainly collects data from multiple information sources, generates user tags, and implements advertising push based on those tags. Another example is the published invention patent application CN113934612B, which discloses a user profile update method, device, storage medium, and electronic device, and which mainly completes the user profile update by combining current behavioral data, historical profile tags, and tag logical relationships. These technologies can improve the efficiency of user profile construction and updating to some extent, but they mainly focus on multi-source data integration, tag generation, or tag validity screening, and have not fully addressed the accuracy of dynamic updates for multi-source heterogeneous marketing data in high-concurrency access scenarios. Specifically, different data sources generally have different link latencies during collection, transmission, and access, which can easily lead to asynchronous data arrival and out-of-order timing. At the same time, different profile features also differ significantly in timeliness in business applications. If existing technologies uniformly adopt overwrite or decay methods to update different types of features, short-term high-frequency fluctuations in the data can easily have an undue impact on long-term stable features, thereby leading to issues such as feature semantic conflicts and profile state drift.
Furthermore, after profile state drift occurs, existing systems typically require large-scale recalculation of associated features for correction, which can easily lead to increased node scheduling pressure, frequent disk read/write operations, increased memory resource consumption, and increased update latency in high-concurrency scenarios. Therefore, it is necessary to propose a dynamic update method for marketing user profiles based on multi-source data fusion. This method performs time-series correction on out-of-order data and adopts a differentiated update mechanism for features with different lifecycles to reduce the erroneous updating of long-term features, reduce the resulting global recalculation overhead, and thus improve the accuracy of profile updates and system processing efficiency.
Summary of the Invention
[0003] To address the shortcomings of existing technologies, this invention provides a method for dynamically updating marketing user profiles based on multi-source data fusion. This method solves the problem that traditional methods often suffer from differences in link latency during the collection, transmission, and access of different data sources, which can easily lead to asynchronous data arrival and out-of-order data.
[0004] To achieve the above objectives, the present invention provides the following technical solution: A method for dynamically updating marketing user profiles based on multi-source data fusion, comprising:
S1: Receive multi-source marketing event data, extract user identifiers, event occurrence time, data reception time, and feature identifiers, and generate a sequence of events to be processed;
S2: Perform timing correction on the sequence of events to be processed based on the event occurrence time and data reception time to determine the event processing order;
S3: Classify the features in the event processing order according to the timeliness attribute corresponding to the feature identifier. When classifying features, pre-establish the feature timeliness mapping relationship and configure the attenuation parameter, and determine long-term features, short-term features, and features to be confirmed based on historical sample rules;
S4: Perform conflict verification between short-term features and user historical profile features, isolate conflicting features, and review them in conjunction with similar feature update information from other data sources within a preset observation window;
S5: Determine the incremental update content based on the review conclusion, and perform incremental updates on the user profile nodes;
S6: Determine the scope of the update for the user profile nodes that have completed the incremental update, and execute subsequent update control according to the determined scope.
[0005] Preferably, S1 includes: When generating a sequence of events to be processed, marketing events from multiple sources are collected and received according to the access time window. A unified user primary key mapping is completed based on account identifier, device identifier, member identifier, or transaction identifier. Fields such as event occurrence time, data reception time, source system identifier, feature identifier, feature value, event weight, transmission integrity identifier, and event trust level are standardized to form a unified event record. For events with missing user primary keys, missing event times, unidentified features, and duplicate events, the processes of completion, padding, temporary storage, and deduplication are performed respectively. Then, the events are aggregated according to the unified user primary key to form a sequence of events to be processed.
[0006] Preferably, S2 includes: When determining the order of event processing, timing correction is initiated according to preset backlog conditions or head waiting conditions; Transmission delay statistics are performed separately for each source system. When there are insufficient valid samples, the dynamic waiting threshold is determined by the current sample, historical statistical results or the default waiting threshold. Then, the data is rearranged according to the event occurrence time, data reception time, event trust level and event number. Identify the source time deviation of time anomalies and correct them according to the median transmission delay or write them into the anomaly time queue; Events that have not arrived even after exceeding the dynamic waiting threshold are output in segments according to a sliding time window and a compensation update record is generated.
[0007] Preferably, the process of identifying the source time deviation of time-related anomalies and correcting them according to the median transmission delay or writing them into an anomaly time queue includes: When handling time-related abnormal events, similar abnormalities are statistically analyzed by source system, and the time deviation status is identified based on the occurrence of abnormalities within a preset period. For source systems that meet the judgment criteria, the event occurrence time is corrected based on the median transmission delay of the most recent normal event; For abnormal events that do not meet the judgment criteria, they are written into the abnormal time queue, do not participate in the current round of processing, and continue to be judged in subsequent time sequence correction. Events that are consistently flagged as abnormal after multiple rounds of verification will be moved to offline verification processing.
[0008] Preferably, S3 includes: For features lacking a mapping relationship, the event weight is calculated and an initial classification is made by combining the event credibility level, behavior duration, business confirmation status, cross-source corroboration status, number of sources, frequency of occurrence, and session distribution. For multiple records with the same feature, the best record is selected based on credibility level, business closed-loop status, and source support. Features to be confirmed are written into the observation pool for further classification, and are included in the corresponding feature set when the transfer conditions are met.
[0009] Preferably, selecting the best record among multiple records with the same feature based on credibility level, business closed-loop status, and source support includes: For multiple candidate records with the same feature in the current event processing order, first compare the event credibility levels; when the event credibility levels are the same, compare the business closed-loop status; when the business closed-loop status is the same, compare the number of source systems. The record used to determine the timeliness attribute of the feature is selected according to this comparison order.
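The comparison order in [0009] can be illustrated with a tuple sort key. The following is a minimal sketch, not the claimed implementation: the `CandidateRecord` type, the numeric ranks, and the assumption that a closed business loop and a larger number of source systems are preferred are all hypothetical, since the text only states the order in which the fields are compared.

```python
from dataclasses import dataclass

# Hypothetical numeric ranks for the three credibility levels.
CREDIBILITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class CandidateRecord:
    feature_id: str
    credibility: str            # "high", "medium", or "low"
    business_closed_loop: bool  # True if the event forms a business loop
    source_count: int           # number of source systems supporting it


def select_best_record(candidates):
    """Compare credibility first, then closed-loop status, then source count."""
    return max(
        candidates,
        key=lambda r: (
            CREDIBILITY_RANK[r.credibility],
            r.business_closed_loop,  # assumption: closed loop wins a tie
            r.source_count,          # assumption: more sources wins a tie
        ),
    )
```

Because Python compares tuples element by element, a later field is consulted only when all earlier fields are equal, which mirrors the staged comparison described above.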
[0010] Preferably, S4 includes: When handling conflicts, short-term features are merged according to their corresponding profile dimensions and, together with the corresponding historical profile baseline, formed into normalized vectors; conflict is determined based on the distribution difference. Features identified as conflicting are written into isolation record units organized by user and profile dimension, and the observation window is set based on the normalized decay rate parameter. During the observation period, same-dimension, same-direction features from other sources are retrieved and reviewed according to the number of independent sources, same-direction support, and aggregation distance; depending on the outcome, the feature is updated, kept under observation, or isolated and removed.
[0011] Preferably, features identified as conflicting are written into isolated record units set according to user and profile dimensions, and observation windows are set according to the normalized decay rate parameter, including: For the characteristics determined by conflict, isolation record units are set up according to user identifier and profile dimensions, and the conflict feature value, source system identifier, distribution difference value, entry time, normalized decay rate parameter, event credibility level, business confirmation status, isolation weight and review status are registered. The observation window is set according to the range of the normalized decay rate parameter, and the observation period is extended for records with high event confidence level and business confirmation.
[0012] Preferably, S5 includes: When determining the incremental update content, read the user profile baseline snapshot according to the profile dimension, and perform stable fusion, low weight compensation and migration update on long-term features, short-term features that have not entered the conflict isolation process and conflict features that have passed the review, and generate difference residuals, vector residuals or state transition information according to the dimension type. Based on the event trust level, review intensity, and business confirmation status, the update weight is determined and limited before partial write is performed. When the snapshot version changes, the update is recalculated. When the node is unavailable or write is abnormal, the update is temporarily stored or delayed. The updated user profile node snapshot, local residual value, and residual modulus are recorded.
[0013] Preferably, S6 includes: For user profile nodes that have completed incremental updates, the node association relationship and association weight are determined based on business impact rules and historical linkage statistics. The residual modulus is standardized according to dimension type, and the propagation range is determined in combination with event credibility level and system load status. For updates that meet the propagation conditions, single-hop propagation is performed according to the association weight and attenuation coefficient, and update control is performed according to the type of the associated target. Updates that do not meet the conditions for propagation, are in a state of light propagation, or have failed to propagate are handled by either local retention, delayed propagation, retrying, or manual inspection, respectively.
[0014] Compared with existing technologies, this invention provides a method for dynamically updating marketing user profiles based on multi-source data fusion, which has the following beneficial effects: 1. This invention corrects the out-of-order arrival of multi-source marketing events by performing time-series correction, classifies different lifecycle features into long-term, short-term, and pending-confirmation categories, isolates and reviews short-term abnormal fluctuation features that conflict with the historical profile baseline, and adopts a local incremental update method based on the review conclusion. Simultaneously, controlled propagation is performed only on related targets with direct business impact. This reduces the erroneous updates of long-term stable features caused by short-term high-frequency noise, reduces the large-scale recalculation of related features caused by feature semantic conflicts and profile state drift, alleviates node scheduling pressure and disk read / write and memory resource consumption, reduces update latency in high-concurrency scenarios, and improves the accuracy, stability, and efficiency of profile updates.
[0015] 2. This invention, by first writing short-term conflict features into an isolated area and then verifying them in conjunction with features of the same dimension and direction from other data sources within an observation window, and then performing single-hop propagation control only on directly related profile dimensions, marketing tags, or control items based on the verification results and the scope of association, can avoid the cascading impact of temporary promotional clicks, accidental browsing, or single-source anomalies on recommendation tags, coupon package priorities, and recall strategies. It reduces large-scale erroneous adjustments and subsequent strategy rollbacks caused by local anomalies, enhances the matching between profile baselines and marketing actions, and improves the controllability of multi-source profile update processes, the reliability of marketing strategy distribution, and the continuity of business operations.
Attached Figure Description
[0016] Figure 1 is a schematic flow diagram of the method for dynamically updating marketing user profiles based on multi-source data fusion according to the present invention;
Figure 2 is a schematic diagram of multi-source marketing event access and generation of the sequence of events to be processed;
Figure 3 is a schematic diagram of multi-source event timing correction;
Figure 4 is a schematic diagram of the timeliness classification and diversion processing of features;
Figure 5 is a schematic diagram of short-term feature conflict isolation and multi-source verification;
Figure 6 is a schematic diagram of the incremental update of user profile nodes and associated propagation control.
Detailed Implementation
[0017] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.
[0018] Example 1: As shown in Figures 1-6, a method for dynamically updating marketing user profiles based on multi-source data fusion includes:
S1: Receive multi-source marketing event data, extract user identifiers, event occurrence time, data reception time, and feature identifiers, and generate a sequence of events to be processed;
S2: Perform timing correction on the sequence of events to be processed based on the event occurrence time and data reception time to determine the event processing order;
S3: Classify the features in the event processing order according to the timeliness attribute corresponding to the feature identifier. When classifying features, pre-establish the feature timeliness mapping relationship and configure the attenuation parameter, and determine long-term features, short-term features, and features to be confirmed based on historical sample rules;
S4: Perform conflict verification between short-term features and user historical profile features, isolate conflicting features, and review them in conjunction with similar feature update information from other data sources within a preset observation window;
S5: Determine the incremental update content based on the review conclusion, and perform incremental updates on the user profile nodes;
S6: Determine the scope of the update for the user profile nodes that have completed the incremental update, and execute subsequent update control according to the determined scope.
[0019] This method is based on a scenario of dynamic user profile updates driven by multi-source marketing data. On the enterprise side, a web-based event tracking system, a mobile application event tracking system, an advertising platform, a customer relationship management platform, a membership transaction platform, a customer service ticketing platform, and an offline store transaction system are run simultaneously. Each system continuously generates behavioral events, transaction events, reach events, and feedback events related to the same user. The processing involves at least the marketing data generator, data receiving device, profile processing device, and profile storage device. Input information includes at least user identifier, event type, event occurrence time, data reception time, source system identifier, feature identifier, business load, and event credibility level. The processing objective is to avoid short-term noise directly covering long-term stable features under conditions of high-concurrency reception, inconsistent data transmission times, and significant differences in feature timeliness, and to reduce large-scale recalculations caused by erroneous updates. Output results may include updated user profile nodes, incremental update records, and associated update records after controlled propagation.
[0020] Specifically, as shown in Figure 2: Marketing event data from web page tracking systems, mobile application tracking systems, advertising platforms, customer relationship management platforms, membership transaction platforms, coupon platforms, customer service ticketing platforms, and offline store transaction systems are received uniformly. Data reception and processing are initiated when a new event is written to any source system. When the same user generates multiple events consecutively within a short period, these events are aggregated using an access time window of 1 to 5 seconds; 2 seconds can be used in scenarios with high real-time requirements. This range is based on the fact that marketing interactions such as clicks, impressions, views, coupon redemption, and adding to cart typically occur within 1 to 3 seconds. If the time window is shorter than 1 second, the same round of continuous interaction can easily be split into multiple receptions, leading to duplicate profile creation and mapping. If the time window is longer than 5 seconds, it significantly increases the waiting time and delays subsequent profile updates. Therefore, an access time window of 1 to 5 seconds better matches the actual rhythm of continuous interactions in marketing operations, with 2 seconds balancing the collection of same-round interactions against real-time reception. This time window is used only for collecting events of the same user and is not an upper limit on the subsequent profile update time. After an event is received, identity normalization is performed first, mapping the account number, mobile phone number, device number, member number, and open platform identity identifier to the unified user primary key. The unified user primary key adopts a structure combining a primary identity identifier and a source-side auxiliary identifier.
The primary identity identifier is used for user normalization, and the auxiliary identifier is used to retain the correspondence with the source system. When the event contains only a transaction number, it is first associated with the order account identifier, member identifier, or payment account identifier based on the transaction number, and then mapped to the unified user primary key. When the event contains only a device identifier and no real-name identity information, the device identifier is written to a temporary identity mapping table, and within a verification period covering the most recent 24 hours to 7 days, the unified user primary key is determined based on the most recent valid login relationship, payment relationship, or member binding relationship. The verification period is set at 24 hours to 7 days because login behavior, payment behavior, and member binding behavior in mobile marketing scenarios usually form a stable correspondence within a daily range. If the period is shorter than 24 hours, cross-scenario logins or delayed payment records are easily missed. If the period is longer than 7 days, expired historical identity relationships are easily introduced. If the same device identifier corresponds to only one account identifier within the verification period, the mapping is completed directly. If it corresponds to multiple account identifiers, the account identifier that most recently completed a payment or member binding is preferred. If the mapping still cannot be determined, the temporary identity status is maintained, and the corresponding event is written to the pending completion queue and is not included in the current round's sequence of events to be processed. After identity normalization, the fields of the received events are standardized.
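The device-identifier rule above can be sketched as a small resolution function. This is an illustrative sketch under stated assumptions: the `DeviceBinding` record, the field names, and the choice of the upper bound (7 days) as the default verification period are hypothetical and are not specified by the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class DeviceBinding:
    """Hypothetical row of the temporary identity mapping table."""
    account_id: str
    last_login: datetime
    last_payment_or_binding: Optional[datetime] = None


def resolve_device(bindings, now, verification_period=timedelta(days=7)):
    """Return an account id, or None to keep the temporary identity status."""
    cutoff = now - verification_period
    recent = [b for b in bindings if b.last_login >= cutoff]
    accounts = {b.account_id for b in recent}
    if len(accounts) == 1:
        # Only one account within the verification period: map directly.
        return next(iter(accounts))
    # Several accounts: prefer the most recent payment or member binding.
    confirmed = [
        b for b in recent
        if b.last_payment_or_binding is not None
        and b.last_payment_or_binding >= cutoff
    ]
    if confirmed:
        return max(confirmed, key=lambda b: b.last_payment_or_binding).account_id
    # Undetermined: the caller would write the event to the pending
    # completion queue and leave it out of the current round.
    return None
```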
The event occurrence time is taken from the business occurrence time field recorded in the source system, the data reception time is taken from the time the access gateway receives the data and writes it into the reception queue, and the feature identifier is taken from a pre-maintained feature lookup table. To ensure consistent field definitions in subsequent processing, each event is organized into a unified event record. The unified event record includes at least the following fields: user primary key, event number, source system identifier, event occurrence time, data reception time, feature identifier, feature value, event weight, transmission integrity identifier, and event credibility level. The event number is generated by concatenating the source system code, user primary key, event occurrence time, and a same-source auto-incrementing sequence, and is used to identify retransmitted events and duplicate-reported events. Event credibility levels are divided into high, medium, and low, based on whether the event forms a business loop, whether it has external confirmation, and whether it only reflects an instantaneous interaction status. Closed-loop events such as transaction payment completion, reconciliation completion, refund completion, and membership level change are marked as high; confirmed behavior events such as login, search, add to cart, follow, and customer service consultation are marked as medium; and weak interaction events such as short stay, rapid swipe, accidental opening, and unconfirmed exposure are marked as low. Key fields must include at least the user primary key, event occurrence time, source system identifier, and feature identifier, and the key field completeness rate is the ratio of the number of filled key fields to the total number of key fields. For abnormal input, a diversion rule is established.
If the user primary key is missing, the event is written to the completion queue, and identity completion is attempted continuously within a waiting period of 5 to 30 minutes. This range is based on the fact that identity completion usually relies on near-real-time login relationships, order relationships, or member binding relationships: the completion success rate is low when the waiting period is shorter than 5 minutes, and the stability of the real-time processing channel is significantly affected when it is longer than 30 minutes. Therefore, 5 to 30 minutes is considered a feasible range. If completion has still not succeeded after this waiting period, the event is transferred to the offline compensation queue. If the event occurrence time is missing, it is filled with the data reception time and a time-missing marker is added. If the feature identifier does not match the feature lookup table, the event is written to a temporary feature pool and assigned a pending classification status. Events in the temporary feature pool are re-matched after the feature lookup table is updated. If two consecutive matches fail, the event is written to the feature table pending review and is not processed in the real-time profile update process. Two consecutive failed matches are used as the exit condition because the first failure may be caused by a lag in the release of the feature lookup table or by fluctuations in the source field format, whereas two consecutive failures largely rule out short-term synchronization errors, which better suits the stability requirements of real-time processing. If the same event number appears repeatedly, the one with the earliest data reception time and the highest key field completeness rate is retained. If the key field completeness rates of two events are the same, the one with the higher credibility level is retained.
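The key field completeness rate and the duplicate-retention rule can be sketched as follows. The dictionary field names are hypothetical, and the text does not say how reception time and completeness are weighed against each other; this sketch assumes completeness is compared first, then credibility level, then the earliest reception time.

```python
# Hypothetical field names for the four key fields listed in the text.
KEY_FIELDS = ("user_key", "event_time", "source_id", "feature_id")
CREDIBILITY_RANK = {"low": 0, "medium": 1, "high": 2}


def key_field_completeness(event: dict) -> float:
    """Ratio of filled key fields to the total number of key fields."""
    filled = sum(1 for f in KEY_FIELDS if event.get(f) not in (None, ""))
    return filled / len(KEY_FIELDS)


def deduplicate(events):
    """Keep one event per event number (assumed priority order, see above)."""
    best = {}
    for ev in events:
        rank = (
            key_field_completeness(ev),
            CREDIBILITY_RANK[ev["credibility"]],
            -ev["received_at"],  # earlier reception time ranks higher
        )
        kept = best.get(ev["event_no"])
        if kept is None or rank > kept[0]:
            best[ev["event_no"]] = (rank, ev)
    return [ev for _, ev in best.values()]
```

Here `received_at` is assumed to be a numeric timestamp such as epoch milliseconds, so that negating it turns "earliest" into "largest" for the comparison.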
After completing the above processing, events are aggregated according to the unified user primary key to generate a sequence of events to be processed. The sequence of events to be processed for the same user can be selected as a set of events that arrive within the same access time window and have the same source system identifier or the same session identifier. Subsequently, the user primary key, event occurrence time, data reception time, source system identifier, and feature identifier are written into the receiving buffer divided by user as input for subsequent time-series correction processing. After the above processing, the data entering subsequent processing has completed identity normalization, field regularization, trust classification, anomaly diversion, and duplicate removal, which can provide a stable input basis for subsequent time-based sorting and correction.
[0021] Specifically, as shown in Figure 3: Timing correction is performed on the sequence of events to be processed formed in the preceding step. Timing correction is initiated when the backlog of pending events for the same user reaches 2048, or when the head event of the sequence has waited 35 milliseconds since entering the receiving buffer. The backlog of 2048 is based on the fact that marketing events usually arrive in out-of-order batches during peak periods of promotion, placement, and member reach. If sorting is initiated when the backlog is too small, sorting is triggered frequently and computational overhead increases. If the backlog is too large, events dwell longer and real-time update latency increases. The waiting time of 35 milliseconds is based on the fact that this duration can cover the out-of-order range caused by common network jitter, asynchronous reporting, and short-term cache fluctuations without significantly prolonging the real-time profile update response time. Therefore, 2048 defines the event scale suitable for batch sorting, and 35 milliseconds defines the maximum waiting time for the head event. The former suits peak batch input scenarios, and the latter suits scenarios with low concurrency but fluctuating latency. During time-series correction, the input data must include at least the following fields: event occurrence time, data reception time, source system identifier, event number, transmission integrity identifier, and event credibility level. If events were simply ordered by data reception time, an earlier event could be overwritten by a later-occurring event that arrived first; to avoid this, a transmission delay statistics table is maintained separately for each source system.
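The two trigger conditions described above reduce to a simple predicate. This is a minimal sketch; the function and parameter names are hypothetical, and timestamps are assumed to be in milliseconds.

```python
BACKLOG_LIMIT = 2048      # pending events per user, from the text
HEAD_WAIT_LIMIT_MS = 35   # maximum wait of the head event, from the text


def should_start_correction(backlog_size: int,
                            head_enqueued_ms: float,
                            now_ms: float) -> bool:
    """True when either the backlog or the head-wait condition is met."""
    if backlog_size >= BACKLOG_LIMIT:
        return True
    # The head-wait condition only applies when something is queued.
    return backlog_size > 0 and now_ms - head_enqueued_ms >= HEAD_WAIT_LIMIT_MS
```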
The transmission delay statistics table records the transmission delay distribution of the most recent 15,000 valid events for the corresponding source system, where transmission delay is defined as data reception time minus event occurrence time. The sample size of 15,000 is chosen because marketing activities switch significantly between peak and normal times. Too few samples are easily distorted by occasional traffic, while too many samples cannot reflect changes in transmission status in a timely manner. Therefore, 15,000 samples can balance statistical stability and response speed. When the number of valid samples from a source system is less than 3,000, the statistical results are first calculated using the currently available samples. If the number of samples is still less than 3,000, the most recently saved statistical results from the source system are used as a temporary benchmark. If the system is accessing the source system for the first time and there are no historical statistical results, the global default waiting threshold is used as a temporary benchmark. The global default waiting threshold can be set between 60 milliseconds and 120 milliseconds. In general web, mobile, and marketing platform access scenarios, 80 milliseconds is a good option. This value is based on the fact that when a newly accessed source system lacks historical statistical results, an intermediate value that can cover common upload fluctuations without significantly affecting real-time processing needs to be used as a transitional parameter. Below 60 milliseconds, the tolerance for initial out-of-order processing is insufficient, and above 120 milliseconds will significantly increase the initial processing waiting time. Therefore, setting the global default waiting threshold to 60 milliseconds to 120 milliseconds is more in line with the actual situation, with 80 milliseconds being suitable for most marketing data access scenarios. 
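One way to hold the most recent 15,000 delays per source system and pick the basis for the waiting threshold is a bounded deque. This sketch reflects one reading of the fallback order (current samples if there are at least 3,000, otherwise the last saved statistics, otherwise the global default); the class, its attribute names, and the decision to skip negative delays as invalid samples are assumptions.

```python
from collections import deque

WINDOW_SIZE = 15_000        # most recent valid events kept per source
MIN_VALID_SAMPLES = 3_000   # below this, current samples are not trusted


class DelayWindow:
    """Hypothetical per-source transmission delay statistics table."""

    def __init__(self):
        self.delays_ms = deque(maxlen=WINDOW_SIZE)
        self.saved_stats = None  # last saved (mean, std), if any

    def record(self, occurred_ms: float, received_ms: float) -> None:
        delay = received_ms - occurred_ms  # reception minus occurrence
        if delay >= 0:  # assumption: negative delays are time anomalies
            self.delays_ms.append(delay)

    def basis(self) -> str:
        """Which statistics should drive the waiting threshold."""
        if len(self.delays_ms) >= MIN_VALID_SAMPLES:
            return "current"
        if self.saved_stats is not None:
            return "saved"
        return "default"  # e.g. a global default of 60 to 120 ms
```

A `deque` with `maxlen` discards the oldest sample automatically once the window is full, which keeps the statistics tied to recent transmission conditions.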
After obtaining the transmission delay statistics, the mean and standard deviation of the delay are calculated for each source system, and the dynamic waiting threshold for the corresponding source system is set to the mean delay plus 4.5 times the standard deviation. The coefficient of 4.5 is based on the fact that in business scenarios where the transmission delay is concentrated and has a small tail extension, the mean delay plus 4.5 times the standard deviation can cover most of the normal transmission fluctuation range, while delineating extremely long late events that significantly exceed the normal fluctuation range from the real-time processing range, thereby completing the normal out-of-order absorption without significantly increasing the waiting time. The dynamic waiting threshold is not a fixed value, but is updated according to statistical batches or a preset refresh cycle. The refresh cycle can be 30 seconds to 5 minutes. 30 seconds can be used in scenarios with high event arrival frequency, and 5 minutes can be used in low-frequency scenarios, to balance the timeliness of updates and computational costs. After calculating the dynamic waiting threshold, the sequence of pending events for the same user is rearranged in an ordered manner. During the rearrangement, the event occurrence time is used as the primary sorting criterion, and the data reception time is used as the secondary sorting criterion. When two events have the same occurrence time, they are sorted according to their event credibility level, with events with higher credibility levels listed first. If the event occurrence time and event credibility level are still the same, they are sorted according to their event numbers to ensure that the sorting results can be repeated. 
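The threshold and ordering rules above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Event` fields, the integer encoding of credibility, and the function names are assumptions introduced here, and the sort key is one reading of the tie-breaking rules (occurrence time, then credibility, then reception time, then event number).

```python
from dataclasses import dataclass
from statistics import fmean, pstdev
from typing import List, Optional, Sequence

MIN_SAMPLES = 3000            # below this, fall back to saved or default values
SIGMA_FACTOR = 4.5            # threshold = mean delay + 4.5 * standard deviation
DEFAULT_THRESHOLD_MS = 80.0   # global default taken from the 60-120 ms range


@dataclass(frozen=True)
class Event:
    occurred_ms: int   # event occurrence time
    received_ms: int   # data reception time
    credibility: int   # hypothetical encoding: 3 = high, 2 = medium, 1 = low
    event_no: int      # event number, used as the final tie-breaker


def dynamic_threshold_ms(delays_ms: Sequence[float],
                         last_saved_ms: Optional[float] = None) -> float:
    """Dynamic waiting threshold for one source system."""
    if len(delays_ms) >= MIN_SAMPLES:
        return fmean(delays_ms) + SIGMA_FACTOR * pstdev(delays_ms)
    if last_saved_ms is not None:
        return last_saved_ms          # most recently saved statistical result
    return DEFAULT_THRESHOLD_MS       # first access, no history available


def reorder(events: Sequence[Event]) -> List[Event]:
    """Deterministic ordering of one user's pending events."""
    return sorted(
        events,
        key=lambda e: (e.occurred_ms, -e.credibility, e.received_ms, e.event_no),
    )
```

Because every component of the sort key is drawn from the event itself, sorting the same batch twice yields the same order, which is the repeatability property the text asks for.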
This sorting method prioritizes preserving the true sequence of business actions and, for events with the same occurrence time, gives priority to events with higher credibility levels, reducing the interference of weak interaction events with subsequent profile judgments. For abnormal events whose data reception time is earlier than the event occurrence time, it is first determined whether the abnormality is caused by a time deviation in the source system. For this determination, abnormality statistics are kept per source system. Similar abnormalities are consecutive occurrences, within the same source system, of data reception times earlier than the event occurrence time. If the same source system experiences more than three similar abnormalities within the last 5 minutes, that source system is marked as being in a time deviation state. The 5-minute range and the count of 3 are chosen because a single abnormality is typically caused by short-term network fluctuation, reporting jitter, or a field write delay, whereas more than three similar abnormalities occurring consecutively within 5 minutes better indicate a persistent deviation in the source system's time records. After a source system is marked as being in a time deviation state, the event occurrence time is corrected using the median transmission delay of the most recent normal events of that source system: the corrected event occurrence time is the data reception time minus the median transmission delay of the corresponding source system. If only a single abnormality occurs, or the number of abnormalities does not meet the above condition, the event is written into the anomaly time queue and does not participate in the current round of real-time processing. Events in the anomaly time queue are re-evaluated in the next round of timing correction. 
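A minimal sketch of the time deviation check and the median-based correction follows. The function names and the millisecond timestamps are assumptions made for illustration, and the sketch counts anomalies inside the 5-minute range without separately verifying that they are consecutive.

```python
from statistics import median
from typing import Sequence

WINDOW_MS = 5 * 60 * 1000   # 5-minute statistics range
MAX_ANOMALIES = 3           # more than three anomalies marks a time deviation


def has_time_deviation(anomaly_times_ms: Sequence[int], now_ms: int) -> bool:
    """True if the source system shows more than three anomalies in the last 5 minutes."""
    recent = [t for t in anomaly_times_ms if 0 <= now_ms - t <= WINDOW_MS]
    return len(recent) > MAX_ANOMALIES


def corrected_occurrence_ms(received_ms: float,
                            normal_delays_ms: Sequence[float]) -> float:
    """Corrected occurrence time = reception time - median transmission delay."""
    return received_ms - median(normal_delays_ms)
```

With delays of 30, 40, and 500 milliseconds the median is 40, so the single extreme value does not shift the corrected time, which is the robustness argument the text gives for the median.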
If an event is still judged as an anomaly time event in two consecutive rounds, it will be transferred to the offline verification queue. The reason for using the median transmission delay for correction is that the median value is not sensitive to a small number of extreme anomalies and is more suitable for time correction in the case of time deviation in the source system. For events that have not arrived by the time the dynamic waiting threshold is exceeded, the waiting is stopped. Instead, data segments that have met the waiting conditions are output first to form the event processing order for the current batch. Here, a data segment that has met the waiting conditions means that the latest event to enter the buffer in the data segment has also reached the dynamic waiting threshold of the corresponding source system, and no new events of the same segment are accepted before the start time of the data segment. Subsequent late events are written to the compensation queue and delayed repair is performed in the low-priority processing channel, but they are not directly written back to the real-time profile baseline. After the events in the compensation queue are reordered and their status is verified, they form compensation update records and are written to the subsequent review processing area to support subsequent offline correction or manual verification. They do not directly participate in the current batch profile baseline update. To ensure clear data output rules, a sliding time window is used to output corrected data. The sliding time window can range from 50 milliseconds to 200 milliseconds, with 50 milliseconds used during peak e-commerce periods, 100 milliseconds during normal off-peak periods, and 200 milliseconds for cross-regional low-concurrency scenarios. 
The time window is set to 50 to 200 milliseconds because during peak periods events arrive densely, and a shorter output interval outputs corrected data as quickly as possible and reduces buffer backlog; during off-peak periods events arrive sparsely, and a wider output interval absorbs more normal out-of-order events and avoids premature output that would leave the sorted segments overly fragmented. The aforementioned 35 milliseconds is used to decide whether to initiate timing correction, whereas the 50 to 200 milliseconds is used to divide the corrected data into segments that are ready for output and segments that are still waiting; the two parameters act at different stages and do not replace each other. Peak periods, off-peak periods, and cross-regional low-concurrency scenarios can be identified from the event arrival rate within the most recent second and the deployment range of the source system: 50 milliseconds is used when the event arrival rate exceeds 5000 events per second; 100 milliseconds is used when the event arrival rate is between 500 and 5000 events per second; and 200 milliseconds is used when the source system is distributed across regions and the event arrival rate is less than 500 events per second. These ranges are used because marketing platforms can generate a large number of tracking points and behavioral events per second during peak periods, and an excessively long output window would increase buffer backlog, whereas in low-frequency scenarios appropriately widening the output window is more conducive to absorbing normal out-of-order events. 
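The window selection rule can be written as a small lookup. The function name is hypothetical, and the text does not say which window applies when the arrival rate is below 500 events per second but the source system is not cross-regional; the sketch falls back to 100 milliseconds in that case, which is an assumption.

```python
def output_window_ms(events_per_second: float, cross_region: bool) -> int:
    """Pick the sliding output window from the most recent one-second arrival rate."""
    if events_per_second > 5000:
        return 50                     # peak period
    if events_per_second >= 500:
        return 100                    # normal off-peak period
    # Low arrival rate: 200 ms only for cross-regional deployments.
    # The 100 ms fallback for the remaining case is an assumption of this sketch.
    return 200 if cross_region else 100
```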
After completing the above processing, the event processing order is obtained according to the actual business occurrence order, and corresponding status flags are attached to late events, compensation events, and abnormal time events respectively; then, the event processing order is written into the subsequent processing buffer as input for subsequent feature classification processing.
[0022] Specifically, as shown in Figure 4: After timing correction is completed and the event processing order is formed, the time-sensitivity attribute of each feature in the event processing order is classified. The input data includes at least the following fields: feature identifier, feature value, source system identifier, event credibility level, event occurrence time, and historical feature comparison information. The purpose of time-sensitivity classification is to separate stable features that are suitable for entering the main baseline of the historical profile, short-term features that need to enter subsequent conflict verification, and transitional features that still need to be observed, so that short-term interactions do not directly affect the long-term profile state. To facilitate processing, a feature timeliness mapping table is first established. This table maps each feature identifier to one of three states: long-term, short-term, or pending confirmation. Long-term features may include, for example, membership level, major consumption categories in the past 90 days, frequently visited consumption areas, historical average order value range, long-term price sensitivity, brands with stable repurchase rates, after-sales risk level, and payment ability level. Short-term features may include, for example, current page dwell time, a current ad click, a single keyword search, a one-time coupon redemption, a short-term add-to-cart, brief attention during promotions, and abnormal scrolling operations. Pending confirmation features accommodate features from newly launched businesses, features with insufficient sample accumulation, and features for which stable judgment conditions have not yet formed. 
The basis for setting the pending confirmation state is that there is a type of feature in marketing data that cannot be directly regarded as short-term noise, nor is it sufficient to be directly written into the long-term profile. Forcibly classifying it as long-term or short-term can easily cause classification distortion. For features supported by existing historical samples, their timeliness attributes are first determined based on the historical samples. If a feature can still stably participate in similar marketing decisions after 30 consecutive days of observation, it is classified as long-term. If a feature mainly plays a role in a single conversation or single-day reach scenario and decays significantly within 24 hours, it is classified as short-term. Features that fall between the two but are not yet sufficient to support stable classification are classified as pending confirmation. The basis for using 30 days as the long-term determination period is that user segmentation, repeat purchase analysis, member activity judgment, and monthly activity placement in marketing business are usually based on a natural month or the past 30-day window. If a feature continues to participate in similar decisions within 30 days, it indicates that it has strong stability. The basis for using 24 hours as the short-term determination period is that behaviors such as page dwell, ad clicks, instant search, and one-time coupon redemption are usually triggered and decay within a single day. After 24 hours, their role in instant marketing decisions decreases significantly. Therefore, 24 hours can better reflect the actual validity period of short-term behaviors. To further quantify the retention strength of different features, a normalized decay rate parameter can be configured for each feature in the feature time-to-value mapping table. 
The normalized decay rate parameter represents how quickly the value of a feature decreases over time; its value ranges from 0 to 1, with smaller values indicating slower decay and larger values indicating faster decay. The feature retention weight w(t) at the current moment can be calculated according to an exponential decay relationship, using the following formula: w(t) = e^(-λt)
[0023] Where λ is the normalized decay rate parameter, and t is the cumulative time length since the event occurred, with t in days. For long-term features, the normalized decay rate parameter can be between 0.00001 and 0.00005; for short-term features, the normalized decay rate parameter can be between 0.80 and 0.95. The reason for setting the decay rate parameter of long-term features between 0.00001 and 0.00005 is that these features usually correspond to long-term stable consumption capacity, living area, category preference, and membership attributes, and their change cycle is often measured in weeks or months, so a slower rate of value decline should be maintained. The reason for setting the decay rate parameter of short-term features between 0.80 and 0.95 is that these features mostly correspond to instant browsing, short-term clicks, or one-time interaction behaviors, and their impact often weakens rapidly in a short period of time, so a higher decay rate should be adopted. The above parameters can be corrected based on the evolution samples of the enterprise's historical profile, but overall, the configuration rule of low decay for long-term features and high decay for short-term features should be maintained. For features that do not match the feature timeliness mapping table, a preliminary judgment is performed according to the supplementary classification rules. To ensure that the preliminary judgment rules can be executed directly, the event weights are first quantified. The event weights range from 0 to 1 and can be determined comprehensively based on the event credibility level, behavior duration, business confirmation status, and cross-source corroboration. The event weights can be calculated as the sum of the products of each normalized result and the corresponding weight coefficient. 
The calculation formula is as follows: Event weight = 0.4 × Event credibility score + 0.2 × Behavior duration score + 0.2 × Business confirmation score + 0.2 × Cross-source corroboration score. The weighting coefficients of 0.4, 0.2, 0.2, and 0.2 are used because the event credibility level directly reflects the authenticity and reliability of the event and has the greatest impact on the timeliness judgment, so it is given a higher weight. The behavior duration, the business confirmation status, and the cross-source corroboration status reflect the user's level of involvement, the degree of business closure, and the degree of verifiability, respectively. All three jointly affect the timeliness attribute judgment, but their importance is slightly lower than that of event authenticity, so they are given the same secondary weight. This set of weights can be corrected based on the classification accuracy of historical labeled samples; under the default configuration, the event credibility level is given the larger share of influence to ensure the stability of the initial judgment. To enable direct calculation of event weights, the value of each component is normalized. The event credibility score can be set to 1 for high, 0.6 for medium, and 0.2 for low. The behavior duration score can be set to 1 for durations exceeding 300 seconds, 0.6 for durations between 30 and 300 seconds, and 0.2 for durations below 30 seconds. The 300-second and 30-second thresholds are used because, in marketing scenarios, a dwell time exceeding 300 seconds typically corresponds to strong attention, while durations below 30 seconds are mostly quick swipes or occasional triggers. The business confirmation score can be set to 1 for actions that form a closed business loop, such as payment, cancellation, refund, or membership change; 0.6 for confirmed actions that do not form a closed loop, such as adding to cart, following, or adding to favorites; and 0.2 for actions such as browsing, dwelling, or clicking that do not form a closed loop. The cross-source corroboration score can be set to 1 when 3 or more source systems corroborate the event, 0.6 for 2 source systems, and 0.2 for 1 source system. The event weight obtained through the above rules reflects both the importance of the individual event and whether the event has cross-source support and closed-loop information. After the event weights are calculated, features that do not match the feature timeliness mapping table are preliminarily judged according to the following rules: when a feature appears in 2 or more source systems within the past 7 days, occurs 3 or more times, and has an event credibility level of at least medium, it is first classified as a feature pending confirmation; when a feature appears only in a single source and a single session and the event weight is less than 0.3, it is directly classified as a short-term feature. The observation period is set to 7 days because marketing activities, membership operations, and content outreach usually repeat weekly, and 7 days can cover most short-term activities and user return behaviors. The number of source systems is set to 2 or more because repeated occurrence across two or more sources can effectively exclude noise from a single source. The occurrence frequency is set to 3 or more because three or more occurrences are more likely to reflect repeated behavior than one-time occasional triggers. The event weight threshold is set to 0.3 because events below 0.3 usually reflect only weak intent or occasional operations, such as brief pauses, instantaneous returns, and accidental closing, which should not be given high retention strength in the long-term profile. Features that do not match the feature timeliness mapping table and do not meet any of the above conditions are uniformly classified as features pending confirmation, to avoid prematurely classifying features with insufficient samples or incomplete information as long-term or short-term. If the same feature appears in both high-credibility and low-credibility records in the current event processing order, the record with the higher event credibility level is used preferentially as the basis for determining the timeliness attribute. If the event credibility levels are the same, the record that forms a closed business loop is preferred. If the business loop status is also the same, the record supported by more source systems is preferred. The basis for this approach is that the same feature may be triggered by different sources and with different interaction intensities within the same time period; preferring records with higher credibility, clearer business results, and more source support reduces the interference of weak interactions with the timeliness classification result. Once a feature pending confirmation is added to the observation pool, it continues to be observed. The observation period can be 7 to 30 days, with an initial observation period of 7 days. 
If, within the initial observation period, the feature continues to appear in more than 2 source systems, with a cumulative frequency of more than 5 occurrences, and at least one record corresponds to an event with a medium or higher credibility level, it will be transferred to a long-term feature. If, within two consecutive observation periods, the feature consistently appears only in a single source or a single session, or the corresponding event weight remains below 0.3, it will be transferred to a short-term feature. If the conditions for transferring to a long-term or short-term feature are not met after reaching the 30-day observation limit, it will be written into the low-activity feature table and will not participate in the main baseline update. The reason for setting the observation limit to 30 days is that 30 days can cover most marketing behavior cycles that transition from short-term behavior to stable preferences. Features that do not form a clear trend after 30 days are generally not suitable for continuing to occupy observation resources. After classification, a long-term feature set, a short-term feature set, and a feature set to be confirmed are formed. The long-term feature set is used to interface with the main baseline of the historical profile. The short-term feature set enters the subsequent conflict verification process. The feature set to be confirmed is written into the observation pool and does not directly participate in the current batch of main baseline updates. The classified results are written into the subsequent processing buffer as input for subsequent conflict verification and observation processing.
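The event weight formula and the preliminary judgment for unmapped features can be sketched as follows. The status labels (`"closed_loop"`, `"confirmed"`, `"browse"`), the function names, and the reading of "more than 2 sources" and "more than 3 occurrences" as inclusive bounds are assumptions of this illustration.

```python
CREDIBILITY_SCORE = {"high": 1.0, "medium": 0.6, "low": 0.2}
CONFIRMATION_SCORE = {"closed_loop": 1.0, "confirmed": 0.6, "browse": 0.2}


def duration_score(seconds: float) -> float:
    if seconds > 300:
        return 1.0
    if seconds >= 30:
        return 0.6
    return 0.2


def corroboration_score(n_sources: int) -> float:
    if n_sources >= 3:
        return 1.0
    return 0.6 if n_sources == 2 else 0.2


def event_weight(credibility: str, seconds: float,
                 confirmation: str, n_sources: int) -> float:
    """0.4 * credibility + 0.2 * duration + 0.2 * confirmation + 0.2 * corroboration."""
    return (0.4 * CREDIBILITY_SCORE[credibility]
            + 0.2 * duration_score(seconds)
            + 0.2 * CONFIRMATION_SCORE[confirmation]
            + 0.2 * corroboration_score(n_sources))


def preliminary_class(n_sources: int, occurrences: int, credibility: str,
                      single_session: bool, weight: float) -> str:
    """Preliminary judgment for a feature absent from the timeliness mapping table."""
    if n_sources >= 2 and occurrences >= 3 and credibility in ("medium", "high"):
        return "pending"
    if n_sources == 1 and single_session and weight < 0.3:
        return "short_term"
    return "pending"   # everything else is also held as pending confirmation
```

For example, a low-credibility five-second browse seen in one source scores 0.4 × 0.2 + 0.2 × 0.2 + 0.2 × 0.2 + 0.2 × 0.2 = 0.2, which is below the 0.3 threshold and is therefore classified as short-term when it occurs in a single session.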
[0024] Specifically, as shown in Figure 5: Once the short-term feature set is formed, if a long-term historical baseline already exists for the profile dimension corresponding to a short-term feature, a conflict check is performed between the short-term feature and the user's historical profile features. The input data includes at least the short-term feature set, the historical profile baseline of the corresponding profile dimension, the feature timeliness mapping result, the normalized decay rate parameter, the event credibility level, the business confirmation status, and cross-source corroboration information. This processing distinguishes short-term noise, abnormal triggers, and real state migration, and prevents instantaneous behavior from directly rewriting the long-term profile state. To ensure that conflict verification can be executed directly, a profile dimension correspondence table is first established. This table groups features that have different sources and field names but similar semantics into the same profile dimension. For example, viewing stroller details, searching for stroller discounts, adding a stroller to the shopping cart, and purchasing a stroller offline can be grouped into the maternal and infant travel profile dimension; browsing high-end milk powder reviews, searching for milk powder ingredients, and purchasing infant formula offline can be grouped into the maternal and infant feeding profile dimension. Based on this correspondence, the values of the current short-term features under their respective profile dimensions are organized into the current feature vector, and the corresponding records of the same profile dimension in the historical profile baseline are organized into the historical baseline vector. Categorical features can be encoded using one-hot encoding or weighted encoding, while numerical features can be normalized according to a preset interval before being included in the vector. 
After encoding, the current feature vector and the historical baseline vector are each normalized so that the components of each vector sum to 1, which allows the two to be compared as distributions. The distribution difference between the current feature vector and the historical baseline vector is then calculated; it can be measured using the Jensen-Shannon divergence. Let P be the normalized distribution corresponding to the current feature vector, Q be the normalized distribution corresponding to the historical baseline vector, and M be the average distribution, equal to the sum of P and Q divided by 2. The divergence value is the average of the relative entropy of P with respect to M and the relative entropy of Q with respect to M. The larger the divergence value, the greater the deviation between the current short-term feature and the historical profile baseline. The conflict verification threshold can be set to 0.68. The basis for using 0.68 as the judgment threshold is that retrospective analysis of historical anomaly samples found that, in marketing scenarios, when the divergence between the current short-term behavior and the distribution of long-term interests exceeds 0.68, it more often corresponds to accidental touches, short-term activity noise, or external traffic interference than to stable preference changes. Therefore, 0.68 is used as the threshold for distinguishing normal fine-tuning from cross-semantic jumps. If the divergence value is not higher than 0.68, the current short-term feature is not identified as a conflicting feature, but is written into a low-weight buffer for reference during subsequent incremental updates. If the divergence value is higher than 0.68, it is identified as a conflicting feature. 
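The conflict check can be sketched with a direct implementation of the Jensen-Shannon divergence. The text does not state the logarithm base; this sketch assumes base 2, which bounds the divergence between 0 and 1 (with the natural logarithm the maximum would be ln 2, about 0.693). The function names are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale non-negative components so they sum to 1."""
    total = sum(vec)
    if total <= 0:
        raise ValueError("vector must have a positive sum")
    return [x / total for x in vec]


def relative_entropy(p: Sequence[float], m: Sequence[float]) -> float:
    """Relative entropy of p with respect to m, in bits; zero terms are skipped."""
    return sum(pi * math.log2(pi / mi) for pi, mi in zip(p, m) if pi > 0)


def js_divergence(current: Sequence[float], baseline: Sequence[float]) -> float:
    p, q = normalize(current), normalize(baseline)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]   # average distribution M
    return 0.5 * relative_entropy(p, m) + 0.5 * relative_entropy(q, m)


def is_conflict(current: Sequence[float], baseline: Sequence[float],
                threshold: float = 0.68) -> bool:
    """A short-term feature conflicts when the divergence is higher than the threshold."""
    return js_divergence(current, baseline) > threshold
```

Two vectors with no overlapping categories, such as `[1, 0]` and `[0, 1]`, reach the maximum divergence and are flagged, while proportional vectors such as `[1, 1]` and `[2, 2]` have zero divergence after normalization.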
For features identified as conflicting, they are not allowed to be directly written to the historical profile baseline, but instead written to the user-level isolation cache. The user-level isolation cache is configured with isolation record units for both user and profile dimensions. Each isolation record unit includes at least the following fields: conflicting feature value, source system identifier, divergence value, entry time, normalized decay rate parameter, event credibility level, business confirmation status, isolation weight, and review status. After a conflicting feature enters an isolation record unit, an observation window is simultaneously activated. The observation window is segmented according to the normalized decay rate range. If the normalized decay rate parameter is between 0.80 and 0.85, the observation window can be 1 to 2 hours. For values between 0.85 and 0.90, the observation window can be 10 minutes to 1 hour; if the normalized decay rate parameter is higher than 0.90, the observation window can be 10 minutes to 30 minutes. The basis for adopting the above segmentation rules is that the higher the normalized decay rate, the faster the influence of the feature decays, and the observation window should be shortened accordingly; the closer the normalized decay rate is to 0.80, the more persistent the feature is, although it is short-lived, it still has a certain degree of persistence, and a longer cross-validation time should be given; if the event corresponding to the short-lived feature has a high credibility level and forms payment, reconciliation, refund or other business confirmation information, the observation window can be extended to 6 hours to avoid misjudging the real state migration as short-term noise. 
Within the observation window, the system continuously retrieves feature update information from other source systems that concerns the same user and the same profile dimension and whose direction of change is consistent with the current conflicting feature. To complete the review, the system calculates the number of independent sources, the same-direction support, and the aggregation distance for the same-direction features retrieved within the observation window. The number of independent sources is the number of source systems providing same-direction evidence within the observation window. The same-direction support is obtained by taking the highest-weighted same-direction event from each independent source and averaging the weights of these events; its value ranges from 0 to 1. Only the single highest-weighted event from each source is used in order to avoid undue amplification of support caused by repeated reporting from the same source. The aggregation distance is the average Euclidean distance between the same-direction feature vector of each independent source and the center vector. The center vector is the average of the same-direction feature vectors of the independent sources within the observation window, weighted by event weight. A weighted-average center vector is used because events with higher weights usually have higher authenticity and stronger behavioral orientation, so letting them contribute more to the center vector improves the stability of the multi-source consistency judgment. When the number of independent sources reaches 5 or more, or reaches 60% or more of the total number of currently enabled independent sources that can provide features for this profile dimension, and in addition the same-direction support is not less than 0.6 and the aggregation distance is not greater than 0.15, the conflicting feature is considered not to be noise but a potential real state migration. 
The 60% or more total number of sources is determined by rounding up. The basis for setting the number of independent sources to 5 or more is that when multiple source systems continuously provide same-direction evidence, it can effectively distinguish between single-point false triggers and multi-source consistent changes. When the number of currently available source systems is less than 5, using 60% or more as an alternative condition can take into account the data access scale of different enterprises. The basis for setting the aggregation distance threshold to 0.15 is that when the average Euclidean distance between normalized vectors is less than 0.15, it usually indicates that multi-source evidence has formed a high degree of consistency within the same profile dimension, making it suitable as a judgment condition for breaking through isolation. When the above conditions are met, the review result is recorded as passed, and the corresponding conflicting feature is written into the subsequent incremental update processing area. If, after the observation window ends, the number of independent sources is less than two, or the co-directional support is less than 0.6, the conflicting feature is considered short-term noise, and the review result is recorded as failing the review. The reason for using less than two independent sources as a negative condition is that a single source cannot exclude local false triggers, temporary anomalies, or source bias itself. The reason for setting the co-directional support threshold to 0.6 is that when the average weight of the main co-directional evidence in each independent source is less than 0.6, it usually only reflects low-to-medium intensity sporadic behavior and is insufficient to support long-term changes in the profile status. 
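The review rules can be sketched as a single decision function. The input layout (a mapping from source identifier to a list of weight and vector pairs), the result labels, and the choice to evaluate the failure conditions before the pass conditions are assumptions of this illustration rather than details given in the text.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Evidence = Dict[str, List[Tuple[float, Sequence[float]]]]


def review(evidence: Evidence, enabled_sources: int) -> str:
    """Return 'passed', 'failed', or 'continue' for one conflicting feature."""
    # Keep only the highest-weighted same-direction event of each source.
    best = [max(events, key=lambda e: e[0]) for events in evidence.values() if events]
    n = len(best)
    if n < 2:
        return "failed"                       # a single source cannot rule out noise
    support = sum(w for w, _ in best) / n     # same-direction support
    if support < 0.6:
        return "failed"
    # Center vector: event-weighted average of the per-source vectors.
    total_w = sum(w for w, _ in best)
    dims = len(best[0][1])
    center = [sum(w * v[i] for w, v in best) / total_w for i in range(dims)]
    aggregation = sum(dist(v, center) for _, v in best) / n
    needed = (3 * enabled_sources + 4) // 5   # 60% of enabled sources, rounded up
    if (n >= 5 or n >= needed) and aggregation <= 0.15:
        return "passed"
    return "continue"
```

The rounding-up of the 60% share is done in integer arithmetic, so with 4 enabled sources the requirement is 3 independent sources.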
For conflicting features that fail the review, the historical profile baseline is not updated; the feature is retained in the isolation buffer, and its isolation weight decays exponentially according to the corresponding normalized decay rate. The isolation weight can be refreshed every 10 minutes. The refresh cycle of 10 minutes is chosen because it matches the shortest observation window, so the decay of conflicting features is reflected in a timely manner without the processing burden of excessively frequent refreshes. When the isolation weight falls below 0.05, or the cumulative isolation time exceeds 24 hours, clearing is performed. The clearing threshold of 0.05 is used because once the feature retention weight drops below 5%, the impact of the feature on subsequent judgments is negligible. The cumulative isolation limit of 24 hours is used because a short-term conflicting feature that has not produced a valid review result within 24 hours is usually no longer suitable for retention within the real-time processing range. If, at the end of the observation window, the number of independent sources has reached two or more and the same-direction support is not less than 0.6, but the conditions for passing the review are not met, the feature enters the continued observation state. In the continued observation state, the observation window is extended only once, and the extension can be 50% of the original observation window. The extension ratio of 50% is chosen because it gives conflicting features that have not yet reached a clear conclusion one additional observation opportunity without significantly increasing the waiting cost. Features under extended observation remain subject to the aforementioned isolation weight threshold and cumulative isolation time limit. 
After the extended observation period ends, the review conditions are still re-evaluated according to the aforementioned review conditions. If the review conditions are met, the object is transferred to the subsequent incremental update processing area. If the review conditions are not met, or the isolation weight drops below 0.05, or the cumulative isolation time reaches 24 hours, the object is cleared. If no clear conclusion is formed at the end of the extended observation period, the object remains in isolation, the historical profile baseline is not updated, and the remaining retention time and re-check time are added to the isolation record unit. After the above processing is completed, three types of review conclusions are generated: passed review, failed review, and continued observation. Conflict features that pass review are written into the subsequent incremental update processing area; conflict features that fail review continue to be isolated until decay and cleared; conflict features that continue to be observed are retained in the isolation buffer area with the remaining observation time attached; then, the relevant results are written into the subsequent processing buffer as input for subsequent incremental update processing.
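The decay and clearing rules for isolated features can be sketched as follows. The sketch assumes the exponential decay relationship takes the form weight = initial × e^(-λt) with t in days, as in the feature retention weight, and it treats reaching 24 hours as sufficient for clearing; the text uses both "exceeds" and "reaches" for this limit. All names are hypothetical.

```python
import math

CLEAR_WEIGHT = 0.05        # clear when the isolation weight falls below 5%
MAX_ISOLATION_HOURS = 24   # cumulative isolation limit
EXTENSION_RATIO = 0.5      # one-time extension of the observation window


def isolation_weight(initial: float, decay_rate: float, elapsed_days: float) -> float:
    """Exponentially decayed isolation weight (assumed form: initial * e^(-rate * t))."""
    return initial * math.exp(-decay_rate * elapsed_days)


def should_clear(weight: float, isolated_hours: float) -> bool:
    return weight < CLEAR_WEIGHT or isolated_hours >= MAX_ISOLATION_HOURS


def extended_window_minutes(original_minutes: float, already_extended: bool) -> float:
    """Total window after the single permitted 50% extension."""
    if already_extended:
        return original_minutes
    return original_minutes * (1 + EXTENSION_RATIO)
```

A scheduler refreshing every 10 minutes would call `isolation_weight` with the elapsed time and pass the result to `should_clear`; how the refresh is scheduled is left open here.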
[0025] Specifically, as shown in Figure 6: when a long-term feature set has been formed, when short-term features that have not entered conflict isolation processing have completed buffering, or when conflict features have been reviewed and determined to have passed the review, an incremental update is performed on the user profile node. The input data includes at least the long-term feature set, the short-term feature set that has not entered conflict isolation processing, the conflict features that have passed the review, the current user profile baseline snapshot, the profile dimension correspondence, and the update rules corresponding to each feature. This process does not overwrite the entire profile node; instead, it generates local incremental content based on the effective features of this round and updates only the corresponding profile dimensions, which reduces repeated writes and full recalculation of irrelevant dimensions. Before the update is executed, the current user profile node snapshot is read according to the profile dimensions. The user profile node snapshot includes at least the following information: interest and preference dimension, price sensitivity dimension, category maturity dimension, membership value dimension, marketing reach response dimension, and status dimension. Based on the source and conclusion of the input features, the update objects in this round are divided into three categories: long-term features, short-term features that have not entered the conflict isolation process, and conflict features that have passed the review. Long-term features are used to smooth and correct long-term stable attributes; short-term features that have not entered the conflict isolation process are used to supplement current local changes; and conflict features that have passed the review are used to drive state migration or strong correction after verification from multiple sources. 
The basis for using three categories of rules to process them separately is that the duration, stability, and credibility of the three types of features are different. If a unified update method is used, it is easy for short-term behavior to cause excessive disturbance to the long-term profile, or for the migration signals that have been verified to not be written in time. When determining the update content, first calculate the local residual according to the portrait dimension type; if the target dimension is a numerical scoring dimension, the local residual is the difference between the current effective feature value and the baseline value, multiplied by the corresponding update weight; if the target dimension is a categorical preference dimension, the difference between the current category weight vector and the baseline category weight vector is multiplied by the corresponding update weight to form a vector residual; if the target dimension is a state dimension, generate state transition instructions and state transition confidence; the state transition confidence ranges from 0 to 1, and can be determined by the weighted average of the event credibility level score, review intensity score, and business confirmation state score. 
The calculation formula can be written as: State transition confidence level = 0.4 × event credibility score + 0.4 × verification strength score + 0.2 × business confirmation status score; The event credibility score can be set to 1 for high, 0.6 for medium, and 0.2 for low; the business confirmation status score can be set to 1 for completing a payment, cancellation, refund, or other business loop; 0.6 for confirmed actions such as adding to cart, following, or favorites but not yet closed loop; and 0.2 for weak actions such as browsing, clicking, or dwelling; the review strength score can be determined by comprehensively considering the same-direction support output after review, the source coverage ratio, and the aggregation distance. The source coverage ratio is the ratio of the number of independent sources to the total number of currently available sources, and the aggregation distance adjustment value is 1 minus the ratio of the aggregation distance to the corresponding threshold. The review strength score can be written as: 0.5 × same-direction support amount + 0.3 × source coverage ratio + 0.2 × aggregation distance adjustment value; The prerequisite for state transition is that the current baseline state matches the target migration path, and the confidence level of the state transition is not lower than 0.7. The reason for using 0.7 as the confidence level threshold for state transition is that once the state dimension changes, it will usually directly affect subsequent marketing judgments and outreach actions. Therefore, state switching is only allowed on the basis of high confidence. When it is lower than 0.7, only candidate migration information is recorded, and the current state is not directly changed. To ensure that update rules for different features can be directly executed, update weights are quantitatively configured. 
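As a worked illustration of the two weighted formulas above, the following Python sketch computes the review strength score and the state transition confidence and applies the 0.7 threshold. The function names and dictionary keys are hypothetical, and the sketch assumes the aggregation distance does not exceed its threshold, so the adjustment value stays between 0 and 1.

```python
# Scores stated in the text for the event credibility level.
CREDIBILITY_SCORE = {"high": 1.0, "medium": 0.6, "low": 0.2}
# Scores stated in the text for the business confirmation status; the keys are
# hypothetical labels for "closed loop", "confirmed but not closed", and "weak action".
BUSINESS_SCORE = {"closed_loop": 1.0, "confirmed": 0.6, "weak": 0.2}
TRANSITION_THRESHOLD = 0.7


def review_strength(same_direction_support: float, independent_sources: int,
                    available_sources: int, aggregation_distance: float,
                    distance_threshold: float) -> float:
    """0.5 x support + 0.3 x source coverage ratio + 0.2 x distance adjustment."""
    coverage = independent_sources / available_sources
    # Assumption: aggregation_distance <= distance_threshold.
    distance_adjustment = 1.0 - aggregation_distance / distance_threshold
    return 0.5 * same_direction_support + 0.3 * coverage + 0.2 * distance_adjustment


def transition_confidence(credibility: str, strength: float, business_state: str) -> float:
    """0.4 x credibility score + 0.4 x review strength + 0.2 x business confirmation score."""
    return (0.4 * CREDIBILITY_SCORE[credibility]
            + 0.4 * strength
            + 0.2 * BUSINESS_SCORE[business_state])


def decide_transition(confidence: float, path_matches: bool) -> str:
    """Switch state only when the migration path matches and confidence is at least 0.7."""
    if path_matches and confidence >= TRANSITION_THRESHOLD:
        return "switch_state"
    return "record_candidate"
```

For example, a same-direction support of 0.8, 3 of 5 available sources, and an aggregation distance at 25% of its threshold give a review strength of 0.73; combined with a high-credibility, closed-loop event, the confidence is 0.892 and the state switch is allowed.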
The update weights for long-term features are set at 0.6 to 0.9, for short-term features that have not entered conflict isolation processing at 0.05 to 0.3, and for conflict features that have passed review at 0.7 to 1.0. The reason for setting the update weights for long-term features at 0.6 to 0.9 is that these features reflect long-term stable attributes and should maintain a strong influence, while still leaving room for smoothing so that a single input does not pull the historical baseline excessively. The reason for setting the update weights for short-term features at 0.05 to 0.3 is that these features are only used to supplement current local changes and should not cause excessive disturbance to the long-term profile. The reason for setting the update weights for conflict features that have passed review at 0.7 to 1.0 is that these features have already received consistent support from multiple sources and should be allowed to make strong corrections to the historical profile. The specific values within each range can be determined in segments according to the event credibility level, review strength, and business confirmation status: when the event credibility level is high and a payment, cancellation, refund, or continuous repurchase loop is formed within a preset period, the update weight of long-term features can be 0.85 to 0.9; when the event credibility level is low and only browsing, clicking, or short-term dwell occurs, the update weight of short-term features can be 0.05 to 0.15; when conflict features meet the review conditions and the number of independent sources reaches more than 5 or more than 60% of the total number of currently available sources, the update weight can be 0.9 to 1.0. Through these segmented values, events with different credibility levels and different confirmation strengths correspond to different update magnitudes. After the update weights are determined, local incremental content is generated. 
For long-term features, a stable fusion method is used for updating, that is, the difference between the current effective feature value and the baseline value is multiplied by the long-term update weight to form a stable increment, which is then written into the corresponding dimension to smooth and correct long-term profiles. For short-term features that have not entered conflict isolation processing, a low-weight compensation method is used for updating, that is, the difference between the current effective feature value and the baseline value is multiplied by the short-term update weight to form a fine-tuning increment to supplement the current short-term behavior, but without changing the dominant direction of the historical baseline. For conflict features that have passed the review, a migration update method is used. If the target dimension is numerical or categorical, the corresponding dimension is directly corrected according to the high-weight residual. If the target dimension is state-based, state switching is performed when the state transition conditions are met; otherwise, only the score corresponding to the state is updated, and the state label is not directly rewritten. Through the above rules, long-term features exhibit smooth fusion, short-term features exhibit limited compensation, and conflict features that have passed the review exhibit clear migration, thus ensuring that features of different natures maintain clear boundaries in the same round of updates. To avoid local distortion caused by abnormal events due to excessively large single update magnitudes, a limiting rule is set for local residuals. If the target dimension is a numerical rating dimension, the absolute value of the local residual is limited to within 20% of the current dimension range. If the target dimension is a categorical preference dimension, the single change of each component of the vector residual is limited to within 0.2. 
The basis for using 20% and 0.2 as limiting thresholds is that user profile updates should usually reflect gradual changes rather than single drastic jumps. When a single change exceeds 20% of the current dimension range, or a single category component change exceeds 0.2, it is more likely to be caused by abnormal events, sudden local traffic surges, or single accidental touches, and is not suitable as a direct update amount. The dimension range here is determined according to the preset rating range or standardized range of the dimension. Each component of the category weight vector is in the range of 0 to 1, so using 0.2 as a single component limit is more in line with the actual situation of normalized weight updates. After completing the local incremental calculation, a local write is performed on the user profile node. The system does not rewrite the entire profile node, but only performs incremental writes on the update positions corresponding to the target profile dimension. If the target dimension adopts a floating-point scoring structure, the local residual is added to the corresponding scoring position through atomic accumulation. If the target dimension adopts a category weight structure, the target category weights are compared and swapped to avoid overwrite contention during concurrent writes. If the target dimension adopts a state structure, the current state version and the previous state are checked first before the state update is performed. The basis for using the local write method is that this round of update usually only affects a few profile dimensions. If the entire node is overwritten, it will not only increase the amount of writing, but also easily introduce concurrent conflicts of unrelated dimensions. To ensure data consistency in concurrent scenarios, the snapshot version identifier is recorded simultaneously when reading the user profile baseline snapshot. 
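The local residual calculation and its limiting rule for numerical and categorical dimensions can be sketched as follows. This is an illustrative Python sketch with hypothetical function names; the dimension range is passed in as a minimum and maximum, which is an assumption about how the preset rating range is represented.

```python
from typing import List


def numeric_residual(current: float, baseline: float, weight: float,
                     dim_min: float, dim_max: float) -> float:
    """(current - baseline) x update weight, limited to 20% of the dimension range."""
    raw = (current - baseline) * weight
    limit = 0.2 * (dim_max - dim_min)
    return max(-limit, min(limit, raw))


def category_residual(current: List[float], baseline: List[float],
                      weight: float) -> List[float]:
    """Per-component weighted difference, each component limited to +/- 0.2."""
    return [max(-0.2, min(0.2, (c - b) * weight))
            for c, b in zip(current, baseline)]
```

On a 0 to 100 rating scale, a feature value of 90 against a baseline of 50 with an update weight of 0.9 would yield a raw residual of 36, which the limit reduces to 20.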
If the current version identifier has changed before writing, it indicates that other updates have been written to the same node during this update. In this case, the latest snapshot is reread and the local residual is recalculated. The maximum number of recalculations is 3. The reason for setting the maximum number of recalculations to 3 is that in high-concurrency scenarios, a small number of recalculations can resolve most concurrency contention. If it still fails after more than 3 attempts, it usually indicates that the node is in a state of continuous contention. Continuing to recalculate will significantly increase the consumption of processing resources, so immediate retry is stopped. If the number of recalculations exceeds 3, the update request is written to the delayed update queue and a supplementary update is performed after the node contention intensity decreases, so as to avoid the real-time processing process occupying computing resources for a long time. If the target node is temporarily unavailable, or a temporary error occurs during the write process, the incremental content will be written to the update temporary record. The update temporary record includes at least the following fields: user identifier, profile dimension, local residual value, target status, event occurrence time, generation time, and retry flag. After the node becomes available again, the update temporary record will be read in the order of event occurrence time and the write will be re-executed. The reason for retrying in the order of event occurrence time is that profile updates need to maintain the sequential relationship of business time as much as possible. If only the generation time is replayed, it may cause late events to overwrite previous states. After the update is completed, the updated user profile node snapshot, local residual value, and residual magnitude are output. The residual magnitude represents the overall change magnitude of the local increment. 
If the target dimension is a numerical rating dimension, the residual magnitude is the absolute value of the local residual. If the target dimension is a categorical preference dimension, the residual magnitude is the Euclidean norm of the vector residual. If the target dimension is a state dimension, the residual magnitude is determined by the product of the state transition confidence and the update weight. The reason for using the residual magnitude as the output is that it reflects the actual impact of the update on the user profile node and can serve as an important input for determining the scope of the subsequent related updates. After this processing is completed, the updated user profile node snapshot, local residual value, and residual magnitude are written into the subsequent processing buffer as input for determining the scope of the subsequent related updates.
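The version check with at most three recalculations, and the residual magnitude output, can be sketched in Python as below. This is a single-process illustration under stated assumptions: `ProfileNode` is a hypothetical in-memory stand-in for the profile store, a lock-protected compare-on-version write stands in for the atomic accumulation described above, and only the numerical scoring case is shown for the write path.

```python
import math
import threading
from dataclasses import dataclass, field
from typing import List, Optional

MAX_RECALCULATIONS = 3  # limit stated in the text


@dataclass
class ProfileNode:
    scores: dict
    version: int = 0
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def snapshot(self):
        """Read the scores together with the snapshot version identifier."""
        with self._lock:
            return dict(self.scores), self.version

    def write_if_unchanged(self, dimension: str, residual: float, expected: int) -> bool:
        """Add the residual only if no other update was written since the snapshot."""
        with self._lock:
            if self.version != expected:
                return False
            self.scores[dimension] += residual
            self.version += 1
            return True


def update_numeric(node: ProfileNode, dimension: str, value: float,
                   weight: float, delayed_queue: list) -> Optional[float]:
    """Return the residual magnitude, or None if the request was deferred."""
    for _ in range(1 + MAX_RECALCULATIONS):  # first attempt plus 3 recalculations
        scores, version = node.snapshot()
        residual = (value - scores[dimension]) * weight
        if node.write_if_unchanged(dimension, residual, version):
            return abs(residual)
    delayed_queue.append((dimension, value, weight))  # delayed update queue
    return None


def vector_magnitude(residual: List[float]) -> float:
    """Euclidean norm used as the residual magnitude for categorical dimensions."""
    return math.sqrt(sum(r * r for r in residual))
```

Deferring the request after the retry budget is exhausted keeps a persistently contended node from holding the real-time path, which matches the rationale given in the text.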
[0026] Specifically, as shown in Figure 6: the scope of related updates is determined for user profile nodes that have completed local incremental updates. The input data includes at least the updated user profile node, the residual magnitude corresponding to this update, the update dimension, the event credibility level, the node association table, and the current system load status. This process determines whether the update needs to be propagated to related targets, and to which directly related targets, so that local changes affect only the profile dimensions, marketing tags, or control items on which they have a direct business impact. To determine the scope of related updates, a node association table is established in advance. This table describes the directly related dimensions, related tags, or related control items that may be affected when a given profile dimension changes. The node association table can be generated jointly from pre-maintained business impact rules and historical linkage statistics. Business impact rules determine explicit business dependencies, while historical linkage statistics supplement them with dimension relationships that repeatedly change together for the same user within a preset observation period. Each association must include at least the following fields: source dimension identifier, target dimension identifier, association direction, association weight, whether propagation is allowed, and propagation type. The association weight ranges from 0 to 1 and can be determined by the business impact coefficient and the historical linkage coefficient. The calculation formula can be written as: Association weight = 0.6 × business impact coefficient + 0.4 × historical linkage coefficient; the business impact coefficient can be set to 1 for high, 0.6 for medium, and 0.2 for low. 
The historical linkage coefficient can be determined by the proportion of the same user changing simultaneously in the source dimension and the target dimension within a preset period. The basis for using 0.6 and 0.4 as combination coefficients is that business rules reflect explicit business meaning and should account for the main proportion, while historical linkage results reflect covariance relationships in actual data and can be used as supplementary corrections. If the historical linkage sample is less than 7 days, the association weight should be determined first according to the business impact coefficient, and the historical linkage coefficient should be introduced for correction after the sample meets the 7-day requirement. Before executing the propagation decision, the residual magnitude of the updated output is standardized. If the target dimension is a numerical rating dimension, the residual magnitude is converted to the range of 0 to 1 according to the preset range of that dimension. If the target dimension is a categorical preference dimension, it is normalized by dividing the Euclidean norm of the vector residual by the maximum allowable vector change of that dimension. The maximum allowable vector change can be determined by the upper limit of 0.2 for a single change of each component and the number of effective categorical components participating in the change. If the target dimension is a state dimension, it is normalized by the theoretical maximum value of the product of the state transition confidence and the update weight, with the theoretical maximum value being 1. The range of the standardized residual magnitude is 0 to 1. The purpose of standardization is to make the residual magnitudes obtained under different dimension types comparable, thereby facilitating the unified execution of the propagation decision. After standardization, the first step is to determine whether the current update meets the propagation trigger condition. 
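The association weight and the standardization of the residual magnitude can be illustrated with the Python sketch below. Several points are assumptions rather than statements from the text: when fewer than 7 days of linkage samples exist, the weight is taken to equal the business impact coefficient; the maximum allowable vector change is taken as 0.2 × √k for k changed components; and all function names are hypothetical.

```python
import math
from typing import List

IMPACT = {"high": 1.0, "medium": 0.6, "low": 0.2}  # coefficients stated in the text


def association_weight(business_impact: str, linkage_coefficient: float,
                       sample_days: int) -> float:
    """0.6 x business impact + 0.4 x historical linkage, once 7 days of samples exist."""
    impact = IMPACT[business_impact]
    if sample_days < 7:
        return impact  # assumption: use the business impact coefficient alone
    return 0.6 * impact + 0.4 * linkage_coefficient


def normalize_numeric(residual: float, dim_min: float, dim_max: float) -> float:
    """Scale the absolute residual by the preset range of the dimension."""
    return min(1.0, abs(residual) / (dim_max - dim_min))


def normalize_categorical(residual: List[float]) -> float:
    """Divide the Euclidean norm by the assumed maximum change 0.2 x sqrt(k)."""
    active = [r for r in residual if r != 0.0]
    if not active:
        return 0.0
    norm = math.sqrt(sum(r * r for r in active))
    return min(1.0, norm / (0.2 * math.sqrt(len(active))))


def normalize_state(confidence: float, weight: float) -> float:
    """The theoretical maximum of confidence x weight is 1, so no rescaling is needed."""
    return confidence * weight
```

All three normalizers return values between 0 and 1, which is what allows a single propagation threshold to be applied across dimension types.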
If the standardized residual magnitude is less than 0.03, it is determined that the change only occurred within the current node and is insufficient to trigger external propagation. Subsequent update control is limited to the current node, and no update requests are sent to any associated targets. The basis for using 0.03 as the blocking threshold is that when the local residual is less than 0.03 after standardization, it usually only reflects slight smoothing correction or local noise compensation. Its impact on downstream dimensions is insufficient to support a new associated update. If propagation is still triggered, it is easy to cause meaningless linkage calculations. If the standardized residual magnitude is not less than 0.03, the propagation range determination process begins. During the determination of the propagation scope, the propagation level is determined by combining the event credibility level, the update dimension type, and the node association table. If the event credibility level is high and the update dimension is marked as a core source dimension that allows single-hop propagation in the node association table, then single-hop propagation is allowed. If the event credibility level is medium, propagation is only allowed to the pre-configured direct association tags or direct association control items in the node association table. If the event credibility level is low, the corresponding update is first written to the pending propagation queue and propagation is not executed immediately. Here, single-hop propagation means that propagation is only allowed from the current update dimension to its directly associated target, and recursive propagation to the second layer or deeper targets is not allowed. The basis for adopting single-hop control is that local dimension changes in the marketing profile usually only have a significant impact on the most direct business objectives. 
If propagation continues to deeper targets, it is easy to cause irrelevant diffusion and cascading recalculation. When propagation is permitted, attenuation control is applied to the propagation update amount. The propagation update amount for an associated target can be determined as follows: Propagation update amount = standardized residual magnitude × association weight × 0.2, with the source update direction retained. Here, 0.2 means that the associated target retains only 20% of the source node's influence; that is, the propagated influence is attenuated by 80%. The basis for setting the attenuation ratio to 80% is that the associated target should bear only the indirect impact of the source node's change and should not be treated as equivalent to a direct change of the source node. Retaining 20% of the influence can cover the correction range required for direct business linkage while avoiding excessive propagation that could cause cascading amplification. If an association in the node association table is marked as a read-only observation relationship, the corresponding target node only records the propagation suggestion and does not perform an actual write. Different update control methods can be adopted for different types of related targets. If the target node is a ranking node, the propagation update amount is written into the corresponding ranking score to adjust the display priority of recommendation candidate tags, activity package ranking tags, or coupons. If the target node is a frequency control node, the reach frequency control value or reach interval weight is adjusted according to the propagation update amount. If the target node is a strategy identifier node, only the corresponding strategy score or candidate priority is updated, and the final execution identifier is not directly rewritten. 
If the target node is a recall control node, then after the high-priority candidate identifier is written, the most recent transaction behavior must still be verified again before recall execution to avoid false recall. In this way, local profile changes propagate only to positions that have a direct impact on the business, without affecting dimensions or control items that have no direct relationship. To ensure feasibility in high-concurrency scenarios, a lightweight propagation mode is activated when the system load is abnormal. The trigger conditions for lightweight propagation mode can be set as follows: the current update queue length exceeds 5 times the average queue length during the stable operation phase, or the profile node write time exceeds 20 milliseconds for three consecutive monitoring cycles. The average queue length during the stable operation phase can be determined by the average queue length of the same time period over the past 7 days; if less than 7 days of historical data is available, it should be determined by the average over the available historical days for the same time period; if no usable historical data is available, the system's preset baseline value should be used. The monitoring cycle can be set to 1 to 5 seconds, with 1 second being suitable for typical real-time marketing scenarios. The reason for setting the queue length threshold to 5 times is that a queue length of more than 5 times the stable average usually indicates significant congestion. The reason for setting the write latency condition to more than 20 milliseconds for three consecutive monitoring cycles is that a single write jitter may originate from instantaneous resource contention, whereas exceeding 20 milliseconds for three consecutive monitoring cycles better reflects sustained write pressure. 
The 20 millisecond value is based on the fact that profile node updates usually need to maintain low latency, and exceeding 20 milliseconds may affect real-time reach and recommendation timeliness. After entering lightweight propagation mode, only updates with a high event credibility level and a standardized residual magnitude exceeding 0.1 are allowed to trigger single-hop propagation. All other updates are restricted to the current node and do not undergo external propagation. The reason for using 0.1 as the lightweight propagation threshold is that, under abnormal system load, the propagation threshold must be raised further to distinguish general fluctuations from significant changes, so that only updates with a significant impact on subsequent marketing actions are propagated. Propagation requests suppressed in lightweight propagation mode are written to a delayed propagation queue, and external propagation is carried out in order of event occurrence time after the system load recovers. When the delayed propagation is executed, the original propagation range determination results, original standardized residual magnitude, and original association weights are still used, and the propagation level is not re-amplified. The exit condition for lightweight propagation mode can be set as follows: the current update queue length recovers to less than twice the average queue length during the stable operation phase, and the profile node write time is less than 20 milliseconds for five consecutive monitoring cycles. The basis for using twice the average queue length and five consecutive monitoring cycles as exit conditions is that the exit threshold should be lower than the entry threshold, so that the system does not switch operating modes frequently when it is near the critical state. At the same time, a return to normal for five consecutive monitoring cycles indicates that the system has re-entered a stable state. 
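The entry and exit conditions for lightweight propagation mode form a hysteresis loop, which can be sketched in Python as follows. The class and function names are hypothetical, and resetting the consecutive-cycle counters when the mode changes is an assumption added to keep the two thresholds independent.

```python
from dataclasses import dataclass

WRITE_LIMIT_MS = 20.0  # write time limit stated in the text


@dataclass
class LoadMonitor:
    stable_avg_queue: float   # average queue length during stable operation
    lightweight: bool = False
    slow_cycles: int = 0      # consecutive cycles with write time above 20 ms
    fast_cycles: int = 0      # consecutive cycles with write time below 20 ms

    def observe(self, queue_length: float, write_ms: float) -> bool:
        """Process one monitoring cycle and return whether lightweight mode is active."""
        self.slow_cycles = self.slow_cycles + 1 if write_ms > WRITE_LIMIT_MS else 0
        self.fast_cycles = self.fast_cycles + 1 if write_ms < WRITE_LIMIT_MS else 0
        if not self.lightweight:
            congested = queue_length > 5 * self.stable_avg_queue
            if congested or self.slow_cycles >= 3:
                self.lightweight = True
                self.fast_cycles = 0  # assumption: start counting recovery from entry
        elif queue_length < 2 * self.stable_avg_queue and self.fast_cycles >= 5:
            self.lightweight = False
            self.slow_cycles = 0
        return self.lightweight


def lightweight_allows(credibility: str, standardized_magnitude: float) -> bool:
    """In lightweight mode only high-credibility updates above 0.1 may propagate."""
    return credibility == "high" and standardized_magnitude > 0.1
```

Because entry requires 5 times the stable average while exit requires less than twice that average, a queue hovering around 3 times the average leaves the current mode unchanged in either direction.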
If the writing of the associated target fails during the propagation process, the propagation event is written to the retry record and retried according to a three-level backoff strategy. The retry interval can be 1 minute, 5 minutes, or 15 minutes. The reason for using a three-level backoff of 1 minute, 5 minutes, and 15 minutes is that this setting can provide a quick retry opportunity during the short-term fault recovery phase and reduce the frequency of invalid retries in the case of continuous faults. If three consecutive retries still fail, the propagation event is transferred to the manual inspection queue, but the original node update that has been successfully completed is not rolled back. The reason for using three failures as the condition for manual inspection is that three consecutive failures usually indicate that there is a persistent anomaly in the target node, and the success rate of continuing automatic retry is low. The original node update has already been completed based on the current input. If the original node is rolled back due to the failure of the associated target, it is easy to cause inconsistency between the local profile status and the actual input. After completing the above processing, the propagation scope determination result, propagation update record, delayed propagation record, and failure retry record are obtained. If the current update does not meet the propagation triggering condition, only the update result of the current node is retained. If the propagation triggering condition is met, the associated update control is completed according to the single-hop range. If it is in lightweight propagation mode, only the direct propagation of high-confidence and high-residual updates is retained, and the remaining requests are transferred to the delayed propagation queue. 
Finally, the propagation scope determination result and its corresponding control result are written into the subsequent processing buffer as input for the profile update completion record and subsequent audit trail.
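The propagation decision in this embodiment (trigger threshold, credibility-based scope, and 80% attenuation) can be summarized in a short Python sketch. The names, the `kind` labels, and the tuple layout of the result are hypothetical. The text does not say what happens for a high-credibility update whose dimension is not a core source dimension; the sketch assumes no propagation in that case.

```python
from dataclasses import dataclass
from typing import List, Tuple

TRIGGER_THRESHOLD = 0.03   # blocking threshold stated in the text
RETAINED_INFLUENCE = 0.2   # associated targets keep 20% of the source influence


@dataclass
class Association:
    target: str
    weight: float            # association weight in [0, 1]
    allow_propagation: bool
    kind: str                # hypothetical labels: "dimension", "tag", "control"
    read_only: bool = False  # read-only observation relationship


def plan_propagation(magnitude: float, direction: int, credibility: str,
                     is_core_source: bool, associations: List[Association]
                     ) -> Tuple[str, List[Tuple[str, float, str]]]:
    """Return an action label and a list of (target, update amount, mode) entries."""
    if magnitude < TRIGGER_THRESHOLD:
        return "local_only", []
    if credibility == "low":
        return "pending_queue", []
    if credibility == "high" and is_core_source:
        targets = [a for a in associations if a.allow_propagation]
    elif credibility == "medium":
        targets = [a for a in associations
                   if a.allow_propagation and a.kind in ("tag", "control")]
    else:
        targets = []  # assumption: high credibility but not a core source dimension
    if not targets:
        return "local_only", []
    updates = [(a.target,
                direction * magnitude * a.weight * RETAINED_INFLUENCE,
                "suggest" if a.read_only else "write")
               for a in targets]
    return "single_hop", updates  # direct targets only, never recursive
```

The function only inspects the direct associations of the updated dimension and never follows a target's own associations, which is what keeps the propagation single-hop.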
[0027] Example 2: Based on Example 1, the specific application process of the marketing user profile dynamic update method based on multi-source data fusion is further explained. The overall operational logic of the method of this invention is illustrated with a maternal and infant retail company that dynamically updates a target user's profile during a major promotional event. The company simultaneously runs a web-based event tracking system, a mobile application event tracking system, an advertising platform, a customer relationship management platform, a membership transaction platform, a coupon platform, a customer service ticketing platform, and an offline store transaction system. Each system continuously generates behavioral events, transaction events, reach events, and feedback events related to the same user. The target user has been marked as a stable buyer of maternal and infant products in the historical profile. Over the past 90 days, the user's main consumption categories have been concentrated on infant formula, baby care products, and stroller accessories; the user has a stable purchase history for mid-to-high-priced maternal and infant products and is a high-value member in the membership system. The issue the company needs to address is whether user behaviors during peak promotional periods, such as clicking on low-priced electronic products, quickly returning, and staying only briefly, should be written directly into the long-term profile. Directly overwriting the profile could easily cause the user's long-term maternal and infant preferences to be overwritten by short-term promotional noise, and could further trigger large-scale erroneous adjustments to recommendation, coupon sorting, and outreach strategies. 
Based on this, the following uses the user's continuous behavior during the event as an example to explain the overall operation process of event reception, time sequence correction, timeliness classification, conflict isolation and review, local incremental updates, and controlled propagation. During the actual operation, the target user engaged in the following behaviors between 10:02 AM and 10:08 AM: browsing the baby formula details page twice in the mobile application, searching for stroller discounts once on the web, clicking on a low-price 3C clearance ad on the advertising platform and exiting within 3 seconds, completing a baby wipes order payment on the member transaction platform, claiming a maternal and infant discount coupon on the coupon platform, and submitting a summary of an inquiry about the expiration date of the formula on the customer service ticket platform; and at 12:15 PM, a stroller accessory purchase transaction was recorded by the offline store transaction system. When the above events reached the user profile processing system, they did not strictly follow the time of the business occurrence. Some web-based events arrived first, some ad click events arrived later, and some store transaction records were received several minutes after the event occurred due to store network delays. The system first receives events from various source systems, merging account numbers, device numbers, member numbers, and transaction numbers into the same user primary key. It then adds information such as event occurrence time, data reception time, source system identifier, feature identifier, and event trust level to each event. 
For events related to transaction payments, offline purchases, and member level changes, the system assigns a high trust level; for browsing, clicking, and short-term dwell events, it assigns a lower trust level; and for customer service inquiries, searches, and add-to-cart events, it assigns a medium trust level. After field normalization, the system generates a sequence of pending events for the same user and writes this sequence into a user-defined receiving buffer as input for subsequent timing correction processing. Since the user generates multiple actions consecutively within a short period, the system also aggregates events arriving in the same round to avoid repeatedly creating records for the same intent during continuous, second-level interactions.
After forming the sequence of events to be processed, the system performs time-series correction on the user's event sequence. Instead of determining the processing order directly from the data reception order, the system reorders the events based on the event occurrence time, the data reception time, and the historical transmission delay statistics of each source system. Specifically, the system calculates the mean and standard deviation of the delay for each source system from the distribution of valid event transmission delays over a recent period across web pages, mobile devices, advertising platforms, transaction platforms, and store systems, and forms a dynamic waiting threshold. For this user, although the click log from the advertising platform arrived first, its event occurrence time falls after the baby formula browsing event but before the offline store transaction event. Although the store transaction data is received later, it is a highly reliable closed-loop event, so after time-series correction it is still placed at its actual business time position. If, during correction, an anomaly is found in which a source system's data reception time is earlier than the event occurrence time, the system further determines whether this reflects a time deviation of the source system; if the number of anomalies does not reach the continuous deviation condition, the event is first sent to the abnormal time queue and does not participate in this round of real-time updates. After the above processing, the system obtains the user's event processing order according to the actual order of business occurrence, and adds status markers to late events, compensation events, and abnormal time events.
After obtaining the event processing order, the system classifies the features by timeliness. According to the feature timeliness mapping table and historical sample rules, features such as membership level, high-value membership attributes, major consumption categories in the past 90 days, stable maternal and infant purchase preferences, and long-term price sensitivity are classified as long-term features. Features such as current ad clicks, short page dwell times, single keyword searches, one-time coupon redemptions, and short-term add-to-cart actions are classified as short-term features. Newly launched activity tags, temporary interaction tags with insufficient samples, and behavioral features that have not yet formed a stable trend are classified as features awaiting confirmation. In this round of processing, the user's baby formula browsing, stroller discount search, maternal and infant coupon claim, and offline stroller accessory purchase are mapped to the maternal and infant feeding dimension and the maternal and infant travel dimension, respectively. Because their sources are relatively stable and some of the behaviors form a business closed loop, they are generally closer to long-term features or to features transitioning from pending confirmation to long-term. By contrast, clicking on the low-priced 3C clearance ad and exiting within 3 seconds occurs only in a single ad source and a single short session, and the event weight is low, so it is classified as a short-term feature. Through this classification, the system narrows the objects that may conflict with the historical baseline to a small number of short-term features, thereby reducing the scope of subsequent conflict verification.
Subsequently, the system performs conflict verification between the short-term features and the user's historical profile features. For this user, the historical profile baseline has consistently pointed to high-value consumption in the maternal and infant sector, while the current short-term feature, a click on a low-priced 3C clearance ad, deviates significantly in semantics from the long-term maternal and infant consumption preference. The system first organizes the current short-term feature into a current feature vector according to the profile dimension correspondence, and organizes the historical maternal and infant preference baseline into a historical baseline vector. It then calculates the Jensen-Shannon divergence between the normalized distributions. If the divergence value does not exceed the conflict threshold, the short-term feature is treated as an ordinary short-term fluctuation and participates only in low-weight compensation updates.
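The divergence check described above can be sketched as follows. This is a minimal illustration, assuming base-2 logarithms (so the divergence lies in [0, 1]), a three-dimension vector layout, and a conflict threshold of 0.5; the vector values, the threshold, and the function names are hypothetical and are not given in this application.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two non-negative vectors."""
    def normalize(v):
        total = sum(v)
        if total <= 0:
            raise ValueError("vector must have positive mass")
        return [x / total for x in v]

    def kl(a, b):
        # Terms with a == 0 contribute nothing; b > 0 wherever a > 0 here,
        # because b is always the mixture of the two distributions.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    p, q = normalize(p), normalize(q)
    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def is_conflict(current, baseline, threshold=0.5):
    """True when the short-term vector diverges from the baseline beyond the threshold."""
    return js_divergence(current, baseline) > threshold


# Assumed dimension order: [maternal/infant feeding, maternal/infant travel, electronics]
baseline = [0.60, 0.35, 0.05]     # hypothetical historical baseline vector
ad_click = [0.0, 0.0, 1.0]        # hypothetical vector for the 3C ad click
formula_view = [0.8, 0.2, 0.0]    # hypothetical vector for the formula browsing

ad_conflicts = is_conflict(ad_click, baseline)
view_conflicts = is_conflict(formula_view, baseline)
```

With these assumed values the ad click is flagged as a conflict while the formula browsing is not, which matches the outcome described for this user.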
If the divergence value exceeds the conflict threshold, a semantic conflict is identified. For the click on the low-priced 3C ads, since it differs significantly from long-term maternal and infant preferences and the divergence exceeds the conflict threshold, the system writes it to the user-level isolated cache instead of directly writing it to the historical profile baseline. After writing to the isolation cache, the system sets an observation window based on the normalized decay rate of the short-term feature. Within the observation window, it continuously searches for similar features from other source systems that are for the same user, the same profile dimension, and have the same direction of change. If multiple pieces of evidence related to browsing, searching, adding to cart, or paying for electronic products are subsequently received from mobile applications, web pages, or offline transaction terminals, the system further calculates the number of independent sources, the amount of support in the same direction, and the aggregation distance to determine whether the deviation is a real state migration. If only the advertising platform provides evidence after the observation window ends, and the amount of support in the same direction is insufficient, the feature is identified as short-term noise, kept in isolation, and awaits natural decay and removal. In this case, the system did not find any evidence of continuous electronic product consumption from the transaction platform, web page, or offline store within the observation window, so the 3C advertisement click was ultimately determined to be short-term noise that failed the review. After completing the conflict review, the system determines the incremental update content based on the review conclusion and performs local incremental updates on the user profile nodes. 
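The review of an isolated feature can be sketched as a small decision function. This application names the three criteria (number of independent sources, same-direction support, and aggregation distance) but does not give formulas or thresholds, so everything numeric below is an assumption: support is taken as the sum of evidence weights, aggregation distance as the mean of a precomputed per-evidence distance, and the thresholds are placeholders. The `Evidence` type and the `review` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str       # source system identifier
    dimension: str    # profile dimension the evidence maps to
    direction: int    # +1 or -1: direction of change on that dimension
    weight: float     # event weight (assumed scale)
    distance: float   # assumed precomputed distance to the isolated feature


def review(origin_source, dimension, direction, evidence, window_closed,
           min_sources=2, min_support=1.5, max_distance=0.5):
    """Return 'migrate', 'observe', or 'noise' for one isolated feature."""
    # Only same-dimension, same-direction evidence from OTHER sources counts.
    same = [e for e in evidence
            if e.dimension == dimension
            and e.direction == direction
            and e.source != origin_source]
    sources = {e.source for e in same}
    support = sum(e.weight for e in same)
    mean_distance = (sum(e.distance for e in same) / len(same)
                     if same else float("inf"))
    if (len(sources) >= min_sources
            and support >= min_support
            and mean_distance <= max_distance):
        return "migrate"   # treated as a real state migration
    return "noise" if window_closed else "observe"


# The scenario in the text: no corroborating electronics evidence arrives.
result_open = review("ad_platform", "electronics", +1, [], window_closed=False)
result_closed = review("ad_platform", "electronics", +1, [], window_closed=True)

# A contrasting, hypothetical case with corroboration from two other sources.
corroboration = [
    Evidence("mobile_app", "electronics", +1, 0.8, 0.2),
    Evidence("transaction_platform", "electronics", +1, 1.0, 0.3),
]
result_migrate = review("ad_platform", "electronics", +1, corroboration,
                        window_closed=False)
```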
In this scenario, features such as the baby wipes order payment, the stroller accessory store purchase, the baby formula browsing, and the stroller discount search are applied to the corresponding profile dimensions through either long-term updates or low-weight short-term compensation updates. The 3C advertisement click that failed the review does not participate in the main baseline update. The system first reads the user profile baseline snapshot and calculates the local residuals according to the dimension type. If the target dimension is a numerical rating dimension, such as price sensitivity rating, category maturity rating, or membership value rating, the difference between the current valid feature value and the baseline value is multiplied by the corresponding update weight to form the difference residual. If the target dimension is a categorical preference dimension, such as maternal and infant feeding preference, maternal and infant travel preference, or electronic product preference, the difference of the category weight vectors is multiplied by the update weight to form the vector residual. If the target dimension is a state dimension, such as a high purchase intention state, an activity-sensitive state, or a churn risk state, a state transition instruction and a state transition confidence are generated. Since the user has completed the maternal and infant-related transaction loop in this round and the event credibility level is high, the maternal and infant feeding dimension and the maternal and infant travel dimension receive higher update weights. Short-term search and coupon behaviors are used only as supplementary fine-tuning and do not change the long-term dominant direction. After completing the local residual calculation, the system does not rewrite the entire user profile node.
Instead, it only performs local writes on the positions affected by the current round of effective features in the interest preference dimension, category maturity dimension, price sensitivity dimension, and reach response dimension. If a change in the snapshot version is detected before writing, the latest snapshot is reread and the local residual is recalculated. If the target node is temporarily unavailable, the local increment is written to the update temporary record. The update temporary record includes at least the user identifier, profile dimension, local residual value, target status, event occurrence time, and generation time. After the node is restored, the replay update is performed in the order of event occurrence time. After the update is completed, the system outputs the updated user profile node snapshot, local residual value, and residual magnitude to characterize the actual change magnitude of the user profile node caused by this update and to provide input basis for subsequent determination of the scope of related updates. After completing a local incremental update, the system determines the scope of its propagation. Instead of indiscriminately propagating the update to all related dimensions, the system first determines whether external propagation is necessary based on the node association table and the standardized residual modulus. If the standardized residual modulus is below the propagation threshold, the change is considered meaningful only within the current node and does not trigger external propagation. If the standardized residual modulus reaches the propagation threshold, the propagation level is determined by combining the event credibility level and the update dimension type. 
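The local residual calculation and the version-checked partial write can be sketched as follows. This is an illustration under stated assumptions: the profile node is modeled as a plain dictionary with a `version` counter, the residual magnitude is taken to be the Euclidean norm, and the weights and values are made up. The function names and the node layout are hypothetical.

```python
import math


def difference_residual(current, baseline, weight):
    """Residual for a numerical rating dimension: weight * (current - baseline)."""
    return weight * (current - baseline)


def vector_residual(current, baseline, weight):
    """Residual for a categorical preference dimension, per category."""
    keys = set(current) | set(baseline)
    return {k: weight * (current.get(k, 0.0) - baseline.get(k, 0.0)) for k in keys}


def residual_modulus(residual):
    """Magnitude of a residual; the Euclidean norm is an assumption of this sketch."""
    values = residual.values() if isinstance(residual, dict) else [residual]
    return math.sqrt(sum(v * v for v in values))


def apply_local(node, dimension, residual, expected_version):
    """Write only `dimension`; refuse if the snapshot version has changed."""
    if node["version"] != expected_version:
        return False  # caller rereads the snapshot and recomputes the residual
    if isinstance(residual, dict):
        target = node["dims"][dimension]
        for key, delta in residual.items():
            target[key] = target.get(key, 0.0) + delta
    else:
        node["dims"][dimension] += residual
    node["version"] += 1
    return True


node = {"version": 3,
        "dims": {"price_sensitivity": 0.40,
                 "category_preference": {"feeding": 0.5, "travel": 0.3}}}

snapshot_version = node["version"]
r_scalar = difference_residual(0.60, node["dims"]["price_sensitivity"], 0.5)
first_write = apply_local(node, "price_sensitivity", r_scalar, snapshot_version)

# Reusing the old snapshot version is rejected, so nothing is overwritten.
stale_write = apply_local(node, "price_sensitivity", r_scalar, snapshot_version)

r_vector = vector_residual({"feeding": 0.7, "travel": 0.3},
                           node["dims"]["category_preference"], 0.5)
magnitude = residual_modulus(r_vector)
```

Only the addressed dimension changes on a successful write; the rest of the node is left untouched, which is the point of the partial update.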
For cases of increased maternal and infant feeding preferences and maternal and infant travel preferences, since these changes directly affect the ranking of recommended candidate tags, maternal and infant activity packages, and the display priority of maternal and infant coupons, and the update sources include high-credibility transactions and offline store purchases, single-hop propagation is allowed. During propagation, the system multiplies the standardized residual modulus by the association weight and by the diffusion decay coefficient to obtain the propagation update amount of the related target. Based on this, it only updates the ranking of recommended tags, the priority of activity packages, and the reach frequency control value directly related to maternal and infant interests, without directly rewriting the user value level, lifecycle stage, or enterprise-level user segmentation results. When the system is under high load, the propagation process is further restricted. For example, when the update queue length exceeds a preset multiple of the average value during stable operation, or when node write time continues to increase, the system starts a lightweight propagation mode. In lightweight propagation mode, only updates with high reliability and high residuals are allowed to trigger single-hop propagation. Other propagation requests are written to a delayed propagation queue and then expanded in the order of event occurrence after the system load recovers. If a write to a related target fails during propagation, the corresponding propagation event is written to the retry record and automatic retry is performed according to the preset backoff interval. If multiple retries still fail, the propagation event is transferred to the manual inspection queue, but the original node update that has been successfully completed is not rolled back. 
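The propagation rule above (standardized residual modulus multiplied by the association weight and by the diffusion decay coefficient, gated by a threshold and by the lightweight mode) can be sketched as follows. The threshold, the decay coefficient, the "high residual" cutoff, the association table, and the function name are all assumed values chosen for illustration; this application does not state them.

```python
def propagate(modulus, trust, associations, *, threshold=0.2, decay=0.8,
              lightweight=False, high_residual=0.5):
    """Single-hop propagation.

    Returns (updates, delayed): `updates` maps each directly associated target
    to its propagation amount; `delayed` is True when the request should be
    placed on the delayed propagation queue instead.
    """
    if modulus < threshold:
        return {}, False   # change stays local to the current node
    if lightweight and not (trust == "high" and modulus >= high_residual):
        return {}, True    # deferred until system load recovers
    # Only targets listed for this node are touched: no second hop.
    updates = {target: modulus * weight * decay
               for target, weight in associations.items()}
    return updates, False


# Hypothetical association weights for the maternal/infant feeding dimension.
feeding_links = {"recommended_tag_rank": 0.9, "coupon_priority": 0.5}

normal, normal_delayed = propagate(0.5, "high", feeding_links)
local_only, local_delayed = propagate(0.1, "high", feeding_links)
deferred, deferred_flag = propagate(0.5, "medium", feeding_links, lightweight=True)
```

Because the function only iterates over the targets directly associated with the updated node, dimensions such as user value level or lifecycle stage are never rewritten by this step.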
As can be seen from the above specific application process, the system does not simply write multi-source behaviors into the user profile. Instead, it first completes identity unification and temporal correction, and then classifies them according to the timeliness attributes of features. Only truly credible and verified changes are written into the main baseline, and controlled propagation is only performed within the necessary scope. For short-term 3C ad clicks in this example, the system identifies them as noise through conflict isolation and observation verification to avoid them mistakenly covering the user's long-term maternal and infant profile. For maternal and infant related transaction loops and continuous interest behaviors, local residual updates and single-hop propagation are used to promptly apply them to recommendation tags, activity package ranking, and reach control. In this way, the system can maintain the responsiveness of the user profile to the real state transition, and suppress the damage to the long-term profile stability caused by short-term noise, accidental clicks, and high-concurrency out-of-order input, thereby improving the accuracy of profile updates, processing stability, and operational reliability in high-concurrency scenarios.
[0028] It should be noted that this invention can be deployed on the device itself to realize embedded applications, or it can run on a PC or other terminal with a user interface, thereby meeting various hardware environments and usage requirements.
[0029] The above embodiments can be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above embodiments can be implemented in whole or in part as a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions of the embodiments of this application are implemented in whole or in part. The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center over a wired or wireless connection. Wired connections include optical fiber, twisted pair, coaxial cable, and the like. Wireless connections include infrared, microwave, and the like. The computer-readable storage medium can be any available medium accessible by a computer, or a data storage device, such as a server or data center, that contains one or more available media. The available medium can be a magnetic medium (such as a floppy disk, hard disk, or magnetic tape), an optical medium (such as a DVD), or a semiconductor medium (such as a solid-state drive).
[0030] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0031] In conclusion, the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions, improvements, etc., made within the spirit and principles of the present invention should be included within the protection scope of the present invention.
Claims
1. A method for dynamically updating marketing user profiles based on multi-source data fusion, characterized in that it comprises: S1: Receive multi-source marketing event data, extract user identifiers, event occurrence time, data reception time, and feature identifiers, and generate a sequence of events to be processed; S2: Perform timing correction on the sequence of events to be processed based on the event occurrence time and data reception time to determine the event processing order; S3: Classify the features in the event processing order according to the timeliness attribute corresponding to the feature identifier, wherein, when classifying features, a feature timeliness mapping relationship is pre-established and an attenuation parameter is configured, and long-term features, short-term features, and features to be confirmed are determined based on historical sample rules; S4: Perform conflict verification between short-term features and user historical profile features, isolate conflicting features, and review them in conjunction with similar feature update information from other data sources within a preset observation window; S5: Determine the incremental update content based on the review conclusion, and perform incremental updates on the user profile nodes; S6: Determine the scope of the update for the user profile nodes that have completed the incremental update, and execute subsequent update control according to the determined scope.
2. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 1, characterized in that, S1 includes: When generating a sequence of events to be processed, marketing events from multiple sources are collected and received according to the access time window. A unified user primary key mapping is completed based on account identifier, device identifier, member identifier, or transaction identifier. Fields such as event occurrence time, data reception time, source system identifier, feature identifier, feature value, event weight, transmission integrity identifier, and event trust level are standardized to form a unified event record. For events with missing user primary keys, missing event times, unidentified features, and duplicate events, the processes of completion, padding, temporary storage, and deduplication are performed respectively. Then, the events are aggregated according to the unified user primary key to form a sequence of events to be processed.
3. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 1, characterized in that, S2 includes: When determining the order of event processing, timing correction is initiated according to preset backlog conditions or head waiting conditions; Transmission delay statistics are performed separately for each source system. When there are insufficient valid samples, the dynamic waiting threshold is determined by the current sample, historical statistical results or the default waiting threshold. Then, the data is rearranged according to the event occurrence time, data reception time, event trust level and event number. Identify the source time deviation of time anomalies and correct them according to the median transmission delay or write them into the anomaly time queue; Events that have not arrived even after exceeding the dynamic waiting threshold are output in segments according to a sliding time window and a compensation update record is generated.
4. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 3, characterized in that, Identify the source time deviation of time-related anomalies and correct them according to the median transmission delay or write them into the anomaly time queue, including: When handling time-related abnormal events, similar abnormalities are statistically analyzed by source system, and the time deviation status is identified based on the occurrence of abnormalities within a preset period. For source systems that meet the judgment criteria, the event occurrence time is corrected based on the median transmission delay of the most recent normal event; For abnormal events that do not meet the judgment criteria, they are written into the abnormal time queue, do not participate in the current round of processing, and continue to be judged in subsequent time sequence correction. Events that are consistently flagged as abnormal after multiple rounds of verification will be moved to offline verification processing.
5. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 1, characterized in that, S3 includes: For the characteristics of missing mapping relationships, the event weight is calculated and initially classified by combining the event credibility level, behavior duration, business confirmation status, cross-source corroboration status, number of sources, frequency of occurrence, and session distribution. For multiple records with the same characteristic, the best one is selected based on the degree of credibility, the status of the business loop, and the source support. Features to be confirmed are written into the observation pool for further classification, and are included in the corresponding feature set when the transfer conditions are met.
6. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 5, characterized in that, selecting the best record among multiple records with the same characteristic based on credibility, business closed-loop status, and source support includes: For multiple candidate records with the same feature in the current event processing order, first compare the event credibility levels; when the event credibility levels are the same, then compare the business closed-loop status; when the business closed-loop status is the same, then compare the number of source systems; the record used to determine the timeliness attribute of the feature is selected according to this comparison order.
7. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 1, characterized in that, S4 includes: When handling conflicts, short-term features are merged according to the corresponding relationship of the portrait dimensions and formed a normalized vector with the corresponding historical portrait baseline. Conflict is determined based on the distribution differences. For features identified as conflicting, write them into isolated record units set according to user and profile dimensions, and set the observation window based on the normalized decay rate parameter; During the observation period, retrieve same-dimensional and same-direction features from other sources, review and determine them according to the number of independent sources, same-direction support, and aggregation distance, and update, continue observation, or isolate and remove them according to the determination status.
8. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 7, characterized in that, For features identified as conflicting, they are written into isolated record units set according to user and profile dimensions, and observation windows are set based on normalized decay rate parameters, including: For the characteristics determined by conflict, isolation record units are set up according to user identifier and profile dimensions, and the conflict feature value, source system identifier, distribution difference value, entry time, normalized decay rate parameter, event credibility level, business confirmation status, isolation weight and review status are registered. The observation window is set according to the range of the normalized decay rate parameter, and the observation period is extended for records with high event confidence level and business confirmation.
9. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 1, characterized in that, S5 includes: When determining the incremental update content, read the user profile baseline snapshot according to the profile dimension, and perform stable fusion, low weight compensation and migration update on long-term features, short-term features that have not entered the conflict isolation process and conflict features that have passed the review, and generate difference residuals, vector residuals or state transition information according to the dimension type. Based on the event trust level, review intensity, and business confirmation status, the update weight is determined and limited before partial write is performed. When the snapshot version changes, the update is recalculated. When the node is unavailable or write is abnormal, the update is temporarily stored or delayed. The updated user profile node snapshot, local residual value, and residual modulus are recorded.
10. The method for dynamically updating marketing user profiles based on multi-source data fusion according to claim 1, characterized in that, S6 includes: For user profile nodes that have completed incremental updates, the node association relationship and association weight are determined based on business impact rules and historical linkage statistics. The residual modulus is standardized according to dimension type, and the propagation range is determined in combination with the event credibility level and the system load status. For updates that meet the propagation conditions, single-hop propagation is performed according to the association weight and attenuation coefficient, and update control is performed according to the type of the associated target. Updates that do not meet the propagation conditions, that fall under the lightweight propagation mode, or that have failed to propagate are handled by local retention, delayed propagation, and retry or manual inspection, respectively.