Small sample transfer modeling optimization method and system based on contrastive learning

A small-sample transfer modeling method based on contrastive learning addresses the insufficient stability of cross-domain transfer. Data preprocessing and optimization of the feature encoding network make cross-domain transfer robust and quickly adaptable, and improve the convergence speed and generalization performance of the model.

CN121525535B · Active · Publication Date: 2026-06-23 · BAIWEIJINKE (SHANGHAI) INFORMATION TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: BAIWEIJINKE (SHANGHAI) INFORMATION TECH CO LTD
Filing Date: 2026-01-16
Publication Date: 2026-06-23

AI Technical Summary

Technical Problem

Existing small-sample modeling methods are difficult to apply stably across devices, acquisition conditions, or operating states, resulting in insufficient quantification of cross-domain transfer stability and unstable convergence during training.

Method used

By employing a few-sample transfer modeling method based on contrastive learning, cross-domain contrastive transfer data is periodically collected. Time alignment, noise suppression, anomaly removal, missing data completion, and scale standardization are performed to construct a fixed-length feature vector set and generate a training sample sequence. A low-dimensional contrastive representation is output using a feature encoding network, and the stability of the representation and cross-domain stability are evaluated. A robust sample set is constructed, and the parameters of the feature encoding network are updated.

Benefits of technology

It improves the operability and reliability of cross-domain transfer, avoids representation drift and semantic fragmentation, significantly improves the convergence speed and generalization performance of the model, and realizes closed-loop optimization of cross-domain transfer modeling.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a small-sample transfer modeling optimization method and system based on contrastive learning, in the technical field of transfer data processing. The method comprises the following steps:

- S1: collect cross-domain contrastive transfer data and preprocess it.
- S2: construct a fixed-length feature vector set and generate a training sample sequence.
- S3: construct a feature encoding network that outputs low-dimensional contrastive representations. Quantitatively evaluate the representation stability, cross-domain stability and cross-domain complexity of the training samples, and generate a cross-domain robust quantization value.
- S4: construct a robust sample set and a candidate sample set. Construct positive and negative sample pairs according to the similarity of the low-dimensional contrastive representations, evaluate the contrastive learning loss of the current training batch, and update the feature encoding network parameters based on the loss.

The application addresses a weakness of existing transfer modeling techniques: they struggle to handle differences in indicators across domains. As a result, cross-domain transfer stability is insufficiently quantified and training convergence is unstable.

Description

Technical Field

[0001] This invention relates to the technical field of transfer data processing. Specifically, it relates to a small-sample transfer modeling optimization method and system based on contrastive learning.

Background Technology

[0002] With the continuous popularization of information processing equipment, sensor nodes, and computing platforms, various complex engineering systems are gradually forming a multi-source heterogeneous data environment composed of operational status records, process event sequences, terminal interaction signals, and environmental change data. These data vary significantly in acquisition frequency, structural form, and noise distribution, making it easy to encounter problems such as insufficient sample size, inconsistent feature structures, and significant scene offsets during cross-device, cross-regional, or cross-task modeling processes. In practical scenarios such as equipment operation monitoring, process control optimization, and environmental perception modeling, it is often necessary to uniformly represent multi-scenario data under limited sample conditions and achieve cross-environment model transfer and rapid adaptation.

[0003] For example, the invention with publication number CN120911335A discloses a small-sample aerodynamic modeling method based on multi-task learning; it includes: constructing a multi-task prediction model, integrating an auxiliary task network and a target task network, and enhancing the feature extraction capability of the encoder through SE layer and attention layer; acquiring multi-source aerodynamic data and performing standardized preprocessing; designing a dynamic weight mechanism to adaptively adjust the influence of auxiliary task prediction on the target task output according to input features, achieving effective fusion of low-fidelity and high-fidelity data; performing two-stage training to ensure the efficiency and stability of multi-task learning; verifying the model's prediction capability under limited sample conditions; and achieving high-precision aerodynamic prediction under limited sample conditions through the synergistic effect of multi-task knowledge transfer, feature enhancement, and dynamic task integration, combined with the feature optimization characteristics of SE layer and attention mechanism, providing an efficient and accurate aerodynamic prediction method for aircraft design.

[0004] For example, the invention with publication number CN119066870A discloses a small-sample modeling method for aerodynamic / torque coefficients based on symbolic regression, including: S1, collecting aerodynamic data of a first aircraft; S2, constructing a correlation parameter expression for the aerodynamic / torque coefficients of the aircraft based on flow state variables; S3, collecting small-sample aerodynamic data of a second aircraft and optimizing the constant term in the correlation parameter expression accordingly; S4, using the optimized correlation parameter expression to perform polynomial fitting on the small-sample aerodynamic data to obtain a polynomial expression for the aerodynamic / torque coefficients.

[0005] However, existing small-sample modeling methods typically rely on a uniformly distributed data environment, making them difficult to apply stably across different devices, acquisition conditions, or operational states. In real-world engineering systems, different acquisition terminals exhibit variations in sampling frequencies, measurement scales, and noise levels, causing feature distributions to shift across different scenarios, resulting in misalignment of the representation space and drift of class boundaries. Particularly when sufficient samples are lacking in new operating conditions, models struggle to construct stable cross-domain representations, failing to meet the demands of complex systems for rapid transfer and reliable inference.

[0006] Therefore, a small-sample transfer modeling optimization method and system based on contrastive learning is urgently needed to address the above problems.

Summary of the Invention

[0007] Technical problems to be solved

[0008] To address the shortcomings of existing technologies, this invention provides a small-sample transfer modeling optimization method and system based on contrastive learning. It solves the problem that existing transfer modeling techniques cannot handle differences in indicators across domains, which leads to insufficient quantification of cross-domain transfer stability and unstable convergence during training.

[0009] Technical solution

[0010] To achieve the above objectives, the present invention provides the following technical solution: a small-sample transfer modeling optimization method based on contrastive learning. The method comprises:

- S1: periodically collect cross-domain contrastive transfer data. Perform time alignment, noise suppression, anomaly removal, missing data completion and scale standardization on it to generate preprocessed cross-domain contrastive transfer data.
- S2: from the preprocessed data, construct a fixed-length feature vector set ordered by event timestamp, and generate a training sample sequence.
- S3: construct a feature encoding network based on the fixed-length feature vectors and output low-dimensional contrastive representations. Quantitatively evaluate the representation stability, cross-domain stability and cross-domain complexity of the training samples. Generate cross-domain robust quantization values from the evaluation results and form a training sample transfer sequence.
- S4: construct a robust sample set and a candidate sample set from the training sample transfer sequence. Construct positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations, evaluate the contrastive learning loss of the current training batch, and update the feature encoding network parameters based on the loss.

[0011] Furthermore, S1 (periodic collection and preprocessing of the cross-domain contrastive transfer data) proceeds as follows.

A fixed-width sliding time window is set as one sampling period, and cross-domain contrastive transfer data is collected once per period. The data includes:

- terminal interaction duration
- single session dwell time
- number of clicks
- request response time
- terminal network downlink speed
- event timestamp
- interval between adjacent events
- event sequence length
- number of platform entry requests
- cross-platform forwarding latency
- single operation value
- operation value within the period
- number of API call exceptions
- log write rate
- scene domain number

The preprocessing steps are:

1. A multi-source timestamp synchronization correction mechanism uniformly corrects timestamps from different data sources.
2. A moving average filter suppresses and smooths sudden fluctuations and measurement noise.
3. An anomaly detection method based on the local outlier factor (LOF) algorithm identifies and removes abnormal sampling points.
4. K-nearest neighbor (KNN) interpolation completes locally missing data.
5. Z-score standardization unifies the scales of the different physical quantities.
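The NumPy sketch below illustrates three of the S1 operations: timestamp offset correction, moving average smoothing and Z-score standardization. The following are assumptions for illustration, not details given in the patent:

- the per-source clock offsets
- the window width
- the use of unweighted averaging
- all function names

LOF removal and KNN completion are not included in this sketch.

```python
import numpy as np


def align_timestamps(timestamps, sources, source_offsets):
    """Subtract each source's (hypothetical) clock offset to put all events on one timeline."""
    ts = np.asarray(timestamps, dtype=float)
    offsets = np.array([source_offsets[s] for s in sources], dtype=float)
    return ts - offsets


def moving_average(x, window=3):
    """Centered, unweighted moving average; windows shrink at the edges so length is preserved."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out[i] = x[lo:hi].mean()
    return out


def z_score(X):
    """Column-wise Z-score (x - mean) / std for a 2-D array (rows = samples, columns = fields).

    Uses the population standard deviation; a zero-variance column maps to all zeros.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero; (x - mu) is already 0 there
    return (X - mu) / sigma
```

For example, `moving_average([1, 10, 1, 1])` damps the isolated spike to `[5.5, 4.0, 4.0, 1.0]`.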

[0012] Furthermore, S2 (building the fixed-length feature vector set and the training sample sequence) proceeds as follows:

1. Extract the preprocessed cross-domain contrastive transfer data.
2. Sort all records by event timestamp.
3. Assign a unique sample index to each group of records.
4. Concatenate the records under the same sample index into a fixed-length feature vector in a fixed field order.
5. Append the corresponding scene domain number to the vector as an independent feature dimension.

The result is a training sample sequence containing the sample index, the fixed-length feature vector and the scene domain number.
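Below is a minimal sketch of S2. It assumes each preprocessed record is a Python dict with a `timestamp`, a `domain` (the scene domain number) and numeric fields. `FIELD_ORDER` is a hypothetical four-field subset of the full field order. For brevity, each record is treated as its own group and so receives its own sample index; the patent instead groups same-source records by timestamp continuity.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical subset of the fixed field order; the full order is listed in [0034].
FIELD_ORDER = ["interaction_duration", "dwell_time", "clicks", "response_time"]


@dataclass
class TrainingSample:
    sample_index: int
    features: List[float]  # fixed-length vector with the scene domain number appended last
    domain: int


def build_sample_sequence(records):
    """Sort records by event timestamp, index them, and build fixed-length feature vectors.

    Simplification: each record is its own group and therefore gets its own sample index.
    """
    ordered = sorted(records, key=lambda r: r["timestamp"])
    sequence = []
    for idx, rec in enumerate(ordered):
        vec = [float(rec[name]) for name in FIELD_ORDER]
        vec.append(float(rec["domain"]))  # domain number as an independent feature dimension
        sequence.append(TrainingSample(sample_index=idx, features=vec, domain=rec["domain"]))
    return sequence
```

Because the domain number is appended after the fixed fields, every vector has length `len(FIELD_ORDER) + 1`.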

[0013] Furthermore, S3 (encoding, evaluation and forming the training sample transfer sequence) proceeds as follows.

Encoding:

- A feature encoding network is built as a multilayer perceptron (MLP) for the fixed-length feature vectors in the training sample sequence.
- Each fixed-length feature vector passes through the hidden layers in turn. Linear transformations and nonlinear activations produce its low-dimensional contrastive representation.
- A correspondence table links the sample index, fixed-length feature vector, low-dimensional contrastive representation and scene domain number under the same index number.

Evaluation and ranking:

1. Representation stability: for each training sample, retrieve from the correspondence table the adjacent samples that share its scene domain number. The mean similarity between the current sample's low-dimensional contrastive representation and those of all adjacent samples is its representation stability evaluation value.
2. Cross-domain stability: for each fixed-length feature vector, combine the terminal interaction duration, single session dwell time, number of clicks, event sequence length, number of API call exceptions, number of platform entry requests, operation value within the period and single operation value.
3. Cross-domain complexity: combine the corresponding interval between adjacent events, request response time, terminal network downlink speed, cross-platform forwarding latency and log write rate.
4. Cross-domain robust quantization value: the cross-domain stability value multiplied by the representation stability value, divided by the cross-domain complexity value. This gives the value for the current transfer in the corresponding scene domain.
5. Sort the training samples in descending order of this value to obtain the training sample transfer sequence.
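The S3 pipeline can be sketched in NumPy as below. The following choices are assumptions, not details from the patent:

- ReLU as the nonlinear activation
- cosine similarity as the similarity measure
- all other same-domain samples in the batch as the "adjacent" set
- a stability value of 0.0 for a sample with no same-domain neighbor

The weights are untrained placeholders; training happens in S4.

```python
import numpy as np


def mlp_encode(X, weights, biases):
    """MLP forward pass: each hidden layer is linear + ReLU; the output layer is linear."""
    H = np.asarray(X, dtype=float)
    for k, (W, b) in enumerate(zip(weights, biases)):
        H = H @ W + b
        if k < len(weights) - 1:
            H = np.maximum(H, 0.0)
    return H


def cosine_matrix(Z):
    """Pairwise cosine similarity between the rows of Z."""
    Z = np.asarray(Z, dtype=float)
    Zn = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    return Zn @ Zn.T


def representation_stability(Z, domains):
    """Mean similarity of each representation to the other samples in its scene domain."""
    S = cosine_matrix(Z)
    domains = np.asarray(domains)
    out = np.zeros(len(S))
    for i in range(len(S)):
        mask = domains == domains[i]
        mask[i] = False  # exclude the sample itself
        out[i] = S[i, mask].mean() if mask.any() else 0.0
    return out


def robust_values(cd_stability, repr_stability, cd_complexity):
    """Cross-domain robust value = stability * representation stability / complexity."""
    return np.asarray(cd_stability) * np.asarray(repr_stability) / np.asarray(cd_complexity)


def transfer_sequence(robust):
    """Sample positions ordered from the largest robust value to the smallest."""
    return np.argsort(-np.asarray(robust, dtype=float), kind="stable")
```

A stable sort keeps the original order among samples with equal robust values.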

[0014] Further, the cross-domain stability evaluation value is computed from the corresponding terminal interaction duration, single session dwell time, number of clicks, event sequence length, number of API call exceptions, number of platform entry requests, operation value within the period and single operation value, as follows:

1. Basic activity term: sum the terminal interaction duration, single session dwell time and number of clicks, add one, and take the natural logarithm.
2. Sequence structure term: divide the event sequence length by the sum of the event sequence length and a constant, then square the result.
3. Anomaly suppression term: divide the number of API call exceptions by the sum of the number of platform entry requests and a constant, then raise the natural constant e to the negative of that ratio.
4. Operation ratio mapping term: divide the operation value within the period by the sum of the operation value within the period and the single operation value.
5. Multiply the four terms to obtain the cross-domain stability evaluation value.
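Read literally, [0014] defines the cross-domain stability value as a product of four terms. A direct Python transcription follows. The constant `c = 1.0` is an assumed value, since the patent only says "a constant". The inputs are assumed to be non-negative raw quantities, so that the logarithm and ratios are defined.

```python
import math


def cross_domain_stability(interaction_duration, dwell_time, clicks,
                           seq_length, api_exceptions, entry_requests,
                           period_op_value, single_op_value, c=1.0):
    """Cross-domain stability as the product of the four terms in [0014]; c is an assumed constant."""
    basic_activity = math.log(interaction_duration + dwell_time + clicks + 1.0)
    sequence_structure = (seq_length / (seq_length + c)) ** 2
    anomaly_suppression = math.exp(-api_exceptions / (entry_requests + c))
    # assumes period_op_value + single_op_value > 0
    operation_ratio = period_op_value / (period_op_value + single_op_value)
    return basic_activity * sequence_structure * anomaly_suppression * operation_ratio
```

How the terms behave:

- More API exceptions relative to entry requests shrink the value exponentially.
- A longer event sequence pushes the sequence structure term toward 1.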

[0015] Furthermore, the cross-domain complexity evaluation value is computed from the corresponding interval between adjacent events, request response time, terminal network downlink speed, cross-platform forwarding latency and log write rate, as follows:

1. Time expansion term: add one to the interval between adjacent events.
2. Transmission hindrance term: divide the request response time by the sum of the terminal network downlink speed and a constant, then add the cross-platform forwarding latency and a constant.
3. Write perturbation term: add one to the log write rate.
4. Multiply the three terms to obtain the cross-domain complexity evaluation value.
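A matching sketch of the complexity value in [0015]. As above, `c = 1.0` is an assumed constant. The transmission hindrance term is read as `response_time / (downlink_speed + c) + forwarding_latency + c`, which is one plausible parse of the sentence.

```python
def cross_domain_complexity(event_interval, response_time, downlink_speed,
                            forwarding_latency, log_write_rate, c=1.0):
    """Cross-domain complexity as the product of the three terms in [0015]; c is assumed."""
    time_expansion = event_interval + 1.0
    # read as: response_time / (downlink_speed + c), then + forwarding_latency + c
    transmission_hindrance = response_time / (downlink_speed + c) + forwarding_latency + c
    write_perturbation = log_write_rate + 1.0
    return time_expansion * transmission_hindrance * write_perturbation
```

Worked example: with an interval of 1, response time 4, downlink speed 3, zero forwarding latency and log write rate 1, the result is 2 × 2 × 2 = 8.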

[0016] Further, the robust and candidate sample sets, and the positive and negative sample pairs, are constructed as follows:

1. From the training sample transfer sequence, extract the median cross-domain robust quantization value of the current batch of training samples.
2. Assign training samples whose value is not lower than the median to the robust sample set, and those below the median to the candidate sample set. Record the sample indices of both sets.
3. In each training round, draw N robust samples from the robust sample set to form a cross-domain training batch. Look up their low-dimensional contrastive representations and scene domain numbers in the correspondence table by sample index.
4. Calculate the similarity between the low-dimensional contrastive representations of every two robust samples in the batch.
5. Record a pair as a positive sample pair if its similarity is greater than the similarity threshold and the two scene domain numbers are the same.
6. Record a pair as a negative sample pair if its similarity is not greater than the threshold and the two scene domain numbers differ.
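A minimal sketch of the median split, batch drawing and pair rules in [0016]. Cosine similarity and the function names are assumptions. Pairs that satisfy neither rule (for example, similar samples from different domains) are simply left out, which matches the two rules as written.

```python
import numpy as np


def split_by_median(sample_ids, robust_values):
    """Robust set: value >= batch median; candidate set: value < median."""
    r = np.asarray(robust_values, dtype=float)
    med = np.median(r)
    robust = [s for s, v in zip(sample_ids, r) if v >= med]
    candidate = [s for s, v in zip(sample_ids, r) if v < med]
    return robust, candidate


def sample_batch(robust_ids, n, rng):
    """Draw up to n distinct robust samples for one training round."""
    picked = rng.choice(len(robust_ids), size=min(n, len(robust_ids)), replace=False)
    return [robust_ids[k] for k in picked]


def build_pairs(Z, domains, sim_threshold):
    """Positive: similarity > threshold and same domain; negative: similarity <= threshold
    and different domains. Pairs matching neither rule are not used."""
    Z = np.asarray(Z, dtype=float)
    Zn = Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)
    S = Zn @ Zn.T
    pos, neg = [], []
    for i in range(len(Z)):
        for j in range(i + 1, len(Z)):
            same = domains[i] == domains[j]
            if S[i, j] > sim_threshold and same:
                pos.append((i, j))
            elif S[i, j] <= sim_threshold and not same:
                neg.append((i, j))
    return pos, neg
```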

[0017] Further, the contrastive learning loss of the current training batch is evaluated as follows:

1. Positive sample aggregation term: for each robust sample, raise e to the similarity between its low-dimensional contrastive representation and that of each sample in its positive sample set, and sum the results.
2. Global set contrastive term: in the same way, raise e to the similarity with each sample in both its positive and negative sample sets, and sum the results.
3. Single-sample contrastive loss term: take the negative natural logarithm of the ratio of the positive sample aggregation term to the global set contrastive term. Multiply it by the sample's cross-domain robust quantization value and its representation stability evaluation value.
4. Contrastive learning loss evaluation value: sum all single-sample contrastive loss terms. Divide by one plus the sum, over all samples in the robust sample set, of the product of the cross-domain robust quantization value and the representation stability evaluation value.
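The weighted loss in [0017] can be sketched as follows. The assumptions are:

- Similarities are precomputed into a matrix `S`.
- No temperature parameter is used, since none is mentioned.
- An anchor without any positive pair contributes no term, because its logarithm would otherwise be undefined.
- The normalizing sum runs over the samples in the current batch, which stands in for the full robust sample set.

```python
import numpy as np


def weighted_contrastive_loss(S, pos_pairs, neg_pairs, robust_vals, repr_stab):
    """Weighted contrastive loss of [0017] for one batch.

    S: (n, n) similarity matrix of the batch's low-dimensional representations.
    pos_pairs / neg_pairs: lists of (i, j) index pairs from the pair-construction step.
    """
    n = S.shape[0]
    positives = [set() for _ in range(n)]
    negatives = [set() for _ in range(n)]
    for i, j in pos_pairs:
        positives[i].add(j)
        positives[j].add(i)
    for i, j in neg_pairs:
        negatives[i].add(j)
        negatives[j].add(i)
    weights = np.asarray(robust_vals, dtype=float) * np.asarray(repr_stab, dtype=float)
    total = 0.0
    for i in range(n):
        if not positives[i]:
            continue  # assumption: an anchor without positives adds no term
        pos_term = sum(np.exp(S[i, p]) for p in positives[i])
        global_term = pos_term + sum(np.exp(S[i, q]) for q in negatives[i])
        total += -np.log(pos_term / global_term) * weights[i]
    return total / (weights.sum() + 1.0)
```

Samples with a high robust value and stable representations weigh more in the numerator. The "+1" in the denominator keeps the loss finite when all weights are near zero.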

[0018] Furthermore, the feature encoding network parameters are updated as follows:

1. Using the contrastive learning loss evaluation value, backpropagate the gradient and update the parameters of the feature encoding network.
2. After the update, select candidate samples that meet both of these conditions:
   - the cross-domain robust quantization value is not lower than the quantile threshold;
   - the similarity of the low-dimensional contrastive representation to at least one robust sample is not lower than the similarity merging threshold.
3. Merge the selected samples into the robust sample set and update the set index.
4. Repeat batch construction, loss evaluation, parameter update and sample set maintenance.
5. When the change in the contrastive learning loss evaluation value stays below the loss change threshold for M consecutive training rounds, stop training and fix the encoding network parameters. This completes the sample transfer modeling optimization.
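The sample-set maintenance and stopping rule in [0018] can be sketched as below. The gradient step itself is framework-specific and omitted. The following are assumptions:

- Cosine similarity is the similarity measure.
- The quantile threshold is passed in as a precomputed number, for example `np.quantile(robust_values, q)` for some `q` the patent does not state.
- "M consecutive rounds" means the last M round-to-round loss changes.

```python
import numpy as np


def merge_candidates(cand_ids, cand_robust, cand_Z, robust_Z, quantile_threshold, merge_sim):
    """Candidate ids whose robust value >= quantile_threshold and whose cosine similarity
    to at least one robust representation is >= merge_sim."""
    R = np.asarray(robust_Z, dtype=float)
    Rn = R / np.maximum(np.linalg.norm(R, axis=1, keepdims=True), 1e-12)
    merged = []
    for cid, r, z in zip(cand_ids, cand_robust, np.asarray(cand_Z, dtype=float)):
        if r < quantile_threshold:
            continue
        zn = z / max(np.linalg.norm(z), 1e-12)
        if np.max(Rn @ zn) >= merge_sim:
            merged.append(cid)
    return merged


def should_stop(loss_history, m, delta):
    """True once the last m round-to-round changes of the loss are all below delta."""
    if len(loss_history) < m + 1:
        return False
    recent = np.abs(np.diff(loss_history[-(m + 1):]))
    return bool(np.all(recent < delta))
```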

[0019] The second aspect of this invention provides a small-sample transfer modeling optimization system based on contrastive learning. It comprises four modules:

- Data acquisition and preprocessing module: periodically acquires cross-domain contrastive transfer data and performs time alignment, noise suppression, anomaly removal, missing data completion and scale standardization on it, generating preprocessed cross-domain contrastive transfer data.
- Sample index feature generation module: constructs, from the preprocessed data, a fixed-length feature vector set ordered by event timestamp, and generates a training sample sequence.
- Representation robustness score calculation module: constructs a feature encoding network based on the fixed-length feature vectors and outputs low-dimensional contrastive representations. It quantitatively evaluates the representation stability, cross-domain stability and cross-domain complexity of the training samples, generates cross-domain robust quantization values from the evaluation results, and forms a training sample transfer sequence.
- Cross-domain robustness quantization evaluation module: constructs a robust sample set and a candidate sample set from the training sample transfer sequence. It constructs positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations, evaluates the contrastive learning loss of the current training batch, and updates the feature encoding network parameters based on the loss.

[0020] Beneficial effects

[0021] The present invention has the following beneficial effects:

[0022] (1) A small sample transfer modeling optimization method and system based on contrastive learning. Through a three-dimensional model of cross-domain stability, cross-domain complexity and representation stability, a unified evaluation of multi-source heterogeneous monitoring data across scene domains is carried out, so that cross-domain data that were originally not directly comparable have quantifiable robustness characteristics, which fundamentally improves the operability and reliability of small sample domain transfer.

[0023] (2) A method and system for optimizing small sample transfer modeling based on contrastive learning. By simultaneously utilizing the low-dimensional contrastive representation and its similarity structure output by the encoding network, a representation stability evaluation system is constructed, so that the samples maintain a highly consistent semantic structure during cross-domain transfer, thereby effectively avoiding common problems such as representation drift and semantic discontinuity in traditional small sample transfer.

[0024] (3) A small sample transfer modeling optimization method and system based on contrastive learning, by using cross-domain robust quantization value and representation stability as weight factors of the loss function, enables low-dimensional representation learning to adaptively adjust the optimization direction according to the importance and risk of the sample in the cross-domain, thereby significantly improving the convergence speed and generalization performance of the model and avoiding the convergence difficulty problem that traditional contrastive learning encounters in cross-domain tasks.

[0025] (4) A small sample transfer modeling optimization method and system based on contrastive learning. Dynamic updating of the training sample transfer sequence and the robust sample set lets the model keep receiving new cross-domain data during operation. It can then recalculate cross-domain robust quantization values and update encoding network parameters. This realizes closed-loop optimization of cross-domain transfer modeling, giving the model continuous learning ability and the ability to adapt to changing scene domain environments.

Attached Figure Description

[0026] Figure 1: Flowchart of the small-sample transfer modeling optimization method based on contrastive learning;

[0027] Figure 2: Architecture diagram of the small-sample transfer modeling optimization system based on contrastive learning;

[0028] Figure 3: Comparison chart of the cross-domain stability, representation stability, and cross-domain complexity evaluations;

[0029] Figure 4: Bar chart of the cross-domain robust quantization values for different scene domains.

Detailed Implementation

[0030] The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0031] Referring to Figures 1-4, this invention provides a technical solution: a small-sample transfer modeling optimization method based on contrastive learning. The method comprises:

- S1: periodically collect cross-domain contrastive transfer data. Perform time alignment, noise suppression, anomaly removal, missing data completion and scale standardization on it to generate preprocessed cross-domain contrastive transfer data.
- S2: from the preprocessed data, construct a fixed-length feature vector set ordered by event timestamp, and generate a training sample sequence.
- S3: construct a feature encoding network based on the fixed-length feature vectors and output low-dimensional contrastive representations. Quantitatively evaluate the representation stability, cross-domain stability and cross-domain complexity of the training samples. Generate cross-domain robust quantization values from the evaluation results and form a training sample transfer sequence.
- S4: construct a robust sample set and a candidate sample set from the training sample transfer sequence. Construct positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations, evaluate the contrastive learning loss of the current training batch, and update the feature encoding network parameters based on the loss.

[0032] Specifically, S1 (periodic collection and preprocessing of the cross-domain contrastive transfer data) proceeds as follows.

A fixed-width sliding time window is set as one sampling period, and cross-domain contrastive transfer data is collected once per period. The collected fields and their sources are:

- Terminal interaction duration: the difference between the front-end page loading lifecycle and the data collection time.
- Single session dwell time: the time difference between the start and end events of the client session.
- Number of clicks: front-end interaction data collection statistics.
- Request response time: the time between the client initiating a request and receiving the response.
- Terminal network downlink speed: real-time bandwidth statistics of the operating system's network interface.
- Event timestamp: the event trigger times recorded by the server and client.
- Interval between adjacent events: the time difference between consecutive events of the same user.
- Event sequence length: a count of the time-ordered events.
- Number of platform entry requests: the call log count of the server's entry API.
- Cross-platform forwarding latency: the forwarding times recorded in the link tracing log.
- Single operation value: collected directly from the numerical input fields in business event processing.
- Operation value within the period: the accumulated values of all numerical input fields in the same sampling period.
- Number of API call exceptions: the server error log count.
- Log write rate: the per-second write volume of the log pipeline.
- Scene domain number: the unified mapping, in the business configuration table, of the scene domain to which the collected data belongs.

For the collected data, a multi-source timestamp synchronization correction mechanism first corrects the timestamps of the different data sources. A moving average filter then suppresses and smooths sudden fluctuations and measurement noise.
During timestamp synchronization correction, all event timestamps are offset to a unified time reference coordinate system. Data generated by different scene domain nodes, terminal devices and acquisition links therefore lies on one timeline, and time misalignment in the cross-domain contrastive transfer data is avoided. The moving average filter applies weighted smoothing to adjacent sampled values. This stabilizes quantitative indicators such as terminal interaction duration, single session dwell time, number of clicks, request response time and terminal network downlink speed. It also reduces the impact of link jitter, momentary terminal-side fluctuations and disturbances related to the accuracy of data acquisition points.

An anomaly detection method based on the local outlier factor algorithm identifies and removes abnormal sampling points, and K-nearest neighbor interpolation completes locally missing data. During anomaly detection, the local density deviation of each record is calculated. This identifies distorted points in fields such as log write rate, number of API call exceptions and number of platform entry requests, and samples that shift the overall statistical distribution are removed. For the locally missing items left by the removed data segments, K-nearest neighbor interpolation finds the nearest-neighbor feature vectors under the same scene domain number. It generates interpolated values from the differences in features such as terminal interaction duration, request response time and interval between adjacent events within the neighborhood. The completed content thus keeps a distribution consistent with the behavioral characteristics of the scene domain. Here K is a positive integer greater than three.
The Z-score standardization algorithm is applied to the cross-domain contrastive transfer data to unify the scales of different physical quantities. Each feature field is standardized using its overall mean and standard deviation. Quantities with different units and orders of magnitude are thereby expressed in the same scale space, including terminal interaction duration, single session dwell time, number of clicks, request response time, terminal network downlink speed, cross-platform forwarding latency, single operation value, and operation value within a period. This provides a reliable input basis for the subsequent construction of fixed-length feature vectors and low-dimensional contrastive representations.
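Column-wise Z-score standardization, as described above, can be sketched as follows. Two choices here are assumptions, because the text says only "overall mean and standard deviation". The sketch uses the population standard deviation, and it maps a constant column to zeros to avoid division by zero. It should be applied to the numeric fields only, not to the scene domain number, which is a categorical identifier.

```python
import statistics
from typing import List, Sequence


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardise each column to zero mean and unit population std."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    # A zero-variance column carries no information; map it to 0.0
    return [[(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds)]
            for r in rows]
```

For example, a column of response times `[1, 2, 3]` and a column of byte counts `[100, 200, 300]` become identical after standardization, which is the "same scale space" the text refers to.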

[0033] In this implementation scheme, the unified acquisition method and end-to-end preprocessing give the cross-domain contrastive transfer data a consistent, scale-uniform, and noise-controlled input foundation. Before entering the modeling process, data from different sources with significantly different distributions are normalized into a feature set with continuous temporal ordering and stable statistical properties. As a result, the data is more consistent in the temporal, numerical, and structural dimensions, so genuine cross-domain differences are presented more faithfully. At the same time, random shifts in the raw data caused by acquisition links, edge behavior, and scene fluctuations are avoided. The subsequent feature encoding network therefore receives a cleaner input from which to extract transferable representations, which improves the learnability, generalization ability, and overall robustness of cross-domain contrastive transfer modeling.

[0034] Specifically, the steps for constructing a fixed-length feature vector set from the preprocessed cross-domain contrastive transfer data based on event timestamps, and generating the training sample sequence, are as follows. The preprocessed data is extracted with a consistent field order: terminal interaction duration, single session dwell time, number of clicks, request response time, terminal network downlink speed, event timestamp, interval between adjacent events, event sequence length, number of platform entry requests, cross-platform forwarding latency, single operation value, operation value within a period, number of interface call exceptions, log write rate, and scene domain number. This ensures that data from every sampling period enters the processing flow with the same field structure. All records are sorted by event timestamp, so that page access behavior and link call behavior from different scene domains form a continuous sequence on a unified timeline. Each group of records is assigned a unique sample index. During assignment, records from the same source are grouped under one index according to the continuity of their event timestamps, so that each group preserves the structure of its behavior chain. The records under the same sample index are concatenated in the fixed field order into a fixed-length feature vector.
During concatenation, each of the following fields keeps a fixed position in the feature vector: terminal interaction duration, single session dwell time, number of clicks, request response time, terminal network downlink speed, interval between adjacent events, event sequence length, number of platform entry requests, cross-platform forwarding latency, single operation value, operation value within a period, number of interface call exceptions, and log write rate. All samples therefore have identical dimensional layouts. The corresponding scene domain number is appended to the fixed-length feature vector as an independent dimension. Each vector then carries an explicit cross-domain attribute label alongside its user behavior features and can serve as input for subsequent cross-domain difference modeling based on the scene domain number. The result is a training sample sequence of (sample index, fixed-length feature vector, scene domain number) entries. This gives the cross-domain contrastive transfer data an indexable, organized, structured form that can be used to train the feature encoding network.
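The vector layout described in [0034] can be illustrated with the sketch below. The dictionary key names are hypothetical. The event timestamp is used only for ordering, matching the field list in the second paragraph, which omits it from the vector. For simplicity, each preprocessed record becomes one sample; the rule for grouping records from the same source by timestamp continuity is not reproduced here.

```python
from typing import Dict, List, NamedTuple

# Fixed field order from [0034]; key names are hypothetical.
FEATURE_FIELDS = (
    "interaction_duration", "session_dwell_time", "click_count",
    "response_time", "downlink_speed", "event_interval", "sequence_length",
    "entry_requests", "forwarding_latency", "single_op_value",
    "period_op_value", "api_exceptions", "log_write_rate",
)


class TrainingSample(NamedTuple):
    index: int            # unique sample index
    vector: List[float]   # 13 features followed by the scene domain number
    domain: int           # scene domain number, also kept separately


def build_training_sequence(records: List[Dict[str, float]]) -> List[TrainingSample]:
    """Sort records on the unified timeline and emit fixed-length vectors."""
    ordered = sorted(records, key=lambda r: r["event_timestamp"])
    samples = []
    for idx, rec in enumerate(ordered):
        vec = [float(rec[f]) for f in FEATURE_FIELDS]  # KeyError if a field is missing
        vec.append(float(rec["scene_domain"]))         # independent domain dimension
        samples.append(TrainingSample(idx, vec, int(rec["scene_domain"])))
    return samples
```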

[0035] In this implementation scheme, the cross-domain contrastive transfer data is organized with a unified field structure, a continuous temporal order, and a fixed feature layout. Before entering the modeling stage, behavioral records from different scene domains therefore have consistent temporal ordering, stable feature positions, and an explicit scene domain attribute. After this processing, each record is represented completely as a fixed-length feature vector, and cross-domain behavioral differences are expressed clearly in the vector space. Training samples become easier to organize, index, and transfer in cross-domain environments, which lays a solid structural foundation for the feature encoding network to extract stable representations.

[0036] Specifically, this step constructs a feature encoding network based on the fixed-length feature vectors and outputs low-dimensional contrastive representations. It then quantitatively evaluates the representation stability, cross-domain stability, and cross-domain complexity of the training samples, generates cross-domain robust quantization values from these evaluation results, and forms a training sample transfer sequence. The procedure is as follows. For the fixed-length feature vectors in the training sample sequence, a feature encoding network is constructed with a multilayer perceptron (MLP). Each fixed-length feature vector is fed as input through the hidden layers in turn, where linear transformations and nonlinear activations are applied, yielding the corresponding low-dimensional contrastive representation. The number of layers, the number of neurons, and the activation functions of the input, hidden, and output layers are kept fixed. The layer-by-layer mapping thus compresses the fixed-length feature vectors from the high-dimensional business behavior space into the contrastive representation space. In the linear transformation stage, the vector elements are combined through weighted sums defined by the weight matrix. In the nonlinear activation stage, the activation function accentuates cross-domain behavioral differences. Together these stages allow the encoding network to extract stable cross-domain representation structures from continuous inputs and to produce compressed low-dimensional contrastive representations at the output layer. A correspondence table linking the sample index, fixed-length feature vector, low-dimensional contrastive representation, and scene domain number is then established, with all four keyed by the same index number.
When this table is established, every field of each sample record must be mutually retrievable with the corresponding low-dimensional contrastive representation. These fields are the terminal interaction duration, single session dwell time, number of clicks, request response time, terminal network downlink speed, interval between adjacent events, event sequence length, number of platform entry requests, cross-platform forwarding latency, single operation value, operation value within a period, number of interface call exceptions, log write rate, and scene domain number. This gives subsequent cross-domain difference analysis and neighbor-sample retrieval a consistent index structure. For each training sample, the set of neighboring samples with the same scene domain number is retrieved from the correspondence table. The average similarity between the low-dimensional contrastive representation of the current sample and those of all its neighbors is taken as the representation stability evaluation value of that sample. During retrieval, samples with the same scene domain number are grouped into a common neighborhood according to the distances between their low-dimensional contrastive representations, so that representation stability reflects the degree of behavioral consistency within one scene domain. For each fixed-length feature vector, the corresponding terminal interaction duration, single session dwell time, number of clicks, event sequence length, number of interface call exceptions, number of platform entry requests, operation value within a period, and single operation value are extracted and combined to obtain the cross-domain stability evaluation value.
The corresponding interval between adjacent events, request response time, terminal network downlink speed, cross-platform forwarding latency, and log write rate are likewise extracted and combined to obtain the cross-domain complexity evaluation value. The first group of eight fields forms the stability quantification structure, and the second group of five fields forms the complexity quantification structure. The two evaluation values thus reflect, respectively, how well a sample performs in cross-domain transfer and how difficult it is to transfer. The cross-domain stability evaluation value is multiplied by the representation stability evaluation value and divided by the cross-domain complexity evaluation value. This gives the cross-domain robust quantization value of the sample for the current transfer in its scene domain. The training samples are then sorted by this value in descending order to obtain the training sample transfer sequence.
During sorting, each training sample's cross-domain robust quantization value stays linked to its fixed-length feature vector, low-dimensional contrastive representation, scene domain number, and sample index. The transfer sequence therefore reflects the global ranking of cross-domain behavioral performance and provides an executable priority order for the subsequent training process.
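The encoder and the representation stability evaluation in [0036] can be sketched in plain Python as follows. Several details here are assumptions because the text leaves them open. The MLP uses ReLU hidden activations, a linear output layer, and He-style random initialization. Layer sizes such as `(14, 32, 8)` are examples. Cosine similarity stands in for the unspecified similarity measure. The neighborhood is the `k` nearest same-domain samples by Euclidean distance, with a hypothetical default of `k=4`. Training of the weights is not shown.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def init_mlp(sizes: Sequence[int], seed: int = 0):
    """Random (W, b) pairs for a fixed-architecture MLP, e.g. sizes=(14, 32, 8)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)  # He-style initialisation (assumed)
        W = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((W, [0.0] * n_out))
    return layers


def encode(layers, x: Sequence[float]) -> Vector:
    """Linear transform plus ReLU for hidden layers; linear output layer."""
    h = list(x)
    for li, (W, b) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + bj for row, bj in zip(W, b)]
        if li < len(layers) - 1:
            h = [max(0.0, v) for v in h]
    return h


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.hypot(*a), math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def representation_stability(reps: Sequence[Vector], domains: Sequence[int],
                             i: int, k: int = 4) -> float:
    """Mean cosine similarity of sample i to its k nearest same-domain neighbours."""
    same = [j for j in range(len(reps)) if j != i and domains[j] == domains[i]]
    same.sort(key=lambda j: math.dist(reps[i], reps[j]))
    nbrs = same[:k]
    return sum(cosine(reps[i], reps[j]) for j in nbrs) / len(nbrs) if nbrs else 0.0
```

A sample that has no other member of its scene domain in the table gets a representation stability of 0.0 in this sketch. The source does not say how that case is handled.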

[0037] In this embodiment, Table 1 is the cross-domain robust quantization value data table. It lists the core quantization indicators of five groups of scene domain samples in the cross-domain transfer modeling process. These are the cross-domain stability evaluation value, the representation stability evaluation value, the cross-domain complexity evaluation value, and the cross-domain robust quantization value obtained from them with the cross-domain robust quantization value calculation. The values are as follows. Scene domain D1: cross-domain stability evaluation value 2.31, representation stability evaluation value 0.87, cross-domain complexity evaluation value 1.52, cross-domain robust quantization value 1.32. Scene domain D2: 1.95, 0.78, 1.36, and 1.12. Scene domain D3: 2.68, 0.91, 1.74, and 1.40. Scene domain D4: 1.72, 0.65, 1.21, and 0.92. Scene domain D5: 3.05, 0.94, 2.03, and 1.41. The values for D2 to D5 are listed in the same order as for D1.

[0038] Table 1 Cross-Domain Robust Quantitative Value Data Table

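[0039]
Scene domain | Cross-domain stability evaluation value | Representation stability evaluation value | Cross-domain complexity evaluation value | Cross-domain robust quantization value
D1 | 2.31 | 0.87 | 1.52 | 1.32
D2 | 1.95 | 0.78 | 1.36 | 1.12
D3 | 2.68 | 0.91 | 1.74 | 1.40
D4 | 1.72 | 0.65 | 1.21 | 0.92
D5 | 3.05 | 0.94 | 2.03 | 1.41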
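The robust quantization value and the descending sort that forms the transfer sequence can be checked against the Table 1 figures with the short sketch below. The function names are illustrative. The inputs are the three evaluation values listed for D1 to D5.

```python
from typing import Dict, List, Tuple


def robust_value(stability: float, rep_stability: float, complexity: float) -> float:
    """Cross-domain stability x representation stability / cross-domain complexity."""
    return stability * rep_stability / complexity


def migration_sequence(samples: Dict[str, Tuple[float, float, float]]) -> List[Tuple[str, float]]:
    """Rank samples by robust quantization value, largest first."""
    scored = [(name, robust_value(*vals)) for name, vals in samples.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)


# Table 1 inputs: (cross-domain stability, representation stability, complexity)
TABLE_1 = {
    "D1": (2.31, 0.87, 1.52),
    "D2": (1.95, 0.78, 1.36),
    "D3": (2.68, 0.91, 1.74),
    "D4": (1.72, 0.65, 1.21),
    "D5": (3.05, 0.94, 2.03),
}

for name, value in migration_sequence(TABLE_1):
    print(f"{name}: {value:.2f}")
```

This prints D5: 1.41, D3: 1.40, D1: 1.32, D2: 1.12, D4: 0.92. Rounded to two decimals, each value matches the robust quantization value reported in Table 1. D5 heads the transfer sequence even though it also has the highest complexity.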

[0040] As shown in Figure 3, the graph plots the cross-domain stability evaluation value, representation stability evaluation value, and cross-domain complexity evaluation value of the five scene domains. It illustrates how transfer characteristics differ across scene domains on these three key metrics. The blue line is the cross-domain stability evaluation value. It reflects stability within a scene domain under factors such as page visits, session dwell time, click behavior, sequence structure, anomaly suppression, and operation values. The yellow line is the representation stability evaluation value. It reflects how tightly the low-dimensional contrastive representations output by the feature encoding network cluster among samples of the same domain, and thus measures the intra-domain consistency of the encoded representation. The green line is the cross-domain complexity evaluation value. It reflects the complexity introduced by event intervals, transmission latency, network bandwidth, and log write perturbations during cross-domain transfer. Viewing the three values together makes the differences in stability and complexity between scene domains directly visible and provides the basis for the subsequent calculation of the cross-domain robust quantization values.

[0041] As shown in Figure 4, the cross-domain robust quantization values obtained for the five scene domains after cross-domain transfer modeling evaluation are presented. These values are calculated jointly from the cross-domain stability evaluation value, representation stability evaluation value, and cross-domain complexity evaluation value proposed in this invention. Cross-domain stability and representation stability act as positive driving factors, while cross-domain complexity acts as a negative inhibiting factor. Together they determine the transferability and robustness of scene domain samples during cross-domain transfer. The value marked on each bar is the actual cross-domain robust quantization value of the corresponding scene domain sample, so differences in cross-domain transfer adaptability among scene domains can be read directly. Figure 4 thus provides a basis for decisions in the subsequent construction of the robust sample set, the screening of candidate samples, and the execution of cross-domain contrastive transfer optimization training. It is an important reference for sample screening and training scheduling in the method of this invention.

[0042] In this implementation scheme, fixed-length feature vectors are encoded layer by layer into low-dimensional contrastive representations. This compresses cross-domain contrastive transfer data into a structurally stable representation space before it enters the transfer modeling stage, reducing noise interference and scale differences among high-dimensional features. The correspondence between sample indices, fixed-length feature vectors, low-dimensional contrastive representations, and scene domain numbers makes cross-domain behavioral data searchable and linkable at the representation layer. The quantitative evaluation of representation stability, cross-domain stability, and cross-domain complexity assigns each sample comparable robustness indicators in a cross-domain environment. The resulting cross-domain robust quantization values allow cross-domain differences to be ranked explicitly. After this processing, the training sample transfer sequence has a clear structure, well-defined differences between samples, and stable representations. This provides a reliable input for subsequent priority-based contrastive learning and transfer optimization, thereby improving the discriminative ability and usability of cross-domain transfer modeling.

[0043] Specifically, the steps for extracting the corresponding fields and combining them into the cross-domain stability assessment value are as follows. The fields are the terminal interaction duration, single session dwell time, number of clicks, event sequence length, number of interface call exceptions, number of platform entry requests, operation value within a period, and single operation value. First, the terminal interaction duration, single session dwell time, and number of clicks are summed, one is added, and the natural logarithm is taken. The summation accumulates the frequency of cross-domain behavior. The nonlinear compression of the logarithm suppresses the offset caused by abnormally large values. The result is a basic activity term that characterizes the strength of cross-domain behavioral activity. Next, the event sequence length is divided by the sum of the event sequence length and a constant, and the result is squared. The ratio preserves the relative proportions of sequence structure across samples, and squaring nonlinearly amplifies stability differences caused by changes in sequence length. The result is a sequence structure term that reflects the coherence of cross-domain behavior. Then the number of interface call exceptions is divided by the natural logarithm of the sum of the number of platform entry requests and a constant, and the natural constant e is raised to the negative of this ratio. The ratio controls the measurement scale of the anomaly rate. The negative exponential responds sensitively to changes in that rate: the higher the anomaly rate, the smaller the term. This suppresses the stability contribution of anomalous behavior and yields an anomaly suppression term that measures the strength of anomalous disturbances.
After that, the operation value within a period is divided by the sum of the operation value within the period and the single operation value. This normalization balances the operation structure so that it is not dominated by sudden spikes in single operations, and fluctuations in cross-domain operational behavior are expressed on a consistent scale. The result is an operation ratio mapping term that describes the continuity of operational behavior. Finally, the basic activity term, sequence structure term, anomaly suppression term, and operation ratio mapping term are multiplied together. The product integrates activity, structure, anomaly suppression, and operational continuity into a single measure of the stability of cross-domain behavior, which is the cross-domain stability assessment value.

[0044] The specific formula for calculating the cross-domain stability assessment value is as follows:

[0045] $$S=\ln\left(T_{i}+T_{s}+N_{c}+1\right)\cdot\left(\frac{L_{e}}{L_{e}+c_{1}}\right)^{2}\cdot\exp\left(-\frac{N_{a}}{\ln\left(N_{p}+c_{2}\right)}\right)\cdot\frac{V_{p}}{V_{p}+V_{s}}$$

[0046] In the formula:
- $S$ represents the cross-domain stability assessment value.
- $T_{i}$ is the terminal interaction duration.
- $T_{s}$ is the single session dwell time.
- $N_{c}$ is the number of clicks.
- $L_{e}$ is the event sequence length.
- $N_{a}$ is the number of interface call exceptions.
- $N_{p}$ is the number of platform entry requests.
- $V_{p}$ is the operation value within the period.
- $V_{s}$ is the single operation value.
- $c_{1}$ and $c_{2}$ are the constants described in paragraph [0043].
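The stability assessment in [0043] can be written as a single function, sketched below. Several points are assumptions. The values of the two constants are not given in the text, so `c1=1.0` and `c2=2.0` are hypothetical defaults; `c2` is chosen greater than 1 so that the logarithm in the anomaly term is positive. Placing that logarithm in the denominator of the anomaly ratio follows one reading of the description. The inputs are assumed to be non-negative raw values, because Z-scored values can be negative and would break the first logarithm.

```python
import math


def cross_domain_stability(t_int: float, t_sess: float, clicks: float,
                           seq_len: float, api_exc: float, entry_req: float,
                           v_period: float, v_single: float,
                           c1: float = 1.0, c2: float = 2.0) -> float:
    """Cross-domain stability assessment value as the product of four terms."""
    activity = math.log(t_int + t_sess + clicks + 1.0)           # basic activity term
    structure = (seq_len / (seq_len + c1)) ** 2                  # sequence structure term
    suppression = math.exp(-api_exc / math.log(entry_req + c2))  # anomaly suppression term
    total_ops = v_period + v_single
    op_ratio = v_period / total_ops if total_ops else 0.0        # operation ratio mapping term
    return activity * structure * suppression * op_ratio
```

With `c1=1.0`, an activity sum of e - 1, a sequence length of 1, no exceptions, and operation values 3 and 1, the four terms are 1, 0.25, 1, and 0.75, giving 0.1875. Raising the exception count while holding the other inputs fixed lowers the result, as the anomaly suppression term intends.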

[0047] In this implementation scheme, the cross-domain stability assessment uses several normalization operators and nonlinear functions, so that different cross-domain behavioral characteristics are expressed continuously on a unified scale. The combination of exponential operations, logarithmic transformations, and squaring forms a differentiated reinforcement mechanism for activity fluctuations, structural coherence, anomalous disturbances, and operational changes. This mechanism establishes a robust mapping among multiple types of cross-domain behavioral characteristics. As a result, the cross-domain stability assessment value remains numerically smooth yet discriminative for noisy samples, samples with skewed distributions, and samples with drastic behavioral fluctuations. This improves the quantification accuracy and reliability of the cross-domain behavioral stability assessment and provides a more discriminative basis for the subsequent construction of the cross-domain robust quantization value.

[0048] Specifically, the steps for extracting the corresponding interval between adjacent events, request response time, terminal network downlink speed, cross-platform forwarding latency, and log write rate, and combining them into the cross-domain complexity evaluation value, are as follows. First, one is added to the interval between adjacent events, moderately amplifying the original interval in the time dimension. This linear shift also keeps a zero interval from collapsing the product to zero. The result is a time extension term that reflects the trigger density of cross-domain behavior and strengthens the identification of frequency anomalies. Next, the request response time is divided by the sum of the terminal network downlink speed and the constant one; the constant in the denominator improves numerical stability in low-speed scenarios. The cross-platform forwarding latency and the constant one are then added to this ratio, so that the additive coupling accumulates the overall latency of the cross-domain transmission link. The result is a transmission hindrance term. It reflects the actual hindrance of the cross-domain link after multiple transmission time factors are superimposed, and it makes the evaluation more sensitive to complexity in high-latency scenarios. Then one is added to the log write rate, which raises the value linearly and prevents a zero write rate from collapsing the complexity. The result is a write perturbation term that characterizes the continuous interference caused by log generation during cross-domain processing and reflects the write load numerically. Finally, the time extension term, transmission hindrance term, and write perturbation term are multiplied together, so that the multiple sources of complexity are amplified jointly and nonlinearly.
Through this multiplicative coupling, the interaction among cross-domain latency, transmission hindrance, and write perturbation is strengthened, giving the cross-domain complexity evaluation value.

[0049] The specific formula for calculating the cross-domain complexity evaluation value is as follows:

[0050] $$C=\left(\Delta t+1\right)\cdot\left(\frac{R}{B+1}+F+1\right)\cdot\left(W+1\right)$$

[0051] In the formula:
- $C$ represents the cross-domain complexity evaluation value.
- $\Delta t$ is the interval between adjacent events.
- $R$ is the request response time.
- $B$ is the terminal network downlink speed.
- $F$ is the cross-platform forwarding latency.
- $W$ is the log write rate.
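The complexity evaluation in [0048] can be sketched as a function mirroring the three terms. Reading "adding the cross-platform forwarding latency and the constant one" as addition to the response-time ratio is an interpretation of the description. Inputs are assumed to be non-negative raw values.

```python
def cross_domain_complexity(interval: float, response_time: float,
                            downlink_speed: float, forwarding_latency: float,
                            log_write_rate: float) -> float:
    """Cross-domain complexity evaluation value as the product of three terms."""
    time_extension = interval + 1.0                        # time extension term
    hindrance = (response_time / (downlink_speed + 1.0)    # transmission hindrance term
                 + forwarding_latency + 1.0)
    write_perturbation = log_write_rate + 1.0              # write perturbation term
    return time_extension * hindrance * write_perturbation
```

Because every factor is at least 1 for non-negative inputs, the complexity never drops below 1. This keeps the later division in the robust quantization value well defined. For example, an interval of 1, a response time of 4 at downlink speed 1, a forwarding latency of 1, and a write rate of 1 give 2 x 4 x 2 = 16.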

[0052] In this implementation scheme, the cross-domain complexity assessment introduces time extension, transmission hindrance, and write perturbation terms. It fuses these three types of load characteristics in a nonlinear coupling structure, forming a joint metric of cross-domain transmission pressure. Differences in cross-domain behavior across the time, transmission link, and write load dimensions are mapped onto the same complexity scale. The assessment thus combines numerical stability, amplified sensitivity, and exposure of interaction effects. As a result, cross-domain complexity is not dominated by any single factor and responds to the full range of dynamic conditions. This provides a more discriminative complexity basis for the subsequent generation of cross-domain robust quantization values.

[0053] Specifically, the steps for constructing the robust sample set and the candidate sample set from the training sample transfer sequence, and constructing positive and negative sample pairs based on the similarity of low-dimensional contrastive representations, are as follows. The median of the cross-domain robust quantization values of the current batch of training samples is extracted from the training sample transfer sequence. This median serves as the split threshold, which limits the influence of extremely high and low values on the split. Training samples whose cross-domain robust quantization value is not lower than the median are placed in the robust sample set, so that this set contains samples with high cross-domain robustness. Training samples whose value is lower than the median are placed in the candidate sample set, so that potentially usable, less robust samples are retained rather than discarded outright. The sample indices of all training samples in both sets are recorded. In later training rounds, their fixed-length feature vectors, low-dimensional contrastive representations, and scene domain numbers can then be looked up quickly by sample index. In each training round, N robust samples are drawn from the robust sample set to form a cross-domain training batch, where N is a positive integer greater than three. Random sampling is used so that the batch includes different scene domain numbers and is representative. The corresponding low-dimensional contrastive representations and scene domain numbers are retrieved from the correspondence table by sample index.
The similarity between the low-dimensional contrastive representations of every two robust samples in the training batch is then calculated with a similarity metric, so that the similarity reflects how close the two samples are in the contrastive representation space. A pair of samples is marked as a positive sample pair if its similarity is greater than the similarity threshold and its scene domain numbers are the same. The threshold ensures that the two samples are close in the representation space. The identical scene domain number ensures that they come from the same domain, so positive pairs represent highly similar behavior within one domain. A pair is marked as a negative sample pair if its similarity is not greater than the similarity threshold and its scene domain numbers differ. The similarity condition introduces representation differences, and the differing scene domain numbers introduce cross-domain differences, so negative pairs represent dissimilar behavior across domains. Together, the positive and negative pairs provide clear supervision signals for the subsequent contrastive learning.
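The median split and the pair-labeling rule in [0053] can be sketched as follows. Cosine similarity is assumed for the unspecified similarity metric. `draw_batch` is a plain uniform draw: it does not by itself guarantee that several scene domains appear in the batch, which the text requires. A stratified draw would be needed for that. Pairs that are similar but from different domains, or dissimilar within one domain, receive neither label under the stated rule.

```python
import itertools
import math
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def split_by_median(q: Dict[int, float]) -> Tuple[List[int], List[int]]:
    """Split sample indices at the median cross-domain robust quantization value."""
    med = statistics.median(q.values())
    robust = [i for i, v in q.items() if v >= med]     # not lower than the median
    candidate = [i for i, v in q.items() if v < med]   # retained, not discarded
    return robust, candidate


def draw_batch(robust: List[int], n: int, seed: Optional[int] = None) -> List[int]:
    """Uniform random draw of n robust samples (domain coverage not enforced)."""
    return random.Random(seed).sample(robust, n)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.hypot(*a), math.hypot(*b)
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def label_pairs(batch: List[int], reps: Dict[int, Sequence[float]],
                domains: Dict[int, int], threshold: float):
    """Return (positive_pairs, negative_pairs) for every unordered pair in batch."""
    positives, negatives = [], []
    for i, j in itertools.combinations(batch, 2):
        s = cosine(reps[i], reps[j])
        if s > threshold and domains[i] == domains[j]:
            positives.append((i, j))
        elif s <= threshold and domains[i] != domains[j]:
            negatives.append((i, j))
        # other pairs are left unlabeled
    return positives, negatives
```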

[0054] In this implementation scheme, the training sample transfer sequence is split at the median cross-domain robust quantization value, and this split is combined with similarity judgments on low-dimensional contrastive representations. Together they establish clear contrast relationships among samples during training. The resulting positive and negative sample pairs have highly discriminative cross-domain characteristics, and the organization of samples within each training batch better matches the structural requirements of cross-domain transfer modeling. Subsequent contrastive learning can therefore keep strengthening intra-domain similarity and cross-domain differences while robust samples dominate. The feature encoding network receives more accurate supervision signals during training, and cross-domain representation learning under small-sample conditions achieves higher consistency and separability. This improves the overall robustness of cross-domain transfer modeling.

[0055] Specifically, the steps for evaluating the contrastive learning loss of the current training batch are as follows. For a robust sample, the similarity between its low-dimensional contrastive representation and that of each sample in its positive sample set is used as an exponent, with the natural constant e as the base. Because the exponential function is monotonically increasing, highly similar samples are amplified, so positive samples closer to the robust sample receive larger weights. All these exponentials are summed to obtain the positive sample aggregation term. This term reflects how concentrated the positive sample set is around the robust sample in the low-dimensional contrastive representation space. Next, the similarity between the robust sample's low-dimensional contrastive representation and that of each sample in its positive and negative sample sets is likewise used as an exponent with base e. The sensitivity of the exponential function to different similarity levels amplifies subtle differences between samples, so the positive and negative sample sets form distinguishable distributions in numerical space. These exponentials are summed to obtain the global comparison term. This term describes the relationship between the robust sample and the other samples in the training batch and serves as the normalization reference. The negative natural logarithm of the ratio of the positive sample aggregation term to the global comparison term is then taken.
The logarithm contracts large ratios, suppressing the gradient instability they would otherwise cause. The negative sign turns the similarity structure into a loss: the larger the ratio, the closer the positive samples are to the robust sample and the smaller the loss. This logarithmic result is multiplied by the sample's cross-domain robust quantization value and representation stability evaluation value. The cross-domain robust quantization value scales the contribution according to the sample's overall cross-domain performance. The representation stability evaluation value weights it by the credibility of the sample's position in the low-dimensional contrastive representation space. The result is a single-sample contrastive loss term that reflects both cross-domain robustness and representation stability, together with their influence on training intensity. All single-sample contrastive loss terms in the training batch are summed to form the cumulative loss. This sum is then divided by one plus the total, over all samples in the robust sample set, of the cross-domain robust quantization value and the representation stability evaluation value. This normalization uses the joint accumulation of the two values to correct for differences between samples in the batch, and adding one prevents a zero denominator. The result is the contrastive learning loss evaluation value. It is more stable, more controllable, and less sensitive to the sample distribution of the batch, and it provides a reliable gradient basis for subsequent parameter updates.

[0056] The specific formula for calculating the contrastive learning loss evaluation value is as follows:

[0057] $$L_c=\frac{-\sum_{i\in B}R_iS_i\ln\left(\frac{\sum_{j\in P_i}e^{s_{ij}}}{\sum_{k\in P_i\cup N_i}e^{s_{ik}}}\right)}{\sum_{i\in B}R_iS_i+1}$$

[0058] In the formula, $L_c$ denotes the contrastive learning loss evaluation value; $B$ denotes the set of sample indices of the current training batch (all of which are robust samples); $P_i$ denotes the positive sample set of robust sample $i$; $N_i$ denotes the negative sample set of robust sample $i$; $s_{ij}$ denotes the low-dimensional contrastive similarity between robust samples $i$ and $j$; $R_i$ denotes the cross-domain robust quantization value of robust sample $i$; and $S_i$ denotes the representation stability evaluation value of robust sample $i$.
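The loss evaluation described in [0055] to [0058] can be sketched in Python as below. This is an illustrative, non-authoritative sketch: the function name `contrastive_loss_value`, the index-based inputs, and the choice to skip a robust sample that has no positive partner (which would otherwise make the logarithm undefined) are assumptions not stated in the source. The normalizer sums the weight products over the batch indices, treating every sample in the batch as a robust sample.

```python
import math
from typing import Dict, List, Sequence


def contrastive_loss_value(
    batch: Sequence[int],
    positives: Dict[int, List[int]],
    negatives: Dict[int, List[int]],
    sim: Sequence[Sequence[float]],
    robustness: Sequence[float],
    stability: Sequence[float],
) -> float:
    """Robustness- and stability-weighted contrastive loss (sketch).

    sim[i][j] is the low-dimensional contrastive similarity of samples i and j;
    robustness[i] and stability[i] are the per-sample weights R_i and S_i.
    """
    cumulative = 0.0
    for i in batch:
        pos = positives.get(i, [])
        if not pos:
            # Assumption: without positives the ratio is undefined, so skip.
            continue
        # Positive sample aggregation term: sum of e^{s_ij} over positives.
        pos_term = sum(math.exp(sim[i][j]) for j in pos)
        # Global comparison term: positives plus negatives.
        global_term = pos_term + sum(math.exp(sim[i][k]) for k in negatives.get(i, []))
        # Single-sample term: -R_i * S_i * ln(pos / global).
        cumulative += -robustness[i] * stability[i] * math.log(pos_term / global_term)
    # Normalize by the summed weight products plus one (avoids a zero denominator).
    denominator = sum(robustness[i] * stability[i] for i in batch) + 1.0
    return cumulative / denominator
```

Because the positive set is contained in the union of positive and negative sets, the ratio lies in (0, 1], so each single-sample term is non-negative whenever R_i and S_i are non-negative.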

[0059] In this implementation scheme, combining exponential and logarithmic operations amplifies the difference between the distribution of low-dimensional contrastive representations in the positive sample set and that in the entire sample set, so that the contrastive relationship forms a clear, separable gradient structure in the numerical space. In addition, the cross-domain robust quantization value and the representation stability evaluation value are introduced into the loss function as adaptive weighting factors, enabling the loss to adjust dynamically to cross-domain transfer characteristics. With this structured design, the training process strengthens the dominant role of high-quality samples in parameter updates while maintaining numerical stability, making the gradients more reliable and the convergence behavior more stable, and thereby improving the convergence efficiency of cross-domain sample transfer modeling.

[0060] Specifically, the feature encoding network parameters are updated from the loss result as follows. Based on the contrastive learning loss evaluation value, gradient backpropagation is performed on the feature encoding network and its parameters are updated. During backpropagation, parameter gradients are generated from the single-sample contrastive loss term of each robust sample, and the weights are adjusted layer by layer according to the weight update rule configured for the feature encoding network, so that the relative distribution of low-dimensional contrastive representations in the feature space is gradually optimized toward aggregation of positive samples and separation of negative samples. After the encoding network parameters have been updated, candidate samples are selected from the candidate sample set whose cross-domain robust quantization value is not lower than the quantile threshold and whose low-dimensional contrastive representation has a similarity not lower than the similarity merging threshold with that of at least one robust sample. This similarity check ensures that a candidate is transferable under both the cross-domain robust quantization value and the low-dimensional contrastive representation criteria. Candidates that pass the check are merged into the robust sample set, and the set index is updated. The cycle of training batch construction, loss evaluation, parameter update and sample set maintenance is then repeated, while the change in the contrastive learning loss evaluation value across consecutive training rounds is continuously monitored.
When the change in the contrastive learning loss evaluation value stays below the loss change threshold for M consecutive training rounds, training is terminated and the encoding network parameters are fixed, completing the sample transfer modeling optimization. Here, M is a positive integer greater than three.
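The sample set maintenance and termination check in [0060] can be sketched as follows. This is a minimal sketch under stated assumptions: `merge_candidates`, `should_stop`, and the `similarity` callback are hypothetical names; the quantile threshold is passed in by the caller because the source does not say which quantile is used; and "the change in M consecutive rounds" is read as every round-to-round absolute change among the last M rounds being below the threshold.

```python
from typing import Callable, Dict, List, Sequence, Set


def merge_candidates(
    candidates: Set[int],
    robust: Set[int],
    robustness: Dict[int, float],
    similarity: Callable[[int, int], float],
    quantile_threshold: float,
    merge_threshold: float,
) -> List[int]:
    """Move qualifying candidates into the robust set in place; return them."""
    # Both checks must pass: robust quantization value and similarity to
    # at least one existing robust sample.
    accepted = sorted(
        c
        for c in candidates
        if robustness[c] >= quantile_threshold
        and any(similarity(c, r) >= merge_threshold for r in robust)
    )
    robust.update(accepted)  # merge into the robust sample set
    candidates.difference_update(accepted)  # update the set index
    return accepted


def should_stop(loss_history: Sequence[float], m: int, change_threshold: float) -> bool:
    """True when each of the last m round-to-round loss changes is below the threshold."""
    if m <= 3:
        raise ValueError("M must be a positive integer greater than three")
    if len(loss_history) < m + 1:
        return False
    recent = loss_history[-(m + 1):]
    return all(abs(b - a) < change_threshold for a, b in zip(recent, recent[1:]))
```

Candidates are compared only against the robust set as it stood before the merge, so the outcome does not depend on iteration order.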

[0061] In this implementation scheme, the contrastive learning loss evaluation value drives gradient backpropagation and iterative parameter updates during training, and candidate samples are dynamically screened and merged using both the cross-domain robust quantization value and the low-dimensional contrastive representation similarity. This yields a continuously self-correcting training process centered on the loss convergence trend. The feature encoding network gradually strengthens its ability to aggregate positive samples and discriminate negative samples under stable, controllable conditions, and the composition of the robust sample set is refined step by step as training proceeds. As a result, transfer modeling continues to improve cross-domain representation quality and convergence reliability under small-sample conditions, and provides more generalizable encodings for subsequent cross-domain transfer inference.

[0062] As shown in Figure 2, the second aspect of this invention provides a small-sample transfer modeling optimization system based on contrastive learning, comprising a data acquisition and preprocessing module, a sample index feature generation module, a representation robustness score calculation module, and a cross-domain robustness quantization evaluation module, wherein: the data acquisition and preprocessing module is used to periodically acquire cross-domain contrastive transfer data and perform time alignment, noise suppression, anomaly removal, missing data completion, and scale normalization on the acquired data to generate preprocessed cross-domain contrastive transfer data; the sample index feature generation module is used to construct a fixed-length feature vector set from the preprocessed cross-domain contrastive transfer data according to event timestamps and generate a training sample sequence; the representation robustness score calculation module is used to construct a feature encoding network based on the fixed-length feature vectors and output low-dimensional contrastive representations, quantify the representation stability, cross-domain stability, and cross-domain complexity of the training samples, generate cross-domain robustness quantization values based on the quantitative evaluation results, and form a training sample transfer sequence; the cross-domain robustness quantization evaluation module is used to construct a robust sample set and a candidate sample set based on the training sample transfer sequence, construct positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations, evaluate the contrastive learning loss of the current training batch, and update the feature encoding network parameters based on the loss result.

[0063] In this implementation scheme, periodically collected cross-domain contrastive transfer data is processed in multiple stages, and fixed-length feature vectors with a unified structure are constructed based on event timestamps. A transfer sequence is then established by combining the robustness quantization of low-dimensional contrastive representations with the ordering of cross-domain robust quantization values. During contrastive learning training, the robust sample set is dynamically updated, forming a complete closed loop from data collection to transfer modeling. The generation, evaluation and optimization of cross-domain representations thus proceed as a continuous iteration, enabling the feature encoding network to achieve higher cross-domain robustness under small-sample conditions and enabling transfer modeling to generalize better across differing scene domains. This provides systematic technical support for improving the accuracy and stability of cross-domain transfer tasks.

[0064] It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus.

[0065] The preferred embodiments of the present invention disclosed above are merely illustrative of the invention. These preferred embodiments do not exhaustively describe all details, nor do they limit the invention to the specific implementations described. Clearly, many modifications and variations can be made based on the content of this specification. This specification selects and specifically describes these embodiments to better explain the principles and practical applications of the invention, thereby enabling those skilled in the art to better understand and utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A few-shot transfer learning modeling optimization method based on contrastive learning, characterized in that it comprises the following steps: S1, periodically collecting cross-domain contrastive transfer data, and performing time alignment, noise suppression, anomaly removal, missing data completion and scale standardization on the collected data to generate preprocessed cross-domain contrastive transfer data; S2, constructing a fixed-length feature vector set from the preprocessed cross-domain contrastive transfer data according to the event timestamps, and generating a training sample sequence; S3, constructing a feature encoding network based on the fixed-length feature vectors and outputting low-dimensional contrastive representations, quantitatively evaluating the representation stability, cross-domain stability and cross-domain complexity of the training samples, generating cross-domain robust quantization values based on the quantitative evaluation results, and forming a training sample transfer sequence. The specific steps of S3 are as follows: for the fixed-length feature vectors in the training sample sequence, a feature encoding network is constructed using a multilayer perceptron neural network; the fixed-length feature vectors are used as inputs and fed sequentially through the neurons of each hidden layer, which perform linear transformations and nonlinear activations, to obtain the corresponding low-dimensional contrastive representations.
A correspondence table linking the sample index, fixed-length feature vector, low-dimensional contrastive representation and scene domain number is established under the same index number. For each training sample, the set of neighboring samples with the same scene domain number is retrieved from the correspondence table, and the average similarity between the low-dimensional contrastive representation of the current training sample and those of all neighboring samples is calculated to obtain the representation stability evaluation value of the current training sample. For each fixed-length feature vector, the corresponding terminal interaction duration, single-session dwell time, number of clicks, event sequence length, number of API call exceptions, number of platform entry requests, operation value within the period and single operation value are extracted to calculate the cross-domain stability evaluation value; and the corresponding adjacent event interval, request response time, terminal network downlink rate, cross-platform forwarding latency and log writing rate are extracted to calculate the cross-domain complexity evaluation value. The cross-domain stability evaluation value is multiplied by the representation stability evaluation value and then divided by the corresponding cross-domain complexity evaluation value to obtain the cross-domain robust quantization value of the current training sample in the corresponding scene domain. The training samples are sorted in descending order of cross-domain robust quantization value to obtain the training sample transfer sequence. S4, constructing a robust sample set and a candidate sample set based on the training sample transfer sequence, and constructing positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations;
evaluating the contrastive learning loss of the current training batch, and updating the feature encoding network parameters based on the loss result. The specific steps of S1 are as follows: a fixed-width sliding time window is set as the sampling period to periodically collect cross-domain contrastive transfer data, which includes terminal interaction duration, single-session dwell time, number of clicks, request response time, terminal network downlink rate, event timestamp, adjacent event interval, event sequence length, number of platform entry requests, cross-platform forwarding latency, single operation value, operation value within the period, number of API call exceptions, log writing rate and scene domain number. For the collected data, the timestamps of the different data sources are uniformly corrected by a multi-source timestamp synchronization correction mechanism; sudden fluctuations and measurement noise are suppressed and smoothed by a moving average filtering algorithm; abnormal sampling points are identified and removed by anomaly detection based on the local outlier factor algorithm; missing data is completed by K-nearest-neighbor interpolation; and the data is numerically standardized by the Z-score standardization algorithm to unify the dimensional scales of different physical quantities.
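Three of the S1 preprocessing steps (moving average smoothing, K-nearest-neighbor completion, and Z-score standardization) can be sketched in plain Python as below. This is an illustrative sketch, not the claimed implementation: the function names, the trailing window, the default `k`, and the distance rule for imputation (Euclidean over fields both rows observe) are assumptions. Timestamp synchronization and local-outlier-factor removal are not shown; an off-the-shelf option for the latter is scikit-learn's `LocalOutlierFactor`.

```python
import math
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int = 3) -> List[float]:
    """Trailing moving average; early points average over the samples seen so far."""
    smoothed = []
    for t in range(len(values)):
        chunk = values[max(0, t - window + 1): t + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def knn_impute(rows: List[List[Optional[float]]], k: int = 2) -> List[List[float]]:
    """Fill each None with the mean of that field over the k nearest rows that
    observe it; distance is Euclidean over fields observed in both rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = []
        for f, value in enumerate(row):
            if value is not None:
                new_row.append(value)
                continue
            donors = []
            for j, other in enumerate(rows):
                if j == i or other[f] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other) if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    donors.append((dist, other[f]))
            if not donors:
                raise ValueError(f"no donor rows for row {i}, field {f}")
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            new_row.append(sum(v for _, v in nearest) / len(nearest))
        filled.append(new_row)
    return filled


def zscore_columns(rows: List[List[float]]) -> List[List[float]]:
    """Column-wise Z-score standardization; constant columns map to 0.0."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [[(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)] for row in rows]
```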

2. The few-shot transfer learning modeling optimization method based on contrastive learning according to claim 1, characterized in that the specific steps for constructing a fixed-length feature vector set from the preprocessed cross-domain contrastive transfer data according to event timestamps and generating a training sample sequence are as follows: the preprocessed cross-domain contrastive transfer data is extracted and all of it is sorted by event timestamp; a unique sample index is assigned to each group of cross-domain contrastive transfer data; the data under the same sample index is concatenated into a fixed-length feature vector in a fixed field order; and the corresponding scene domain number is appended to the fixed-length feature vector as an independent feature dimension, forming a training sample sequence containing the sample index, fixed-length feature vector and scene domain number.
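The construction in claim 2 can be sketched as follows. The field names, the default field order, and the assumption that each "group" of data arrives as one dictionary record are hypothetical; the claim only requires that some fixed field order is used and that the scene domain number is appended as its own dimension.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical field names and order; the claim only requires a fixed order.
FIELD_ORDER = ("interaction_duration", "dwell_time", "clicks", "response_time")


def build_training_sequence(
    records: Sequence[Dict[str, float]],
    field_order: Sequence[str] = FIELD_ORDER,
    ts_key: str = "event_timestamp",
    domain_key: str = "scene_domain",
) -> List[Tuple[int, List[float], int]]:
    """Return (sample_index, fixed_length_vector, scene_domain) triples."""
    ordered = sorted(records, key=lambda r: r[ts_key])  # sort by event timestamp
    sequence = []
    for index, rec in enumerate(ordered):  # unique index in timestamp order
        vector = [float(rec[name]) for name in field_order]  # fixed field order
        vector.append(float(rec[domain_key]))  # domain number as its own dimension
        sequence.append((index, vector, int(rec[domain_key])))
    return sequence
```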

3. The few-shot transfer learning modeling optimization method based on contrastive learning according to claim 1, characterized in that the specific steps for extracting the corresponding terminal interaction duration, single-session dwell time, number of clicks, event sequence length, number of API call exceptions, number of platform entry requests, operation value within the period and single operation value, and calculating the cross-domain stability evaluation value, are as follows: the terminal interaction duration, single-session dwell time and number of clicks are summed, one is added, and the natural logarithm is taken to obtain the basic activity term; the event sequence length is divided by the sum of the event sequence length and the constant one to obtain the sequence structure term; the number of API call exceptions is divided by the sum of the number of platform entry requests and a constant, and the negative of this ratio is used as the exponent with the natural constant e as the base to obtain the anomaly suppression term; the operation value within the period is divided by the sum of the operation value within the period and the single operation value to obtain the operation ratio mapping term; and the basic activity term, sequence structure term, anomaly suppression term and operation ratio mapping term are multiplied together to obtain the cross-domain stability evaluation value.
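The four terms of claim 3 can be written as a single function, sketched below. The constant added to the number of platform entry requests is not given in the source, so `c` defaults to a hypothetical 1.0. The sketch also assumes non-negative raw inputs (the claim does not say whether raw or standardized values are used) and a positive sum of the two operation values.

```python
import math


def cross_domain_stability(
    interaction_duration: float,
    dwell_time: float,
    clicks: float,
    seq_length: float,
    api_exceptions: float,
    entry_requests: float,
    period_value: float,
    single_value: float,
    c: float = 1.0,  # hypothetical value for the unspecified constant
) -> float:
    """Product of the four terms in claim 3 (sketch)."""
    activity = math.log(interaction_duration + dwell_time + clicks + 1.0)  # basic activity term
    structure = seq_length / (seq_length + 1.0)  # sequence structure term
    anomaly = math.exp(-api_exceptions / (entry_requests + c))  # anomaly suppression term
    ratio = period_value / (period_value + single_value)  # operation ratio mapping term
    return activity * structure * anomaly * ratio
```

The anomaly suppression term equals 1 when there are no API call exceptions and decays toward 0 as exceptions grow relative to entry requests.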

4. The few-shot transfer learning modeling optimization method based on contrastive learning according to claim 1, characterized in that the specific steps for extracting the corresponding adjacent event interval, request response time, terminal network downlink rate, cross-platform forwarding latency and log writing rate, and calculating the cross-domain complexity evaluation value, are as follows: one is added to the adjacent event interval to obtain the time extension term; the request response time is divided by the sum of the terminal network downlink rate and a constant, and the cross-platform forwarding latency and a constant are added to the quotient to obtain the transmission stall term; one is added to the log writing rate to obtain the write disturbance term; and the time extension term, transmission stall term and write disturbance term are multiplied together to obtain the cross-domain complexity evaluation value.
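The complexity value of claim 4 can be sketched the same way. The two unspecified constants are exposed as hypothetical parameters `c1` and `c2` with default 1.0, and the inputs are assumed to be non-negative raw quantities.

```python
def cross_domain_complexity(
    event_interval: float,
    response_time: float,
    downlink_rate: float,
    forwarding_latency: float,
    log_write_rate: float,
    c1: float = 1.0,  # hypothetical constant added to the downlink rate
    c2: float = 1.0,  # hypothetical constant added in the transmission stall term
) -> float:
    """Product of the three terms in claim 4 (sketch)."""
    time_extension = event_interval + 1.0
    transmission_stall = response_time / (downlink_rate + c1) + forwarding_latency + c2
    write_disturbance = log_write_rate + 1.0
    return time_extension * transmission_stall * write_disturbance
```

With non-negative inputs and `c2 >= 1`, every factor is at least one, so the complexity value is at least one and the division in claim 1 cannot hit a zero denominator.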

5. The few-shot transfer learning modeling optimization method based on contrastive learning according to claim 1, characterized in that the specific steps for constructing a robust sample set and a candidate sample set based on the training sample transfer sequence, and constructing positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations, are as follows: based on the training sample transfer sequence, the median of the cross-domain robust quantization values of the current batch of training samples is extracted; training samples whose cross-domain robust quantization value is not lower than the median are assigned to the robust sample set, training samples whose value is lower than the median are assigned to the candidate sample set, and the sample indices of all training samples in the two sets are recorded respectively. In each training round, N robust samples are drawn from the robust sample set to construct a cross-domain training batch, and the corresponding low-dimensional contrastive representations and scene domain numbers are retrieved from the correspondence table via the sample indices. The similarity between the low-dimensional contrastive representations of every two robust samples in the training batch is calculated. A pair of samples with the same scene domain number and a low-dimensional contrastive representation similarity greater than the similarity threshold is denoted a positive sample pair, and a pair of samples with different scene domain numbers and a low-dimensional contrastive representation similarity not greater than the similarity threshold is denoted a negative sample pair.
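The median split and pair construction of claim 5 can be sketched as below. The claim does not name a similarity measure, so cosine similarity is an assumption here, as are the function names. Pairs that meet neither condition (for example, same domain but low similarity) are simply left out of both lists.

```python
import math
from itertools import combinations
from statistics import median
from typing import Dict, List, Sequence, Tuple


def split_by_median(robustness: Dict[int, float]) -> Tuple[List[int], List[int]]:
    """Robust set: values >= median; candidate set: values < median."""
    mid = median(robustness.values())
    robust = sorted(i for i, r in robustness.items() if r >= mid)
    candidate = sorted(i for i, r in robustness.items() if r < mid)
    return robust, candidate


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity (assumed measure); 0.0 if either vector is zero."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def build_pairs(
    batch: Sequence[int],
    reps: Dict[int, Sequence[float]],
    domains: Dict[int, str],
    threshold: float,
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Positive: same domain and similarity > threshold.
    Negative: different domain and similarity <= threshold."""
    positives, negatives = [], []
    for i, j in combinations(batch, 2):
        s = cosine(reps[i], reps[j])
        if domains[i] == domains[j] and s > threshold:
            positives.append((i, j))
        elif domains[i] != domains[j] and s <= threshold:
            negatives.append((i, j))
    return positives, negatives
```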

6. The few-shot transfer learning modeling optimization method based on contrastive learning according to claim 1, characterized in that the specific steps for evaluating the contrastive learning loss of the current training batch are as follows: the similarity between the low-dimensional contrastive representation of the robust sample and that of each sample in the positive sample set is used as an exponent with the natural constant e as the base, and all exponential results are summed to obtain the positive sample aggregation term; the similarity between the low-dimensional contrastive representation of the robust sample and that of each sample in the positive and negative sample sets is used as an exponent with the natural constant e as the base, and all exponential results are summed to obtain the global set contrastive term; the negative natural logarithm of the ratio of the positive sample aggregation term to the global set contrastive term is multiplied by the corresponding cross-domain robust quantization value and representation stability evaluation value to obtain the single-sample contrastive loss term; and all single-sample contrastive loss terms are summed and then divided by the sum of the products of the cross-domain robust quantization values and representation stability evaluation values of all samples in the robust sample set plus one, to obtain the contrastive learning loss evaluation value.

7. The few-shot transfer learning modeling optimization method based on contrastive learning according to claim 1, characterized in that the specific steps for updating the feature encoding network parameters based on the loss result are as follows: based on the contrastive learning loss evaluation value, gradient backpropagation and encoding network parameter updates are performed on the feature encoding network; after the parameter update is completed, candidate samples whose cross-domain robust quantization value is not lower than the quantile threshold and whose low-dimensional contrastive representation similarity with at least one robust sample is not lower than the similarity merging threshold are selected from the candidate sample set and merged into the robust sample set, and the set index is updated; the cycle of training batch construction, loss evaluation, parameter update and sample set maintenance is executed repeatedly; and when the change in the contrastive learning loss evaluation value stays below the loss change threshold for M consecutive training rounds, training is terminated and the encoding network parameters are fixed to achieve sample transfer modeling optimization.

8. A few-shot transfer learning modeling optimization system based on contrastive learning, characterized in that it comprises a data acquisition and preprocessing module, a sample index feature generation module, a representation robustness score calculation module and a cross-domain robust quantization evaluation module, wherein: the data acquisition and preprocessing module is used to periodically acquire cross-domain contrastive transfer data and perform time alignment, noise suppression, anomaly removal, missing data completion and scale standardization on the acquired data to generate preprocessed cross-domain contrastive transfer data; the sample index feature generation module is used to construct a fixed-length feature vector set from the preprocessed cross-domain contrastive transfer data according to the event timestamps, and generate a training sample sequence; and the representation robustness score calculation module is used to construct a feature encoding network based on the fixed-length feature vectors and output low-dimensional contrastive representations, quantitatively evaluate the representation stability, cross-domain stability and cross-domain complexity of the training samples, generate cross-domain robust quantization values based on the quantitative evaluation results, and form a training sample transfer sequence.
The specific steps performed by the representation robustness score calculation module are as follows: for the fixed-length feature vectors in the training sample sequence, a feature encoding network is constructed using a multilayer perceptron neural network; the fixed-length feature vectors are used as inputs and fed sequentially through the neurons of each hidden layer, which perform linear transformations and nonlinear activations, to obtain the corresponding low-dimensional contrastive representations. A correspondence table linking the sample index, fixed-length feature vector, low-dimensional contrastive representation and scene domain number is established under the same index number. For each training sample, the set of adjacent samples with the same scene domain number is retrieved from the correspondence table, and the average similarity between the low-dimensional contrastive representation of the current training sample and those of all adjacent samples is calculated to obtain the representation stability evaluation value of the current training sample. For each fixed-length feature vector, the corresponding terminal interaction duration, single-session dwell time, number of clicks, event sequence length, number of API call exceptions, number of platform entry requests, operation value within the period and single operation value are extracted to calculate the cross-domain stability evaluation value.
The corresponding adjacent event interval, request response time, terminal network downlink rate, cross-platform forwarding latency and log writing rate are extracted to calculate the cross-domain complexity evaluation value. The cross-domain stability evaluation value is multiplied by the representation stability evaluation value and then divided by the corresponding cross-domain complexity evaluation value to obtain the cross-domain robust quantization value of the current training sample in the corresponding scene domain. The training samples are sorted in descending order of cross-domain robust quantization value to obtain the training sample transfer sequence. The cross-domain robust quantization evaluation module is used to construct a robust sample set and a candidate sample set based on the training sample transfer sequence, construct positive and negative sample pairs based on the similarity of the low-dimensional contrastive representations, evaluate the contrastive learning loss of the current training batch, and update the feature encoding network parameters based on the loss result.
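The encoder forward pass, the representation stability evaluation value, and the final ordering by cross-domain robust quantization value can be sketched together as below. Everything here is an assumption for illustration: the layer sizes, the 1/sqrt(fan_in) Gaussian initialization, tanh as the hidden-layer activation with a linear output layer, cosine as the similarity measure, and reading "adjacent samples" as all other samples of the same scene domain. Training by backpropagation is not shown; only the forward computation is.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def init_mlp(sizes: Sequence[int], seed: int = 0) -> List[Layer]:
    """Gaussian weights scaled by 1/sqrt(fan_in), zero biases (assumed init)."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def encode(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Linear transform + tanh in hidden layers; linear output layer."""
    h = list(x)
    for depth, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, biases)]
        h = z if depth == len(layers) - 1 else [math.tanh(v) for v in z]
    return h


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def representation_stability(reps: Dict[int, List[float]], domains: Dict[int, int]) -> Dict[int, float]:
    """Mean cosine similarity to all other samples of the same scene domain."""
    result = {}
    for i, rep in reps.items():
        peers = [j for j in reps if j != i and domains[j] == domains[i]]
        result[i] = sum(cosine(rep, reps[j]) for j in peers) / len(peers) if peers else 0.0
    return result


def transfer_sequence(
    stability_cd: Dict[int, float], rep_stability: Dict[int, float], complexity: Dict[int, float]
) -> Tuple[List[int], Dict[int, float]]:
    """Robust quantization value = stability * representation stability / complexity,
    with sample indices returned in descending order of that value."""
    robust_q = {i: stability_cd[i] * rep_stability[i] / complexity[i] for i in stability_cd}
    order = sorted(robust_q, key=robust_q.get, reverse=True)
    return order, robust_q
```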