A deep learning-based financial customer risk label dynamic configuration method and system
By using deep learning methods to collect and process multi-source heterogeneous data in real time, and by using convolutional long short-term memory networks and graph attention networks to extract customer transaction behavior and associated risks, and dynamically configuring risk labels, the limitations of traditional risk rating methods are overcome. This enables accurate perception and dynamic management of customer risks, and improves the accuracy and timeliness of the financial risk control system.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DERKEE INFORMATION CO LTD
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19
AI Technical Summary
Traditional risk rating methods struggle to handle massive, multi-source, and heterogeneous real-time data streams, fail to capture complex temporal dependency patterns in transaction behavior, and ignore the associated risks between customers. This results in delayed risk identification and a lack of a holistic perspective. Static risk labeling systems cannot adapt to dynamic market changes, impacting financial institutions' risk warning and dynamic management capabilities.
By employing deep learning methods and collecting multi-source heterogeneous data in real time, convolutional long short-term memory networks and graph attention networks are used to extract the temporal dependency patterns and associated risk transmission paths of customer transaction behavior. Combined with an adaptive risk label mapping module, customer risk labels are dynamically configured to achieve accurate perception of individual customer behavior and group transmission risks.
It improves the accuracy and timeliness of the financial risk control system, enabling precise perception of individual customer behavior and group transmission risks, and realizing dynamic updates and adaptive adjustments of risk labels, thereby improving the accuracy and response speed of risk identification.
Figure CN122243625A_ABST
Description
Technical Field
[0001] This invention belongs to the field of financial technology, specifically a method and system for dynamic configuration of financial customer risk labels based on deep learning.
Background Technology
[0002] In the field of financial risk management, accurate assessment of customer risk levels is a core element of credit approval, credit limit management, and compliance monitoring. Traditional risk rating methods primarily rely on static financial indicators and credit scoring models, classifying customers through pre-defined rules or shallow machine learning models. However, these methods have significant limitations: firstly, they struggle to effectively handle massive, multi-source, and heterogeneous real-time data streams, failing to capture complex temporal dependencies in transaction behavior; secondly, traditional isolated assessment methods neglect the interconnected risks between customers within the financial network, such as guarantee relationships and transaction-circle transmission risks, leading to delayed risk identification and a lack of a holistic perspective. Furthermore, existing risk labeling systems are mostly statically configured, unable to adapt to changes in market dynamics and business tolerance, making it difficult to reflect the true risk status of customers in a timely manner, thus hindering financial institutions' risk warning and dynamic management capabilities.
Summary of the Invention
[0003] The purpose of this invention is to provide a method and system for dynamic configuration of financial customer risk labels based on deep learning, so as to overcome the shortcomings of the existing technology, achieve accurate perception of individual customer behavior and group transmission risk, and improve the accuracy and timeliness of financial risk control system.
[0004] One embodiment of this application provides a method for dynamically configuring risk labels for financial customers based on deep learning, the method comprising:
Real-time collection of multi-source heterogeneous data, including time-series data of customer transaction behavior, text data of credit records, and numerical data of external market dynamics, and preprocessing of the multi-source heterogeneous data to generate a standardized comprehensive customer feature dataset;
The customer comprehensive feature dataset is input into a pre-constructed deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns in transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Through joint processing, a risk embedding vector containing individual behavioral features and group association features is generated;
The risk embedding vector is input into the adaptive risk label mapping module, which performs nonlinear transformation through a multilayer perceptron and calculates the probability distribution of each risk level. At the same time, it combines a contrastive learning loss function to enhance the feature clustering effect of similar risk samples and outputs dynamically updated customer risk labels;
Based on the comparison between the customer risk label and the preset business tolerance threshold, the risk monitoring strategy parameters of the adaptive risk label mapping module are automatically adjusted, and the label change data and the adjusted strategy parameters are fed back to the deep spatiotemporal feature extraction network for incremental learning, generating an optimized deep spatiotemporal feature extraction network for dynamic configuration of financial customer risk labels in the next cycle.
[0005] Another embodiment of this application provides a deep learning-based dynamic configuration system for financial customer risk labels, the system comprising:
The data acquisition module is used to collect multi-source heterogeneous data in real time, including time-series data of customer transaction behavior, text data of credit records, and numerical data of external market dynamics, and to preprocess the multi-source heterogeneous data to generate a standardized customer comprehensive feature dataset;
The generation module is used to input the customer comprehensive feature dataset into a pre-built deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns in transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Through joint processing, a risk embedding vector containing individual behavioral features and group association features is generated;
The output module is used to input the risk embedding vector into the adaptive risk label mapping module, perform nonlinear transformation through a multilayer perceptron and calculate the probability distribution of each risk level, and combine the contrastive learning loss function to enhance the feature clustering effect of similar risk samples, and output dynamically updated customer risk labels;
The learning module is used to compare the customer risk label with a preset business tolerance threshold, automatically adjust the risk monitoring strategy parameters of the adaptive risk label mapping module, and feed the label change data and the adjusted strategy parameters back to the deep spatiotemporal feature extraction network for incremental learning, generating an optimized deep spatiotemporal feature extraction network for dynamic configuration of financial customer risk labels in the next cycle.
[0006] Another embodiment of this application provides a storage medium storing a computer program, wherein the computer program is configured to execute the method described in any of the preceding claims when running.
[0007] Another embodiment of this application provides an electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the method described in any of the preceding claims.
[0008] Compared with existing technologies, the present invention provides a method for dynamic configuration of financial customer risk labels based on deep learning, which can achieve accurate perception of individual customer behavior and group transmission risk, thereby improving the accuracy and timeliness of financial risk control systems.
Attached Figure Description
[0009] Figure 1 is a hardware structure block diagram of a computer terminal for a method of dynamically configuring financial customer risk labels based on deep learning, provided in an embodiment of the present invention;
Figure 2 is a flowchart of a method for dynamically configuring risk labels for financial customers based on deep learning, provided in an embodiment of the present invention;
Figure 3 is a flowchart of another method for dynamically configuring financial customer risk labels based on deep learning, provided in an embodiment of the present invention;
Figure 4 is a flowchart of another method for dynamically configuring risk labels for financial customers based on deep learning, provided in an embodiment of the present invention;
Figure 5 is a schematic diagram of the structure of a deep learning-based dynamic configuration system for financial customer risk labels, provided in an embodiment of the present invention.
Detailed Implementation
[0010] The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.
[0011] This invention first provides a method for dynamically configuring risk labels for financial customers based on deep learning. This method can be applied to electronic devices, such as computer terminals, specifically ordinary computers.
[0012] The following detailed explanation uses a computer terminal as an example. Figure 1 is a hardware structure block diagram of a computer terminal for a method of dynamically configuring financial customer risk labels based on deep learning, provided as an embodiment of the present invention. As shown in Figure 1, the computer device includes a processor, memory, and network interface connected via a system bus, wherein the memory may include non-volatile storage media and internal memory.
[0013] Referring to Figures 2-4, the present invention provides a method for dynamically configuring risk labels for financial customers based on deep learning, which may include the following steps:
S201: Collect multi-source heterogeneous data in real time, including customer transaction behavior time-series data, credit record text data, and external market dynamic numerical data, and preprocess the multi-source heterogeneous data to generate a standardized customer comprehensive feature dataset. Specifically, the three types of data are first collected in real time to generate an original multi-source heterogeneous dataset.
The core of this step is to achieve real-time synchronous collection of heterogeneous data related to financial customers by relying on multi-source data acquisition interfaces. The data is integrated using the customer's unique identifier as the core association criterion to generate an original multi-source heterogeneous dataset that retains the original attributes and time-series characteristics, providing a complete data foundation for subsequent preprocessing. The specific implementation is as follows. Data acquisition employs a distributed real-time acquisition architecture, with dedicated acquisition interfaces configured for the characteristics of each data source and acquisition frequencies set according to data type. Customer transaction behavior time-series data is collected from the core transaction systems of financial institutions at minute-level frequency, so that the real-time time-series characteristics of each transaction are captured. Credit record text data is collected from credit reporting service platforms and financial institution loan approval systems at hourly frequency, matching the update frequency of credit records. External market dynamic numerical data is collected from financial market data service interfaces at second-level frequency, to accurately capture real-time fluctuations in market data.
The collected multi-source heterogeneous data includes three core data types. Customer transaction behavior time-series data is timestamped structured time-series data, containing core fields such as unique customer identifier, transaction timestamp, transaction amount, transaction type, transaction channel, counterparty, and account balance changes; the transaction timestamp uses a year-month-day hour:minute:second format with millisecond precision, which keeps the ordering of the time-series data accurate. Credit record text data is unstructured text data, including natural language descriptions of customer credit reports, explanations of overdue payments, credit approval opinions, and remarks on dishonest behavior, retaining the original semantic information of the text. External market dynamic numerical data is structured time-series numerical data, including financial market indicators such as market interest rates, exchange rates, stock indices, industry prosperity indices, and bond yields, with each indicator accompanied by a collection timestamp and unit of measurement.
[0014] All data acquisition interfaces are equipped with data traceability identification functionality, labeling each piece of acquired data with its data source, acquisition time, and data type code. Simultaneously, using the customer's unique identifier as the core association field, preliminary associations are established between customer-related data from different data sources and of different types, ensuring that the three types of data from the same customer can be traced and matched through the unique identifier. After acquisition, the data undergoes preliminary format validation, removing data with obvious format errors while retaining the original format and attribute characteristics of each data point. The data is then structurally integrated according to the customer's unique identifier and data acquisition timestamp, generating an original multi-source heterogeneous dataset. This dataset undergoes no feature transformation, fully preserving the heterogeneity, temporal sequence, and original business characteristics of the multi-source data, serving as the raw input for subsequent data preprocessing.
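The tagging and integration described in this paragraph could be sketched as follows. This is only an illustration under stated assumptions: the `RawRecord` class, the type codes `TXN`, `CREDIT_TEXT`, and `MARKET`, and the format check are hypothetical names and rules, not taken from the patent.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Iterable, List

# Hypothetical data type codes; the patent does not list the actual codes.
VALID_TYPE_CODES = {"TXN", "CREDIT_TEXT", "MARKET"}


@dataclass(frozen=True)
class RawRecord:
    customer_id: str       # unique customer identifier (association field)
    source: str            # traceability tag: originating data source
    acquired_at: datetime  # traceability tag: acquisition time
    type_code: str         # traceability tag: data type code
    payload: Any           # original content, kept untransformed


def is_well_formed(record: RawRecord) -> bool:
    """Assumed preliminary format check: non-empty id and a known type code."""
    return bool(record.customer_id) and record.type_code in VALID_TYPE_CODES


def integrate(records: Iterable[RawRecord]) -> Dict[str, List[RawRecord]]:
    """Drop malformed records, then group by customer in acquisition-time order."""
    grouped: Dict[str, List[RawRecord]] = {}
    valid = (r for r in records if is_well_formed(r))
    for rec in sorted(valid, key=lambda r: (r.customer_id, r.acquired_at)):
        grouped.setdefault(rec.customer_id, []).append(rec)
    return grouped
```

Payloads are left untouched, matching the statement that the original dataset undergoes no feature transformation.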
[0015] Based on the original multi-source heterogeneous dataset, sliding window slicing and linear interpolation are used to fill missing values in the time series data of customer transaction behavior to generate a standardized transaction behavior time series matrix; BERT model encoding is used to encode credit record text to obtain fixed-length semantic vectors to generate credit record text embedding features; normalization and first-order differencing are performed on external market numerical data to generate market dynamic feature sequences. The core of this step is to perform differentiated preprocessing based on the characteristics of the three types of heterogeneous data, transforming the non-standardized raw data into a structured, computable feature form, and generating three types of standardized features: transaction, text, and market. The specific implementation method is as follows: For customer transaction time-series data, a sliding window slicing process is first performed. The window size is set to T=30, meaning each window contains 30 consecutive time steps, and the step size is set to 5, meaning the window slides 5 time steps each time. This parameter setting can cover the complete transaction time-series features while avoiding window redundancy. Slicing is performed in order of transaction timestamp, dividing each customer's transaction data into consecutive windows. Missing values within a window are filled by linear interpolation. The formula for linear interpolation is X_inter = X_prev + (X_next - X_prev) × (t_inter - t_prev) / (t_next - t_prev), where X_inter is the interpolation result of the missing value, X_prev and X_next are the valid transaction feature values adjacent to the missing value, t_inter is the timestamp of the missing value, and t_prev and t_next are the timestamps of the adjacent valid data. After imputation, the transaction features in each window are arranged in chronological order.
Each window generates a feature matrix with dimension T×F, where F is the number of dimensions of the transaction feature. In the example, F=8, which includes 8 core transaction features such as transaction amount, number of transactions, and changes in account balance. After the feature matrices of all windows are normalized according to the same dimension, a standardized transaction behavior time series matrix is generated.
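As a minimal sketch of the slicing and imputation just described, the two helpers below use the stated window size of 30 and step of 5 and apply the interpolation formula with the same symbols. The function names are illustrative, and the handling of gaps at the very start or end of a series (copying the nearest valid value) is an assumption, since the text only covers gaps that have valid neighbors on both sides.

```python
from typing import List, Optional, Sequence


def interpolate_linear(values: Sequence[Optional[float]],
                       times: Sequence[float]) -> List[float]:
    """Fill None entries with X_prev + (X_next - X_prev) * (t - t_prev) / (t_next - t_prev)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("at least one valid value is required")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        if prev is None:      # leading gap (assumption): copy the next valid value
            filled[i] = values[nxt]
        elif nxt is None:     # trailing gap (assumption): copy the previous valid value
            filled[i] = values[prev]
        else:
            ratio = (times[i] - times[prev]) / (times[nxt] - times[prev])
            filled[i] = values[prev] + (values[nxt] - values[prev]) * ratio
    return filled


def sliding_windows(rows: Sequence, window: int = 30, step: int = 5) -> List[list]:
    """Return consecutive windows of `window` rows, advancing `step` rows each time."""
    return [list(rows[start:start + window])
            for start in range(0, len(rows) - window + 1, step)]
```

With 40 time steps, for example, `sliding_windows` yields three windows starting at steps 0, 5, and 10; each window then becomes one T×F matrix once every feature column has been imputed.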
[0016] For credit record text data, text preprocessing is first performed, including removing stop words, punctuation marks, and special characters from the text, and completing text case normalization and semantic normalization. Then, the preprocessed plain text is input into the pre-trained BERT model, and the [CLS] position feature vector output by the model is selected as the core semantic representation of the text. This vector is a fixed-length numerical vector with a length of 768. This length can fully preserve the core semantic information of the text while taking into account computational efficiency. Each customer's credit record text is encoded to obtain a unique 768-dimensional semantic vector, which is the credit record text embedding feature, realizing the transformation from unstructured text to structured numerical features.
[0017] For external market dynamic numerical data, min-max normalization is performed first to map market indicators with different dimensions and numerical ranges to the [0,1] interval, eliminating the impact of dimensional differences on subsequent calculations. The normalization formula is X_norm = (X - X_min) / (X_max - X_min), where X_norm is the normalized feature value, X is the original value of the market indicator, and X_min and X_max are the minimum and maximum values of the indicator within the collection time range. Subsequently, the normalized values are subjected to first-order differencing, calculated as ΔX_t = X_t - X_{t-1}, where ΔX_t is the differencing result at time step t, and X_t and X_{t-1} are the normalized values at time steps t and t-1, respectively. First-order differencing can effectively capture the temporal change trend and fluctuation characteristics of market indicators. The processed differencing results are arranged in timestamp order, generating a market dynamic feature sequence consistent with the time step length of the transaction behavior time series matrix.
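The two formulas in this paragraph can be illustrated with a short sketch. The function names are illustrative, and mapping a constant series to 0.0 is an assumption, because the formula is undefined when X_max equals X_min.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """X_norm = (X - X_min) / (X_max - X_min), computed over the given range."""
    lo, hi = min(values), max(values)
    if hi == lo:  # assumption: a constant series is mapped to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def first_difference(values: Sequence[float]) -> List[float]:
    """Delta X_t = X_t - X_{t-1} for t = 1 .. len(values) - 1."""
    return [values[t] - values[t - 1] for t in range(1, len(values))]


rates = [2.0, 4.0, 6.0]              # made-up indicator values
normalized = min_max_normalize(rates)
deltas = first_difference(normalized)
```

Differencing yields one value fewer than its input, so matching the time step length of the transaction matrix requires some convention for the first step. The text does not state one.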
[0018] Based on the transaction behavior time series matrix, credit record text embedding features and market dynamic feature sequences, a multi-source data timestamp alignment algorithm is used to unify all data to the same time granularity, generating a time-aligned multimodal feature set; The core of this step is to eliminate the time granularity differences of the three types of features through a multi-source data timestamp alignment algorithm, so as to achieve accurate matching of multimodal features under the same customer and the same time dimension, and generate a time-aligned multimodal feature set. The specific implementation method is as follows: First, a unified time granularity is determined. Based on the business needs of financial customer risk monitoring, the time granularity is set to 1 hour. This means that all three types of features are standardized using a 1-hour time unit to ensure consistency across multiple data sources in terms of time. The core of the multi-source data timestamp alignment algorithm is to accurately map the original timestamps of the three types of features to a unified 1-hour time granularity grid. The mapping rule is based on the time interval to which the timestamp belongs; that is, if an original timestamp falls within a certain 1-hour time interval, the data belongs to that time interval.
[0019] For the transaction behavior time series matrix, the feature values within the window are aggregated at a uniform 1-hour time granularity. For multiple transaction feature values within the same 1-hour time interval, the arithmetic mean is taken as the transaction feature value for that time interval, ensuring that each time interval corresponds to a unique transaction feature value. After aggregation, the time dimension of the transaction behavior time series matrix is converted to a uniform 1-hour granularity. For credit record text embedding features, if the original timestamp of a credit record text falls within a certain 1-hour time interval, the 768-dimensional semantic vector of the text is directly assigned to that time interval. If multiple credit record text embedding features exist within the same 1-hour time interval, they are fused by arithmetic mean to generate a unique text embedding feature for that time interval. For the market dynamic feature sequence, it is resampled at a uniform 1-hour time granularity. For missing values that appear after resampling, linear interpolation is used to fill in the missing values, ensuring that each 1-hour time interval has a corresponding market dynamic feature value.
[0020] During the timestamp alignment process, the unique customer identifier and the unified 1-hour timestamp are used as the joint primary key to accurately associate the three types of standardized features. This ensures that the transaction behavior time-series features, credit record text embedding features, and market dynamic features within the same customer and the same 1-hour time interval correspond one-to-one, without mismatches across customers or time intervals. At the same time, empty time intervals without corresponding features are eliminated. Finally, a time-aligned multimodal feature set is generated, organized per customer and ordered by the unified 1-hour time granularity. This set integrates the three types of heterogeneous features under a single time dimension, standardizing the time dimension of the multi-source heterogeneous data.
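The alignment rules above (map each timestamp to its 1-hour interval, average values that share an interval, join on customer identifier plus hour) might look like the following sketch. All function names are illustrative. Two assumptions are made: market features are keyed by hour only, since they are not customer-specific, and any interval missing one of the three feature types is dropped.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, datetime]  # (customer_id, start of the 1-hour interval)


def hour_bucket(ts: datetime) -> datetime:
    """Map a timestamp to the start of the 1-hour interval that contains it."""
    return ts.replace(minute=0, second=0, microsecond=0)


def aggregate_hourly(
    records: Iterable[Tuple[str, datetime, List[float]]]
) -> Dict[Key, List[float]]:
    """Average, element by element, all vectors sharing a customer and hour."""
    groups = defaultdict(list)
    for customer_id, ts, vector in records:
        groups[(customer_id, hour_bucket(ts))].append(vector)
    return {key: [sum(col) / len(col) for col in zip(*vectors)]
            for key, vectors in groups.items()}


def align(transactions: Dict[Key, List[float]],
          texts: Dict[Key, List[float]],
          market: Dict[datetime, List[float]]) -> Dict[Key, tuple]:
    """Join the three feature maps on (customer, hour); drop incomplete intervals."""
    keys = [k for k in transactions.keys() & texts.keys() if k[1] in market]
    return {k: (transactions[k], texts[k], market[k[1]]) for k in sorted(keys)}
```

The same `aggregate_hourly` helper covers both the transaction features and the text embeddings, because both are fused by arithmetic mean within an interval.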
[0021] The time-aligned multimodal feature set is spliced together, and principal component analysis is applied for dimensionality reduction and redundancy removal to finally generate a standardized comprehensive customer feature dataset.
[0022] The core of this step is to fuse and stitch together time-aligned multimodal features, reduce the dimensionality and remove redundancy of high-dimensional features through principal component analysis, and then perform standardization to generate a comprehensive customer feature dataset that can be directly input into deep networks. The specific implementation method is as follows: First, feature concatenation processing is carried out. For the three types of features within the same customer and the same 1-hour time interval in the time-aligned multimodal feature set, horizontal concatenation is performed according to the feature dimensions. That is, the feature vector of the transaction behavior time series matrix, the 768-dimensional vector of the credit record text embedding feature, and the feature vector of the market dynamic feature sequence are connected sequentially according to the dimension order to generate a high-dimensional fused feature vector. During the concatenation process, a unique identifier is added to each feature dimension to ensure that different types of feature dimensions do not overlap or become confused. The concatenated fused feature vector retains all the core information of the three types of features, realizing the initial fusion of multimodal features.
[0023] Principal component analysis (PCA) is applied to the concatenated high-dimensional fused feature vectors for dimensionality reduction and redundancy removal. First, the covariance matrix of the fused feature vectors is calculated, and the eigenvalues and corresponding eigenvectors of the covariance matrix are solved. The eigenvalues are sorted in descending order, and the magnitude of the eigenvalue represents the variance contribution of the corresponding principal component. The top K principal components with a cumulative variance contribution rate of 95% are selected. A cumulative contribution rate of 95% ensures that the core information of the original features is preserved while eliminating redundant and noise information. K is the feature dimension after dimensionality reduction. In the example, K=128, that is, the top 128 principal components are selected to form a new feature space. The high-dimensional fused feature vector is projected into this feature space to obtain a 128-dimensional low-dimensional feature vector after dimensionality reduction. This vector eliminates the linear redundancy in the original features and improves the computational efficiency of the subsequent model.
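The component-selection rule in this paragraph can be sketched independently of the eigendecomposition itself. The helper below assumes the eigenvalues of the covariance matrix have already been computed and returns the smallest K whose cumulative variance contribution reaches the threshold. The function name is illustrative.

```python
from typing import Sequence


def select_num_components(eigenvalues: Sequence[float],
                          threshold: float = 0.95) -> int:
    """Smallest K such that the top-K eigenvalues reach `threshold` of total variance."""
    ordered = sorted(eigenvalues, reverse=True)  # descending variance contribution
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)
```

For made-up eigenvalues 6.0, 3.0, 0.8, and 0.2, the cumulative shares are 60%, 90%, and 98%, so three components are kept. The fused vector is then projected onto the eigenvectors belonging to those K eigenvalues.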
[0024] Finally, the reduced-dimensional feature vectors are Z-score standardized, rescaling each feature dimension to a mean of 0 and a standard deviation of 1. The standardization formula is Z = (X - μ) / σ, where Z is the standardized feature value, X is the reduced-dimensional feature value, μ is the mean of that feature dimension, and σ is the standard deviation of that feature dimension. Standardization removes scale differences across feature dimensions, so that no dimension dominates model training merely because of its numerical range. After standardization, the feature vectors are structured according to the customer's unique identifier and a unified 1-hour timestamp. Each feature vector is labeled with the customer identifier, time interval, and feature dimension information, ultimately generating a standardized comprehensive customer feature dataset. This dataset is a structured numerical time-series dataset with no missing values or redundant information, and can be directly input into a deep spatiotemporal feature extraction network for subsequent feature extraction processing.
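A per-dimension application of Z = (X - μ) / σ is sketched below. The function name is illustrative. Two assumptions are made: σ is the population standard deviation (the text does not say whether the population or sample form is used), and a dimension with zero variance is mapped to 0.0.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def z_score_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Standardize each feature dimension (column) to mean 0 and unit deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # assumption: population std
    return [[(x - m) / s if s > 0 else 0.0   # assumption: constant column -> 0.0
             for x, m, s in zip(row, means, stds)]
            for row in rows]
```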
[0025] S202: The customer comprehensive feature dataset is input into a pre-constructed deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns from transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Joint processing generates a risk embedding vector containing individual behavioral features and group association features. Specifically, this step may include:
S2021: Extract the transaction behavior time-series matrix from the customer comprehensive feature dataset, input it into the convolutional long short-term memory network, capture local fluctuation patterns through convolutional layers and long-term dependencies through long short-term memory units, and generate transaction behavior time-series feature vectors. The core of this step is to use the customer's unique identifier as a precise index to separate the transaction behavior time-series matrix from the standardized customer comprehensive feature dataset. Through hierarchical processing in the convolutional long short-term memory network, both the local temporal fluctuations and the long-term temporal dependencies of customer transaction behavior are extracted, ultimately generating a structured transaction behavior time-series feature vector. The specific implementation method is as follows: When extracting the transaction behavior time series matrix from the customer comprehensive feature dataset, the preset transaction feature dimension identifiers are used as the filtering criteria.
Credit record text embedding features and market dynamic features are removed, and only the transaction behavior time series data after sliding window slicing and missing value imputation are retained. During the extraction process, the unique customer identifier and a unified 1-hour time granularity index are retained to ensure that the transaction behavior time series matrix of each customer is an independent set and that the time dimension is completely consistent with the original dataset. The extracted transaction behavior time series matrix is a three-dimensional tensor with dimensions of N×T×F, where N is the number of customers, T is the time step length, and F is the transaction behavior feature dimension. In the example, T=30 and F=16 are set, representing 30 1-hour time steps and 16 core transaction features, respectively, covering dimensions such as transaction amount, transaction frequency, account balance change, and transaction type ratio.
[0026] The extracted transaction behavior time series matrix is input into a pre-constructed convolutional long short-term memory network. This network consists of convolutional layers and long short-term memory units connected in series. The convolutional layers are one-dimensional convolutional layers with a kernel size of 3, a stride of 1, and "same" padding, so that the time step length of the features remains unchanged after the convolution operation. The number of convolutional kernels is set to 128, and each convolutional kernel is responsible for capturing one local temporal fluctuation pattern. By convolving over the transaction features of adjacent time steps, the one-dimensional convolutional layer captures the local temporal dependency patterns of transaction behavior, such as a sudden change in a customer's transaction amount within 3 consecutive hours, or local abnormal behaviors such as short-term high-frequency transfers. The output of the convolutional layer passes through the ReLU activation function, which introduces nonlinearity and enhances the representational capacity of the features. The activated output is then input directly to the long short-term memory (LSTM) unit, which has two hidden layers, each with a dimension of 256. Through a gating mechanism consisting of an input gate, a forget gate, and an output gate, the LSTM unit selectively retains effective transaction features over long time spans and forgets invalid noise features. This enables the capture of long-term dependencies in customer transaction behavior, such as changes in customer transaction patterns across weeks and months, and periodic fund inflows and outflows. The gating mechanism alleviates the vanishing gradient problem of traditional recurrent neural networks and ensures the effective extraction of long-term time-series features.
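To make the convolution settings concrete, the sketch below applies a single kernel of size 3 with stride 1 and zero "same" padding to one feature channel, followed by ReLU. It is a simplified illustration, not the 128-kernel layer described above: the function name and the example kernel are made up, and, as in most deep learning frameworks, the operation is a cross-correlation (the kernel is not flipped).

```python
from typing import List, Sequence


def conv1d_same_relu(series: Sequence[float],
                     kernel: Sequence[float]) -> List[float]:
    """Stride-1 1D convolution with zero 'same' padding, followed by ReLU."""
    if len(kernel) % 2 == 0:
        raise ValueError("this sketch supports odd kernel sizes only")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(series) + [0.0] * pad
    output = []
    for t in range(len(series)):              # one output per input time step
        window = padded[t:t + len(kernel)]
        activation = sum(w * x for w, x in zip(kernel, window))
        output.append(max(0.0, activation))   # ReLU
    return output


# Made-up hourly transaction amounts with a spike at the third step.
amounts = [1.0, 1.0, 5.0, 1.0, 1.0]
spike_kernel = [-1.0, 2.0, -1.0]              # illustrative weights, not learned
response = conv1d_same_relu(amounts, spike_kernel)
```

The output has the same length as the input, which is what keeps the time step length unchanged before the features enter the LSTM unit. With this hand-picked kernel the strongest response sits at the spike.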
[0027] The final output of the long short-term memory (LSTM) unit undergoes feature aggregation through a global pooling layer, which collapses the time dimension so that the three-dimensional temporal feature tensor becomes a two-dimensional matrix with one feature vector per customer. A fully connected layer then maps the feature dimension to 256. The fully connected layer uses a linear activation function to avoid the loss of feature information. Finally, a unique 256-dimensional transaction behavior time-series feature vector is generated for each customer. This vector integrates the local temporal fluctuation patterns and long-term temporal dependencies of the customer's transaction behavior, representing the risk characteristics of the customer's individual transaction behavior.
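For one customer, the pooling and projection step could be sketched as below. The text does not say which kind of global pooling is used, so average pooling over time steps is an assumption. The weights are placeholders rather than trained parameters, and both function names are illustrative.

```python
from typing import List, Sequence


def global_average_pool(sequence: Sequence[Sequence[float]]) -> List[float]:
    """Collapse T time steps of H features into one H-dimensional vector (mean)."""
    steps = len(sequence)
    return [sum(column) / steps for column in zip(*sequence)]


def linear_layer(vector: Sequence[float],
                 weights: Sequence[Sequence[float]],
                 bias: Sequence[float]) -> List[float]:
    """Fully connected layer with linear (identity) activation: y = W x + b."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


hidden_states = [[1.0, 4.0], [3.0, 8.0]]      # made-up LSTM outputs, T=2, H=2
pooled = global_average_pool(hidden_states)
projected = linear_layer(pooled, [[1.0, 0.0], [0.5, 0.5]], [0.0, 1.0])
```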
[0028] S2022: Extract the customer relationship graph from the customer comprehensive feature dataset, construct the customer adjacency matrix and node feature matrix, input them into the graph attention network, calculate the weights of neighbor nodes and aggregate information through the multi-head attention mechanism, and generate the customer group association feature vector. The core of this step is to mine the financial relationships between customers from the customer comprehensive feature dataset and construct a relationship graph structure. Through the multi-head attention mechanism of the graph attention network, the association weights between customers are quantified and the group association features are aggregated to generate feature vectors that characterize the risk of customer group associations. The specific implementation method is as follows: The customer relationship graph is extracted on the basis of the actual financial relationships between customers, including counterparty relationships, fund transfer relationships, joint guarantee relationships, and membership of the same investment group. Using fields such as counterparty identifiers, fund flows, and guarantee relationships in the dataset, customer pairs with direct financial relationships are identified, and an undirected customer relationship graph is constructed with customers as nodes and financial relationships as edges. If there is at least one direct financial relationship between two customers, an edge is established between them; otherwise, no edge is created.
[0029] Based on the constructed customer relationship graph, a customer adjacency matrix and a node feature matrix are constructed. The customer adjacency matrix A is an N×N two-dimensional matrix, where N is the number of customers. Each element A_ij represents the relationship between customer i and customer j: if there is a relationship, A_ij = 1; otherwise A_ij = 0. Because the graph is undirected, the adjacency matrix is symmetric, satisfying A_ij = A_ji. The node feature matrix is an N×D two-dimensional matrix, where D is the feature dimension of the customer nodes, with values consistent with the feature dimensions of the customer comprehensive feature dataset (D = 128 in the example). Each row in the matrix represents the standardized comprehensive features of one customer, and each column represents a feature dimension. The node feature matrix fully preserves the individual feature information of each customer. After construction, the adjacency matrix is normalized using Laplacian normalization to eliminate feature bias caused by differences in node degree, ensuring the computational fairness of the subsequent graph attention network.
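A minimal sketch of the adjacency matrix construction and normalization follows. It assumes that "Laplacian normalization" means the symmetric form D^(-1/2) A D^(-1/2), where D is the diagonal degree matrix; the text does not spell out the exact form, so this is one common reading rather than the confirmed method. The edge list and function names are hypothetical.

```python
import math


def build_adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix for an undirected customer graph."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    return a


def symmetric_normalize(a):
    """Assumed normalization: D^(-1/2) A D^(-1/2).

    Isolated nodes (degree 0) keep an all-zero row instead of dividing by zero.
    """
    n = len(a)
    degree = [sum(row) for row in a]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degree]
    return [
        [inv_sqrt[i] * a[i][j] * inv_sqrt[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical financial relationships between four customers.
edges = [(0, 1), (0, 2), (2, 3)]
A = build_adjacency(4, edges)
A_norm = symmetric_normalize(A)

for row in A_norm:
    print([round(x, 3) for x in row])
```

An edge between two high-degree customers receives a smaller normalized value than an edge that touches a low-degree customer, which is the degree-bias correction the paragraph describes.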
[0030] The normalized adjacency matrix and node feature matrix are input into a pre-constructed graph attention network. The core of this network is a multi-head attention mechanism with 8 attention heads. Each attention head independently learns the attention weights of different dimensions of association between customers. The calculation of attention weights is based on the feature similarity of customer nodes. The original attention weights between customers are obtained by performing linear transformation and inner product calculation on the node features of adjacent customers. Then, the weights are normalized by the Softmax function, mapping the weight values to the 0-1 range. The larger the weight value, the stronger the association risk transmission between the two customers. For example, customers with frequent financial transactions will receive higher attention weights. The multi-head attention mechanism aggregates features based on the attention weights calculated by the eight attention heads. Specifically, it aggregates the node features of adjacent customers into the target customer node by weighted summation according to the weight of each attention head. Then, it concatenates the aggregated features of the eight attention heads horizontally to obtain high-dimensional group association features. The concatenated features are then normalized to 256 dimensions by a fully connected layer. The fully connected layer uses the LeakyReLU activation function to introduce non-linearity. Finally, it generates a unique 256-dimensional customer group association feature vector for each customer. This vector accurately represents the customer's position in the group association network, the strength of its association with other customers, and the potential risk transmission characteristics brought about by the group association.
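The attention weight calculation for a single head can be sketched as follows. This is a simplified illustration under stated assumptions: the learned linear transformation is replaced by the identity, only one of the eight heads is shown, and the feature vectors are hypothetical two-dimensional values rather than 128-dimensional node features.

```python
import math


def attention_weights(target, neighbors):
    """Softmax over inner products between the target node and each neighbor."""
    scores = [sum(t * x for t, x in zip(target, nb)) for nb in neighbors]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(neighbors, weights):
    """Weighted sum of neighbor features for one attention head."""
    dim = len(neighbors[0])
    return [sum(w * nb[d] for w, nb in zip(weights, neighbors)) for d in range(dim)]


# Hypothetical features: the first neighbor resembles the target customer.
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0]]

weights = attention_weights(target, neighbors)
head_output = aggregate(neighbors, weights)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in head_output])
```

With eight heads, each head would apply its own learned transformation, and the eight `head_output` vectors would be concatenated before the final fully connected layer.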
[0031] S2023: The transaction behavior time-series feature vector and the customer group association feature vector are concatenated and input into the feature fusion gating unit. The contribution ratio of the two features is dynamically adjusted through learnable gating weights to generate preliminary fused features. The core of this step is to achieve preliminary integration of individual customer transaction behavior characteristics and group correlation characteristics through feature concatenation. Then, through the learnable gating weights of the feature fusion gating unit, the contribution ratio of the two types of features under different risk scenarios is dynamically adapted to achieve adaptive feature fusion and generate preliminary fused features. The specific implementation method is as follows: First, the transaction behavior time-series feature vector and the customer group association feature vector are subjected to feature concatenation. Feature concatenation employs a horizontal dimensional concatenation method, sequentially connecting each customer's 256-dimensional transaction behavior time-series feature vector with the 256-dimensional customer group association feature vector in dimensional order. The resulting concatenation generates a 512-dimensional high-dimensional joint feature vector. This vector retains all the original information of both types of features without any feature filtering or fusion, achieving a preliminary integration of individual and group features. During feature concatenation, unique feature identifiers are added to the dimensions of both types of features to ensure that subsequent gating units can accurately identify and distinguish between the two types of features, avoiding confusion between feature dimensions.
[0032] The concatenated 512-dimensional joint feature vector is input into the feature fusion gating unit. This unit is designed based on the core gating mechanism of the gated recurrent unit (GRU) and has two learnable gating weight parameters W_1 and W_2. W_1 is the weight coefficient of the transaction behavior time-series feature, and W_2 is the weight coefficient of the customer group association feature. The values of the two weight coefficients are both in the range of 0-1 and satisfy the constraint W_1 + W_2 = 1, ensuring that the total weight distribution of the two types of features is 1, with no weight overflow or omission. The gating weight parameters are adaptively learned through end-to-end training of the network, requiring no manual setting. During model training, the weight ratios are automatically adjusted according to the importance of features in different financial risk scenarios. For example, in risk scenarios caused by abnormal individual customer transaction behavior, the model will automatically adjust W_1 to approach 1 and W_2 to approach 0, allowing the transaction behavior time-series features to dominate the contribution. In risk scenarios caused by the transmission of group-associated risks, the model will automatically adjust W_2 to approach 1 and W_1 to approach 0, allowing the customer group association features to dominate the contribution. In scenarios where both types of features contribute to the risk, the model will adjust the weights to an appropriate ratio, taking into account the contributions of both types of features.
[0033] The feature fusion gating unit performs weighted fusion calculations on the two types of features in the joint feature vector using the gating weights. The calculation formula is F_fuse = W_1 × F_seq + W_2 × F_gra, where F_fuse is the preliminary fused feature, F_seq is the transaction behavior time-series feature vector, and F_gra is the customer group association feature vector. After weighted fusion, the 512-dimensional feature dimension is retained to ensure the integrity of the fused feature information. After the fusion calculation, a batch normalization layer is used to standardize the preliminary fused features, eliminating the impact of feature distribution shift, improving feature stability and the convergence speed of subsequent model training, and finally generating a 512-dimensional preliminary fused feature for each customer. This feature achieves adaptive dynamic fusion of individual customer transaction behavior features and group association features, and can represent the core risk features under different risk scenarios.
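The gated fusion can be sketched as follows. Two points are assumptions, not statements from the text. First, the constraint W_1 + W_2 = 1 is enforced by passing one learnable scalar through a sigmoid. Second, because the formula as written sums two 256-dimensional vectors while the text states a 512-dimensional result, the sketch keeps the two weighted halves side by side, which is one reading consistent with the stated dimension. All names are hypothetical.

```python
import math


def gate_weights(gate_logit):
    """Map one learnable scalar to (W_1, W_2) with both in (0, 1) and W_1 + W_2 = 1."""
    w1 = 1.0 / (1.0 + math.exp(-gate_logit))
    return w1, 1.0 - w1


def fuse(f_seq, f_gra, gate_logit):
    """Scale each 256-dim half by its gate weight and keep both halves (512 dims).

    Assumption: weighted halves are concatenated rather than summed element-wise.
    """
    w1, w2 = gate_weights(gate_logit)
    return [w1 * x for x in f_seq] + [w2 * x for x in f_gra]


# Hypothetical constant feature vectors, used only to make the scaling visible.
f_seq = [1.0] * 256
f_gra = [2.0] * 256

balanced = fuse(f_seq, f_gra, gate_logit=0.0)
behavior_led = fuse(f_seq, f_gra, gate_logit=4.0)
print(len(balanced), balanced[0], balanced[-1])
print(round(behavior_led[0], 3), round(behavior_led[-1], 3))
```

A large positive `gate_logit` drives W_1 toward 1, so the time-series half dominates; a large negative value does the opposite.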
[0034] S2024: Input the preliminary fused features into the fully connected layer for nonlinear transformation and dimensionality compression to obtain a risk embedding vector containing individual behavioral features and group association features.
[0035] The core of this step is to perform nonlinear transformation and dimensionality compression on the high-dimensional preliminary fusion features through a fully connected layer. This reduces the feature dimensionality while preserving the core risk features, generating a low-dimensional, dense risk embedding vector. This provides efficient and accurate feature input for subsequent risk label mapping. The specific implementation is as follows: The 512-dimensional preliminary fused features, after batch normalization, are input into a feature transformation network consisting of two fully connected layers. The first fully connected layer is a hidden layer with a dimension of 256. The ReLU activation function is used. The ReLU activation function introduces non-linearity into the feature transformation by setting negative values to 0 and keeping positive values unchanged. This effectively captures the complex non-linear relationships between individual customer behavior features and group correlation features, such as the coupling relationship between abnormal transactions by individual customers and the risk transmission of group correlation, and the interaction of risk features of different dimensions. Non-linear transformation allows the features to more accurately represent the complex risk patterns of financial customers. The output of the first fully connected layer is directly input into the second fully connected layer, which is the output layer with a dimension of 128. No activation function is set, and a linear transformation is used to compress the feature dimension, further compressing the 256-dimensional non-linear features into 128-dimensional low-dimensional features. The dimension compression process is achieved through linear projection of the weight matrix. 
While reducing the feature dimension and improving the computational efficiency of subsequent models, the optimization of the weight matrix allows the low-dimensional features to retain the core risk information of the original high-dimensional fused features to the greatest extent and remove redundant noise features.
[0036] In the feature transformation and dimensionality compression process of the fully connected layer, a Dropout regularization mechanism is introduced, with a Dropout probability set to 0.2, meaning that 20% of the neuron outputs are randomly set to 0. This mechanism effectively prevents model overfitting, improves the generalization ability of features, and ensures that the risk embedding vector can adapt to different financial customer risk monitoring scenarios. The 128-dimensional low-dimensional feature output by the second fully connected layer is the risk embedding vector. This vector is a low-dimensional dense numerical feature vector with a value range of [-1, 1], fully encompassing the temporal features of individual customer transaction behavior and group correlation features. It realizes the mapping of high-dimensional risk features to a low-dimensional feature space. Its low-dimensionality significantly improves the computational efficiency of the subsequent adaptive risk label mapping module, while the dense feature distribution ensures the accuracy of risk feature representation. Each financial customer corresponds to a unique 128-dimensional risk embedding vector. This vector, as the final output of the deep spatiotemporal feature extraction network, is the core feature basis for the subsequent adaptive risk label mapping module to determine the risk level and output labels.
[0037] S203, the risk embedding vector is input to the adaptive risk label mapping module, which performs nonlinear transformation through a multilayer perceptron and calculates the probability distribution of each risk level. Simultaneously, a contrastive learning loss function is used to enhance the feature clustering effect of similar risk samples, outputting dynamically updated customer risk labels; specifically, this may include: S2031, the risk embedding vector is input into the multilayer perceptron, and then nonlinearly mapped through multiple fully connected layers and ReLU activation function to generate the initial probability distribution of each risk level; The core of this step is to input the low-dimensional risk embedding vector output by the deep spatiotemporal feature extraction network into a customized multilayer perceptron. Through hierarchical processing using multiple fully connected layers and the ReLU activation function, a nonlinear transformation of the risk features is achieved. Finally, a normalization operation is used to generate the initial probability distribution for each risk level, realizing a preliminary mapping from risk features to risk level probabilities. The specific implementation method is as follows: The input risk embedding vector is a 128-dimensional low-dimensional dense numerical vector, which is the core risk representation that integrates individual customer behavioral characteristics and group association characteristics. It is input into a pre-constructed multilayer perceptron in batches according to the unique customer identifier. This perceptron is a fully connected feedforward network, designed with a three-layer fully connected layer architecture to meet the business needs of financial customer risk level classification. The number of neurons in each layer decreases progressively according to the feature dimension. 
The first fully connected layer maps the 128-dimensional risk embedding vector to 64 dimensions, the second layer further maps it to 32 dimensions, and the third output layer maps it to K dimensions, where K is the number of risk levels preset in financial business. In the example, K is set to 4 according to the risk level, corresponding to four levels: low risk, medium risk, high risk, and extremely high risk. The output layer dimension is completely matched with the number of risk levels, ensuring that each dimension corresponds to a feature representation of a risk level.
[0038] The first two fully connected layers of the multilayer perceptron are followed by the ReLU activation function. This activation function introduces a nonlinear transformation into the network by setting negative values to 0 and keeping positive values unchanged. This effectively captures the complex nonlinear relationship between financial risk characteristics and risk levels, and solves the problem of risk feature interaction that linear models cannot represent. For example, it addresses the complex risk characteristics formed by the coupling of abnormal customer trading behavior and group-related risks. At the same time, the ReLU activation function can alleviate the gradient vanishing problem in deep network training and ensure the convergence of multilayer perceptron training. After the nonlinear transformations of the first two layers are completed, the third output layer does not directly output feature values. Instead, it performs Softmax normalization on the output K-dimensional logit values. The Softmax function converts the logit values of each dimension into probability values in the range of 0-1, and the sum of the probability values of all dimensions is 1. This processing gives the output results probabilistic statistical meaning. The probability value of each dimension represents the probability that a customer belongs to the corresponding risk level. After this processing, a K-dimensional probability vector matching the number of risk levels is generated for each customer. This vector is the initial probability distribution of each risk level. In the example, the initial probability distribution of a customer is [0.08, 0.12, 0.65, 0.15], which correspond to the initial attribution probabilities of low, medium, high, and extremely high risks, respectively.
[0039] S2032: Calculate the cross-entropy loss based on the initial probability distribution, and at the same time randomly select positive and negative sample pairs from the current batch to calculate the contrastive learning loss, which pulls the features of similar samples closer together and pushes those of dissimilar samples apart, thus obtaining the comprehensive loss function. The core of this step is to construct a comprehensive loss function that fuses two losses. Cross-entropy loss ensures the accuracy of risk level classification, while contrastive learning loss enhances the feature clustering effect of similar risk samples. This allows the model to both accurately classify risks and capture the inherent similarity of risk features. The specific implementation is as follows: First, the cross-entropy loss is calculated based on the initial probability distribution. Cross-entropy loss is a classic loss function for classification tasks, used to quantify the difference between the model's predicted initial probability distribution and the customer's true risk label. Before calculation, the customer's true risk label needs to be converted into one-hot encoded form. If the customer's true risk level is high risk, the corresponding one-hot encoding is [0,0,1,0]. The dimension of the one-hot encoding is consistent with the number of risk levels K, with only the dimension corresponding to the true label being 1, and the rest being 0. The cross-entropy loss is calculated in batches, with a batch size set to 64, meaning that the average loss of 64 customer samples is calculated each time. The calculation is as follows: for each customer in the batch, the loss is the negative of the sum, over the K risk levels, of the product of the one-hot encoded component of the true label and the logarithm of the corresponding probability in the initial probability distribution; the batch loss is the average of these per-customer values. The smaller the loss value, the higher the matching degree between the initial probability distribution and the true risk label. 
The core role of cross-entropy loss is to guide the multilayer perceptron to train in the direction of improving the accuracy of risk level classification.
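As a worked illustration, the cross-entropy loss for the example customer (true label high risk, initial probability distribution [0.08, 0.12, 0.65, 0.15]) can be computed with the sketch below. The function names and the small `eps` guard against log(0) are assumptions added for illustration.

```python
import math


def cross_entropy(one_hot, probs, eps=1e-12):
    """Negative sum over risk levels of one_hot[k] * log(probs[k])."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))


def batch_cross_entropy(labels, batch_probs):
    """Average per-customer loss over a batch (64 customers in the text)."""
    losses = [cross_entropy(y, p) for y, p in zip(labels, batch_probs)]
    return sum(losses) / len(losses)


true_label = [0, 0, 1, 0]  # one-hot encoding of "high risk"
initial = [0.08, 0.12, 0.65, 0.15]

loss = cross_entropy(true_label, initial)
print(round(loss, 4))  # equals -ln(0.65)
```

Because the label is one-hot, only the probability assigned to the true level contributes, so the loss falls as that probability rises toward 1.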
[0040] Simultaneously, the contrastive learning loss is calculated. The core objective of this loss is to narrow the feature distance between similar risk samples and widen the feature distance between dissimilar risk samples, thereby enhancing the model's clustering ability and generalization of risk features. Before calculation, positive and negative samples are randomly selected for each sample within the current batch of 64 samples. The selection rules are as follows: positive samples are other samples with the same true risk level as the current sample, with 5 positive samples selected for each sample to ensure sufficient comparison of similar features; negative samples are samples with a true risk level different from that of the current sample, with 20 negative samples selected for each sample to ensure effective differentiation of dissimilar features. The selection of positive and negative sample pairs is completed within the batch to avoid sample bias across batches. The contrastive learning loss is calculated using the InfoNCE loss. First, the cosine similarity of the features between the current sample and each positive and negative sample is calculated. The cosine similarity lies in the range [-1, 1], and a larger value indicates more similar features. Then, the similarities are normalized using the Softmax function. Finally, the loss value is calculated as the negative logarithm of the normalized similarity assigned to the positive samples. A smaller loss value indicates that the features of similar samples are closer and the features of dissimilar samples are more distant, achieving feature clustering of similar risk samples.
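A minimal sketch of an InfoNCE-style loss for one anchor sample follows. Several details are assumptions because the text does not fix them: the contrastive temperature of 0.1, the averaging over multiple positives, and the inclusion of all positives in the Softmax denominator. The two-dimensional feature vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity in [-1, 1]; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positives, negatives, temperature=0.1):
    """Average negative log of the softmax share given to each positive sample."""
    pos = [math.exp(cosine(anchor, p) / temperature) for p in positives]
    neg = [math.exp(cosine(anchor, n) / temperature) for n in negatives]
    denom = sum(pos) + sum(neg)
    return -sum(math.log(p / denom) for p in pos) / len(pos)


anchor = [1.0, 0.0]
same_level = [[1.0, 0.1]]   # hypothetical sample with the same risk level
other_level = [[0.0, 1.0]]  # hypothetical sample with a different risk level

well_clustered = info_nce(anchor, same_level, other_level)
badly_clustered = info_nce(anchor, other_level, same_level)
print(well_clustered < badly_clustered)
```

In the setting described above, `positives` would hold 5 samples and `negatives` 20 samples drawn from the same batch of 64.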
[0041] A weighted fusion of cross-entropy loss and contrastive learning loss yields a comprehensive loss function. The weight coefficient α for cross-entropy loss is set to 0.7, and the weight coefficient β for contrastive learning loss is set to 0.3, with the sum of α and β being 1. This weight allocation is based on the core requirement of configuring risk labels for financial clients, prioritizing the accuracy of risk level classification while optimizing feature representation through contrastive learning loss. The comprehensive loss function is calculated as: Total Loss = α × Cross-Entropy Loss + β × Contrastive Learning Loss. This function balances classification accuracy and feature clustering effectiveness, providing a unified loss optimization objective for subsequent parameter updates in the multilayer perceptron.
[0042] S2033: Backpropagate the comprehensive loss function to update the parameters of the multilayer perceptron, and adjust the smoothness of the probability distribution through the temperature coefficient to generate an optimized risk level probability distribution. The core of this step is to pass the loss value of the comprehensive loss function to each layer of the multilayer perceptron through the backpropagation algorithm, update the network parameters using gradient descent, and introduce a temperature coefficient to adjust the smoothness of the risk level probability distribution, so that the optimized probability distribution better matches the risk identification needs of financial business. The specific implementation method is as follows: First, backpropagation of the comprehensive loss function is performed, starting from the output layer of the multilayer perceptron. The gradients of the comprehensive loss function with respect to the weights, biases, and intermediate features of each fully connected layer are calculated sequentially. The gradient values represent the influence of each parameter on the loss value; positive values indicate that increasing the parameter leads to an increase in the loss value, while negative values indicate that increasing the parameter leads to a decrease in the loss value. To avoid gradient explosion during training, a gradient clipping mechanism is introduced, with a clipping threshold of 1.0. If the absolute value of a calculated gradient exceeds this threshold, it is clipped to within the threshold range to ensure gradient stability. The Adam adaptive gradient descent method is used to update the parameters of the multilayer perceptron, with a learning rate of 0.001. This learning rate balances the training speed and convergence accuracy of the model. 
The Adam algorithm dynamically adjusts the learning rate of each parameter based on the first and second moments of the gradient, making the parameter updates more suitable for the model's training state. After each parameter update, the feature mapping capability of the multilayer perceptron is optimized towards minimizing the comprehensive loss function, improving the performance of risk level classification and feature clustering.
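The clipping and parameter update can be sketched as follows, using element-wise clipping at 1.0 and a single Adam step with a learning rate of 0.001. The decay rates 0.9 and 0.999 and the constant 1e-8 are the commonly used Adam defaults and are assumptions here, as are the sample parameters and gradients.

```python
import math


def clip_gradients(grads, threshold=1.0):
    """Clip each gradient value into [-threshold, threshold]."""
    return [max(-threshold, min(threshold, g)) for g in grads]


def adam_step(params, grads, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count used for bias correction."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Hypothetical parameters and raw gradients for two weights.
params = [0.5, -0.2]
raw_grads = [3.0, -0.4]

grads = clip_gradients(raw_grads)
params, m, v = adam_step(params, grads, m=[0.0, 0.0], v=[0.0, 0.0], t=1)
print(grads)
print([round(p, 6) for p in params])
```

On the first step the bias-corrected update is close to the learning rate times the sign of the gradient, so each parameter moves by about 0.001.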
[0043] The smoothness of the probability distribution is then adjusted using a temperature coefficient, τ, a hyperparameter greater than 0 and less than or equal to 1. Its core function is to scale the logit value of the multilayer perceptron output layer before inputting it into the Softmax function to generate the probability distribution. The scaling formula is logit value / τ. The magnitude of the temperature coefficient directly affects the shape of the probability distribution: the smaller τ is, the greater the scaling factor of the logit value, resulting in a steeper probability distribution and higher discrimination between risk levels; the larger τ is, the smaller the scaling factor of the logit value, resulting in a smoother probability distribution, effectively avoiding overfitting of the model to a single risk level. For the business scenario of financial customer risk identification, the temperature coefficient τ is initially set to 0.5. If the risk level discrimination of a batch of samples is low, τ can be appropriately reduced to 0.3 to increase the steepness of the probability distribution; if noise in the samples causes large fluctuations in the probability distribution, τ can be appropriately increased to 0.8 to make the distribution smoother.
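The effect of the temperature coefficient can be illustrated with a short sketch that scales the logit values by 1/τ before Softmax. The logit values are hypothetical; the three τ values are those mentioned in the paragraph above.

```python
import math


def softmax_with_temperature(logits, tau=0.5):
    """Softmax over logits / tau, with tau restricted to (0, 1] as in the text."""
    if not 0.0 < tau <= 1.0:
        raise ValueError("tau must be greater than 0 and at most 1")
    scaled = [z / tau for z in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical output-layer logits for low, medium, high, extremely high risk.
logits = [0.2, 0.6, 2.3, 0.8]

steep = softmax_with_temperature(logits, tau=0.3)
default = softmax_with_temperature(logits, tau=0.5)
smooth = softmax_with_temperature(logits, tau=0.8)

for name, dist in (("0.3", steep), ("0.5", default), ("0.8", smooth)):
    print(name, [round(p, 3) for p in dist])
```

The predicted level stays the same at every temperature; only the share of probability concentrated on it changes.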
[0044] After its parameters are updated, the multilayer perceptron applies the nonlinear transformation to the input risk embedding vector again to obtain new logit values. These values are then scaled by the temperature coefficient τ and normalized using Softmax to generate an optimized risk level probability distribution. This probability distribution is the output of the multilayer perceptron after loss optimization. It improves the matching degree with the customer's true risk label and, through contrastive learning, makes the probability distributions of similar risk samples more clustered. At the same time, the adjustment of the temperature coefficient makes the shape of the probability distribution more suitable for the risk identification needs of actual business. In the example, the optimized probability distribution for a certain customer is [0.05, 0.09, 0.78, 0.08]. Compared with the initial probability distribution, the probability of attribution to high risk is more prominent, and the differentiation of risk levels is higher.
[0045] S2034: Based on the optimized risk level probability distribution, select the risk level corresponding to the maximum probability as the current customer's risk label, record the label confidence level, and output the dynamically updated customer risk label.
[0046] The core of this step is to complete the final allocation of customer risk labels based on the optimized risk level probability distribution. At the same time, the maximum probability value is extracted as the label confidence level to generate dynamic risk labels containing multi-dimensional traceability information, thereby achieving accurate output and updating of financial customer risk labels. The specific implementation method is as follows: First, the risk label assignment operation is performed. Based on the customer's optimized risk level probability distribution, all K dimensions of the probability distribution are traversed, and the largest probability value is extracted. The risk level corresponding to the maximum value is the final risk label for the current customer. The assignment rule follows the "maximum probability allocation principle," which conforms to the basic logic of probability statistics and is consistent with the risk level determination requirements in financial business. In the example, in the optimized probability distribution [0.05, 0.09, 0.78, 0.08], the maximum value 0.78 corresponds to the high risk level, so a high-risk label is assigned to this customer. If two or more risk levels share the same maximum probability value, that is, the probability distribution contains a tie, the model retrieves the customer's risk embedding vector and selects, among the tied levels, the risk level of the sample whose features are most similar to that vector as the final label, avoiding ambiguity in label assignment.
[0047] While assigning risk labels, the maximum probability value corresponding to each label is recorded as the label confidence level. The label confidence level is a value between 0 and 1, and its magnitude represents the model's confidence in the risk label assignment result. The closer the value is to 1, the more sufficient the model's basis for determining that the customer belongs to that risk level, and the higher the reliability of the label result. The closer the value is to 0, the less sufficient the model's basis for label assignment, and the ambiguity of the customer's risk characteristics, requiring further confirmation through manual verification. In the example, the confidence level corresponding to the high-risk label is 0.78, which means that the model's confidence level in determining that the customer is high-risk is 78%. If the confidence level is lower than 0.5, the customer is marked as a sample with ambiguous risk characteristics and included in the scope of key verification.
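The label assignment and confidence recording can be sketched as follows, using the example distribution [0.05, 0.09, 0.78, 0.08] and the 0.5 review threshold from the text. The function name and dictionary keys are hypothetical, and the similarity-based tie-breaking rule is not modelled: on a tie this sketch simply returns the first maximum.

```python
RISK_LEVELS = ["low risk", "medium risk", "high risk", "extremely high risk"]


def assign_label(probs, review_threshold=0.5):
    """Pick the most probable risk level and record its probability as confidence."""
    confidence = max(probs)
    index = probs.index(confidence)  # first maximum; tie-breaking is not modelled
    return {
        "label": RISK_LEVELS[index],
        "confidence": confidence,
        # Low-confidence customers are flagged for key manual verification.
        "needs_review": confidence < review_threshold,
    }


clear_case = assign_label([0.05, 0.09, 0.78, 0.08])
ambiguous_case = assign_label([0.30, 0.35, 0.25, 0.10])  # hypothetical distribution
print(clear_case)
print(ambiguous_case)
```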
[0048] The final output is a dynamically updated customer risk label. Each label is a structured data unit containing three core pieces of information: a unique customer identifier, a risk label name, and a label confidence level. It also includes traceability information such as the label generation time, risk characteristic basis, and model training batch. The risk characteristic basis extracts core high-contribution features from the customer's risk embedding vector, such as sudden changes in transaction amount or association with high-risk groups, making the label results interpretable. This risk label is dynamically updated; each time a new comprehensive customer feature dataset is input into the model, a completely new risk embedding vector is generated and a label is assigned. Furthermore, as the model parameters are continuously optimized, the accuracy and confidence level of the labels continuously improve, enabling dynamic configuration and iterative optimization of financial customer risk labels to adapt to real-time changes in the financial market and customer behavior.
[0049] S204. Based on the comparison between the customer risk label and the preset business tolerance threshold, the risk monitoring strategy parameters of the adaptive risk label mapping module are automatically adjusted, and the label change data and the adjusted strategy parameters are fed back to the deep spatiotemporal feature extraction network for incremental learning, generating an optimized deep spatiotemporal feature extraction network for dynamic configuration of financial customer risk labels in the next cycle.
[0050] Specifically, the dynamically updated customer risk labels can be compared with the preset business tolerance thresholds to calculate the degree of label deviation. If the deviation exceeds the threshold, a strategy adjustment signal is triggered, generating a strategy adjustment instruction. The core of this step is to quantitatively compare the real-time risk label distribution of financial customers with the financial institution's preset business risk control tolerance standards. By calculating the degree of label deviation, it is determined whether the risk monitoring strategy needs to be adjusted. If the adjustment conditions are met, a clear strategy adjustment instruction is generated, providing a decision-making basis for subsequent parameter optimization. The specific implementation method is as follows: The preset business tolerance thresholds are reasonable ranges for the proportion of customers at each risk level, set by financial institutions based on their own risk control objectives, business types, and regulatory requirements. This is a multi-dimensional quantitative threshold system, divided into four risk levels: low risk, medium risk, high risk, and extremely high risk. The thresholds are presented as percentages, and the sum of the percentages for all levels is 100%. In the example, the business tolerance thresholds set for retail financial business are: low risk ≥85%, medium risk ≤10%, high risk ≤3.5%, and extremely high risk ≤1.5%. This threshold system balances business development and risk control, setting a strict upper limit for high-risk levels and a basic lower limit for low-risk levels. Before comparison, the dynamic risk label distribution of all financial customers in the current period is statistically analyzed, calculating the actual percentage of customers at each risk level relative to the total number of customers. 
This data, which is a quantitative indicator of the same dimension as the business tolerance thresholds, ensures the effectiveness of the comparison.
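As an illustration only, the distribution statistics and the comparison against the example tolerance thresholds could be sketched as follows. The level names, the `TOLERANCE` table layout, and the sample customer counts are assumptions made for this sketch and are not part of the described embodiment.

```python
from collections import Counter

# Example tolerance thresholds: (bound type, bound as a fraction of all customers).
# The "min"/"max" encoding is an assumption of this sketch.
TOLERANCE = {
    "low": ("min", 0.85),
    "medium": ("max", 0.10),
    "high": ("max", 0.035),
    "extreme": ("max", 0.015),
}

def label_distribution(labels):
    """Return the share of customers at each risk level."""
    counts = Counter(labels)
    total = len(labels)
    return {level: counts.get(level, 0) / total for level in TOLERANCE}

def breached_levels(distribution):
    """List the levels whose actual share violates its tolerance bound."""
    breached = []
    for level, (kind, bound) in TOLERANCE.items():
        share = distribution[level]
        if (kind == "min" and share < bound) or (kind == "max" and share > bound):
            breached.append(level)
    return breached

# Hypothetical period with 1000 customers.
labels = ["low"] * 820 + ["medium"] * 120 + ["high"] * 40 + ["extreme"] * 20
dist = label_distribution(labels)
print(dist)
print(breached_levels(dist))
```

Keeping the bound type next to each threshold lets the low-risk lower limit and the upper limits of the other levels be checked by the same loop.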
[0051] The calculation of label deviation is divided into two dimensions: individual risk level deviation and overall comprehensive deviation. Individual risk level deviation measures the deviation between the actual proportion of a single risk level and its threshold. The calculation formula is D_i = |P_i - T_i| / T_i, where D_i is the individual deviation of the i-th risk level, P_i is the actual proportion of the i-th risk level, and T_i is the business tolerance threshold for the i-th risk level. The individual deviation ranges from 0 to positive infinity; a larger value indicates a more significant deviation between the actual distribution of that level and the threshold. Overall comprehensive deviation measures the deviation across all risk levels. It is calculated as a weighted sum of the individual deviations, using the formula D_all = Σ(W_i × D_i), where D_all is the overall comprehensive deviation and W_i is the weight coefficient of the i-th risk level. The weight coefficients are set according to the severity of the risk level; the higher the risk level, the greater the weight. In the example, extremely high risk W_4 = 0.4, high risk W_3 = 0.3, medium risk W_2 = 0.2, and low risk W_1 = 0.1 are set (the weights sum to 1) to ensure that the deviation of high-risk levels has a more prominent impact on the overall result. The overall comprehensive deviation ranges from 0 to positive infinity.
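The two deviation measures can be worked through numerically. The following minimal sketch uses the example thresholds and weights from the text; the actual proportions in `actual` are hypothetical values chosen only to exercise the formulas.

```python
LEVELS = ("low", "medium", "high", "extreme")
THRESHOLDS = {"low": 0.85, "medium": 0.10, "high": 0.035, "extreme": 0.015}  # T_i
WEIGHTS = {"low": 0.1, "medium": 0.2, "high": 0.3, "extreme": 0.4}           # W_i

def individual_deviation(actual, threshold):
    """D_i = |P_i - T_i| / T_i"""
    return abs(actual - threshold) / threshold

def deviations(actual_shares):
    """Return ({level: D_i}, D_all) for one period."""
    per_level = {
        lv: individual_deviation(actual_shares[lv], THRESHOLDS[lv]) for lv in LEVELS
    }
    d_all = sum(WEIGHTS[lv] * per_level[lv] for lv in LEVELS)
    return per_level, d_all

# Hypothetical actual proportions P_i for the current period.
actual = {"low": 0.82, "medium": 0.12, "high": 0.04, "extreme": 0.02}
per_level, d_all = deviations(actual)
for lv in LEVELS:
    print(f"D_{lv} = {per_level[lv]:.4f}")
print(f"D_all = {d_all:.4f}")
```

With these hypothetical proportions, the extremely high risk level contributes the most to D_all even though its absolute excess is only 0.5 percentage points, because the deviation is relative to a small threshold and carries the largest weight.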
[0052] A trigger threshold for strategy adjustment is set. The trigger threshold for the individual deviation of the high-risk or extremely high-risk level is D_i ≥ 0.4, and the trigger threshold for the overall comprehensive deviation is D_all ≥ 0.2. If any trigger condition is met, the current risk monitoring strategy is deemed no longer suitable for business risk control needs, and a strategy adjustment signal is immediately triggered. This signal is a digital trigger instruction containing core information such as the trigger reason, trigger threshold, and actual deviation data. Based on the trigger signal and the label deviation, a strategy adjustment instruction is generated. The adjustment instruction gives the direction of parameter adjustment without specific values; it only defines the adjustment trend according to the type of deviation. In the example, if the actual proportion of extremely high risk exceeds the threshold and triggers the signal, the adjustment instruction is "tighten the risk level judgment threshold and reduce the contrastive learning temperature parameter to enhance the differentiation of high-level risks"; if the actual proportion of medium risk is too low and the proportion of low risk is too high, the adjustment instruction is "relax the risk level judgment threshold and increase the contrastive learning temperature parameter to optimize the fine division of risk levels". The adjustment instruction ensures that the direction of subsequent parameter adjustments is consistent with business risk control needs.
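A minimal sketch of the trigger logic is given below. The signal structure, the function names, and in particular the rule used to pick the direction (tighten when a high-level share exceeds its cap, otherwise relax) are assumptions of this sketch; the text itself only gives two example cases.

```python
SINGLE_TRIGGER = 0.4    # applies to the high and extremely high risk levels
OVERALL_TRIGGER = 0.2   # applies to the overall comprehensive deviation D_all

def adjustment_signal(per_level, d_all):
    """Return a signal dict if any trigger condition holds, else None."""
    reasons = []
    for level in ("high", "extreme"):
        if per_level[level] >= SINGLE_TRIGGER:
            reasons.append({"reason": f"D_{level}",
                            "threshold": SINGLE_TRIGGER,
                            "actual": per_level[level]})
    if d_all >= OVERALL_TRIGGER:
        reasons.append({"reason": "D_all",
                        "threshold": OVERALL_TRIGGER,
                        "actual": d_all})
    return {"triggered": True, "reasons": reasons} if reasons else None

def adjustment_direction(actual, thresholds):
    """Assumed rule: tighten when a high-level share exceeds its cap, else relax."""
    over = any(actual[lv] > thresholds[lv] for lv in ("high", "extreme"))
    return "tighten" if over else "relax"

# Hypothetical inputs for one period.
thresholds = {"low": 0.85, "medium": 0.10, "high": 0.035, "extreme": 0.015}
actual = {"low": 0.82, "medium": 0.12, "high": 0.04, "extreme": 0.02}
per_level = {"low": 0.035, "medium": 0.2, "high": 0.143, "extreme": 0.333}
signal = adjustment_signal(per_level, d_all=0.2197)
if signal:
    print(signal["reasons"], adjustment_direction(actual, thresholds))
```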
[0053] Based on the strategy adjustment instruction, the monitoring strategy parameters in the adaptive risk label mapping module, including the risk level thresholds and the contrastive learning temperature parameter, are automatically adjusted to generate the adjusted strategy parameter set. The core of this step is to adjust these parameters in small gradient steps in the direction given by the strategy adjustment instruction, so as to avoid model failure caused by sudden parameter changes. At the same time, all adjusted parameters are structured and integrated to generate a strategy parameter set that can be applied directly. The specific implementation method is as follows: First, the risk level thresholds are adjusted. These are the probability thresholds that divide the risk levels in the adaptive risk label mapping module. They are interval-type thresholds matched to the risk levels and are set based on the optimized risk level probability distribution. The initial thresholds, from low to high risk, are: low risk P ≥ 0.8, medium risk 0.5 ≤ P < 0.8, high risk 0.2 ≤ P < 0.5, and extremely high risk P < 0.2, where P is the probability value of a customer belonging to the corresponding risk level. These thresholds are the judgment boundaries for the risk levels and directly determine the label allocation result. The adjustment of the risk level thresholds strictly follows the direction of the strategy adjustment instruction. If the instruction is "tighten the threshold," the probability thresholds for high risk and extremely high risk are increased, while the probability threshold for low risk is decreased. In the example, the threshold for extremely high risk is adjusted to P < 0.25, high risk to 0.25 ≤ P < 0.5, and low risk to P ≥ 0.75, achieving a stricter judgment on high-level risks. If the instruction is "relax the threshold," the probability thresholds for high risk and extremely high risk are decreased, while the probability threshold for low risk is increased. In the example, the threshold for extremely high risk is adjusted to P < 0.15, high risk to 0.15 ≤ P < 0.5, and low risk to P ≥ 0.85, achieving a more refined classification of risk levels. Threshold adjustment uses a fixed step size of 0.05 in probability per adjustment, ensuring smooth parameter changes and avoiding large fluctuations in label assignment caused by abrupt boundary shifts.
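The stepwise threshold adjustment can be sketched by storing the interval boundaries as three cut points and moving them by one 0.05 step according to the stated direction rule (tightening raises the boundary of the higher risk levels and lowers the low-risk boundary; relaxing does the opposite). The `Boundaries` class, its field names, and the contiguity check are assumptions of this sketch.

```python
from dataclasses import dataclass

STEP = 0.05  # probability step per adjustment

@dataclass(frozen=True)
class Boundaries:
    extreme_max: float = 0.2  # P < extreme_max            -> extremely high risk
    high_max: float = 0.5     # extreme_max <= P < high_max -> high risk
    low_min: float = 0.8      # P >= low_min -> low risk; medium lies in between

def adjust(b, direction):
    """Move the cut points by one step in the given direction."""
    sign = 1 if direction == "tighten" else -1
    new = Boundaries(
        extreme_max=round(b.extreme_max + sign * STEP, 2),
        high_max=b.high_max,
        low_min=round(b.low_min - sign * STEP, 2),
    )
    # Keep the four intervals ordered and non-empty.
    if not 0 < new.extreme_max < new.high_max < new.low_min < 1:
        raise ValueError("adjustment would produce invalid risk intervals")
    return new

def classify(p, b):
    """Map a probability value P to a risk level under boundaries b."""
    if p < b.extreme_max:
        return "extreme"
    if p < b.high_max:
        return "high"
    if p < b.low_min:
        return "medium"
    return "low"

initial = Boundaries()
tightened = adjust(initial, "tighten")
print(tightened, classify(0.22, initial), classify(0.22, tightened))
```

Storing shared cut points rather than four independent intervals guarantees that the adjusted intervals neither overlap nor leave gaps.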
[0054] Subsequently, the contrastive learning temperature parameter is adjusted. This parameter, denoted by τ, is a core hyperparameter affecting the smoothness of the risk level probability distribution, ranging from 0.3 to 0.8. A smaller τ value results in a steeper probability distribution and higher differentiation between risk levels; a larger τ value results in a smoother probability distribution, effectively reducing misjudgments of risk levels caused by sample noise. The adjustment of the contrastive learning temperature parameter is linked to the risk level threshold. If the instruction is "enhance risk differentiation," the τ value is decreased in steps of 0.05; in this example, τ is adjusted from 0.5 to 0.4. If the instruction is "optimize fine segmentation and reduce misjudgments," the τ value is increased in steps of 0.05; in this example, τ is adjusted from 0.5 to 0.6. The adjustment step size of the temperature parameter is consistent with the risk level threshold, ensuring the coordination of parameter adjustments within the module.
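The effect of τ on the steepness of the probability distribution can be illustrated with a temperature-scaled softmax. This is a sketch under the assumption that τ divides the level logits before normalization; the logits, the direction names `"sharpen"` and `"smooth"`, and the clamping helper are hypothetical.

```python
import math

def softmax_with_temperature(logits, tau):
    """Temperature-scaled softmax; smaller tau gives a steeper distribution."""
    scaled = [z / tau for z in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def step_tau(tau, direction, step=0.05, lo=0.3, hi=0.8):
    """Move tau by one step and keep it inside the allowed range [lo, hi]."""
    delta = -step if direction == "sharpen" else step
    return round(min(hi, max(lo, tau + delta)), 2)

logits = [2.0, 1.0, 0.5, 0.1]               # hypothetical scores for four risk levels
for tau in (0.4, 0.5, 0.6):
    probs = softmax_with_temperature(logits, tau)
    print(tau, [round(p, 3) for p in probs])
```

Applying `step_tau` twice in the same direction reproduces the example adjustments from 0.5 to 0.4 and from 0.5 to 0.6.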
[0055] After all parameter adjustments are completed, all monitoring strategy parameters in the adaptive risk label mapping module are structurally integrated to generate an adjusted strategy parameter set. This parameter set is a standardized parameter configuration file containing seven core fields: parameter category, parameter name, initial value, adjusted value, adjustment step size, adjustment direction, and adjustment basis. It includes not only the risk level thresholds and the contrastive learning temperature parameter but also the current values of the other, unadjusted parameters within the module, ensuring the integrity of the parameter set. The parameter set is in a digital configuration format and can be directly loaded and parsed by the adaptive risk label mapping module. A unique version number and generation timestamp are added to each parameter set to make parameter adjustments traceable. In the example, the version number is set as "V + year + month + adjustment number". The generated parameter set V20250501 contains the adjusted risk level threshold range and a contrastive learning temperature parameter of τ = 0.4.
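One possible shape for such a parameter set is sketched below as a JSON document. The key names, the two-digit adjustment number, and the sample entry values are assumptions made for illustration; only the seven fields and the version pattern come from the text.

```python
import json
from datetime import datetime, timezone

def version_tag(year, month, adjustment_no):
    """Build a version such as V20250501 (two-digit adjustment number assumed)."""
    return f"V{year:04d}{month:02d}{adjustment_no:02d}"

def parameter_entry(category, name, initial, adjusted, step, direction, basis):
    """One record with the seven core fields named in the text."""
    return {
        "category": category,
        "name": name,
        "initial_value": initial,
        "adjusted_value": adjusted,
        "step": step,
        "direction": direction,
        "basis": basis,
    }

def build_parameter_set(entries, year, month, adjustment_no):
    """Wrap the entries with a version number and a generation timestamp."""
    return {
        "version": version_tag(year, month, adjustment_no),
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "parameters": entries,
    }

entries = [
    parameter_entry("temperature", "tau", 0.5, 0.4, 0.05, "decrease",
                    "extremely high risk share above tolerance"),
    # Unadjusted parameters are kept with identical initial and adjusted values.
    parameter_entry("threshold", "high_max", 0.5, 0.5, 0.05, "none", "not adjusted"),
]
param_set = build_parameter_set(entries, 2025, 5, 1)
print(json.dumps(param_set, indent=2))
```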
[0056] The current period's customer risk label change data and the adjusted strategy parameter set are encapsulated and used as input data for incremental learning, which is then fed back to the deep spatiotemporal feature extraction network. The core of this step is to integrate and structure the dynamic changes in customer risk labels with the adjusted strategy parameter set, eliminating data heterogeneity and generating incremental learning data that meets the input requirements of the deep spatiotemporal feature extraction network. A feedback mechanism is then used to link strategy adjustments with the optimization of the feature extraction network. The specific implementation method is as follows: The current period's customer risk label change data is a comprehensive dataset reflecting the dynamic changes in customer risk characteristics and label allocation results. It is divided into two layers: a detailed layer and a statistical layer. The detailed layer data is customer-specific and includes the customer's unique identifier, the risk label from the previous period, the risk label from the current period, the change in label confidence level, the differences in core features of the risk embedding vector, and a preliminary determination of the reasons for the label change, accurately representing the details of risk label changes for individual customers. The statistical layer data is aggregated over all customers and includes changes in the overall label distribution within the period, increases or decreases in the number of customers at each risk level, changes in the overall average label confidence level, and group characteristics of high-risk label changes, reflecting the trend of risk label changes for the customer group. Both types of data are marked with a period identifier and a timestamp, forming a clear temporal relationship with the previous period's data and ensuring that incremental learning can capture the temporal patterns of risk characteristic changes.
[0057] The encapsulation process is based on the input data specifications of the deep spatiotemporal feature extraction network. It structurally integrates and unifies the customer risk label change data and the adjusted strategy parameter set. First, the core parameters of the strategy parameter set (the risk level thresholds and the contrastive learning temperature parameter) are numerically encoded and converted into numerical features of the same dimension as the risk embedding vector, fusing parameter data with feature data. Then, the detailed layer label change data is associated with the customer's original comprehensive feature data, preserving the correspondence between customer features and label changes. Finally, the fused parameter features, detailed layer data, and statistical layer data are arranged according to the network input dimension and divided into a feature input part and a supervision signal part. The feature input part is the fusion vector of the customer's comprehensive features and the parameter-encoded features, and the supervision signal part is the customer's label change value and label distribution change value, ensuring that the encapsulated data can simultaneously provide feature input and training supervision for the deep spatiotemporal feature extraction network.
[0058] The encapsulated incremental learning input data is structured data in tensor form. The data dimension matches the number of neurons in the input layer of the deep spatiotemporal feature extraction network, and the values are Z-score standardized and constrained to the range [-1, 1] to eliminate the impact of dimensional differences on network training. Through the model training feedback interface, the incremental learning input data is fed back to the feature fusion layer and the parameter update layer of the deep spatiotemporal feature extraction network. The feature fusion layer receives the fused feature input, and the parameter update layer receives the supervision signal, providing a complete and standardized input data foundation for subsequent incremental learning of the network and realizing bidirectional feedback linkage between the strategy adjustment parameters and changes in customer risk label data.
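A Z-score by itself yields zero mean and unit variance but no fixed bounds, so some additional step is needed to reach [-1, 1]. The sketch below assumes one possible choice, dividing the Z-score by a limit of three standard deviations and clipping; this choice and the function name are assumptions, not the method stated in the text.

```python
from statistics import mean, pstdev

def zscore_to_unit_range(values, limit=3.0):
    """Z-score standardize, scale by `limit` standard deviations, clip to [-1, 1]."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]        # constant feature carries no signal
    return [max(-1.0, min(1.0, (v - mu) / sigma / limit)) for v in values]

# Hypothetical feature column with one outlier.
amounts = [120.0, 80.0, 95.0, 110.0, 5000.0]
scaled = zscore_to_unit_range(amounts)
print([round(s, 3) for s in scaled])
```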
[0059] Incremental learning is performed on the deep spatiotemporal feature extraction network to fine-tune the network weights using new data, generating an optimized deep spatiotemporal feature extraction network, which is then deployed for dynamic configuration of risk labels in the next cycle.
[0060] The core of this step is to use an incremental learning strategy to perform lightweight parameter fine-tuning on the deep spatiotemporal feature extraction network. While retaining the network's original feature extraction capabilities, the network is made adaptable to new risk monitoring strategy parameters and changes in customer risk characteristics to avoid catastrophic forgetting. Finally, the optimized network is deployed to the production environment to achieve dynamic configuration of risk labels. The specific implementation method is as follows: Before incremental learning, the weights of the deep spatiotemporal feature extraction network are frozen. This network consists of a convolutional long short-term memory network, a graph attention network, a feature fusion gating unit, and a fully connected layer. 70% of the pre-trained weights in the bottom convolutional layers and graph attention layers are frozen. Only the top feature fusion gating unit, the fully connected layer, and some learnable weights in the middle layer are retained for training. The bottom weights are the core of the network for extracting basic spatiotemporal features. Freezing the weights can effectively preserve the network's ability to extract basic risk features of financial customers and prevent catastrophic forgetting caused by incremental learning. Only the upper layer weights are fine-tuned to allow the network to adapt to new strategy parameters and changes in risk features.
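The freezing scheme can be expressed as a plan that marks each parameter group as trainable or frozen. The sketch below is framework-independent and interprets the 70% figure as a share of the bottom-layer parameter groups, which is an assumption; the layer names and prefixes are hypothetical. In a deep learning framework the resulting flags would be applied to the corresponding weights.

```python
def freeze_plan(layer_names, frozen_fraction=0.7,
                bottom_prefixes=("convlstm", "gat")):
    """Return {layer name: trainable?}, freezing the lowest bottom-layer groups."""
    bottom = [name for name in layer_names if name.startswith(bottom_prefixes)]
    n_frozen = round(len(bottom) * frozen_fraction)
    frozen = set(bottom[:n_frozen])         # earliest listed groups are the lowest
    return {name: name not in frozen for name in layer_names}

# Hypothetical parameter groups, listed from bottom to top.
layers = (
    [f"convlstm.{i}" for i in range(5)]
    + [f"gat.{i}" for i in range(5)]
    + ["fusion_gate", "fc"]
)
plan = freeze_plan(layers)
for name, trainable in plan.items():
    print(f"{name:12s} {'train' if trainable else 'frozen'}")
```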
[0061] Dedicated hyperparameters for incremental learning are set, distinct from those used in the initial network training. The learning rate is set to 0.0001, significantly lower than the initial 0.001, ensuring small-scale gradient fine-tuning of network weights and preventing abrupt weight changes. The batch size is set to 32, employing mini-batch training to improve the accuracy of parameter fine-tuning. The number of iterations is set to 50, adapted to the scale of the incremental data, ensuring sufficient training without overfitting. The optimizer uses the AdamW adaptive gradient descent method, incorporating a weight decay coefficient of 0.001, and employing L2 regularization to prevent overfitting on incremental data, thereby improving the model's generalization ability. The loss function for incremental learning is the label change correlation loss, using the supervisory signals (label change values, label distribution change values) in the encapsulated data as the supervision target. The correlation error between the network's output risk embedding vector and label changes is calculated, and this error is propagated to the unfrozen network layers via backpropagation, driving gradient fine-tuning of the weights. This allows the network to accurately capture risk feature changes related to label changes, while simultaneously adapting to the new feature extraction requirements of the adjusted policy parameters.
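To make the optimizer settings concrete, the following sketch performs AdamW updates on a single scalar weight with the stated learning rate of 0.0001 and weight decay of 0.001. In AdamW the decay is applied directly to the weight rather than added to the gradient as a classic L2 penalty. The beta and epsilon values are commonly used defaults assumed here, since the text does not specify them, and the constant gradient is hypothetical.

```python
import math

def adamw_step(theta, grad, m, v, t, lr=1e-4, weight_decay=1e-3,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamW update for a scalar weight; returns (theta, m, v)."""
    theta = theta * (1.0 - lr * weight_decay)     # decoupled weight decay
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 0.5                                    # hypothetical constant gradient
    theta, m, v = adamw_step(theta, grad, m, v, t)
    print(t, theta)
```

Each step changes the weight by roughly the learning rate, which reflects the small-scale fine-tuning the paragraph describes.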
[0062] After training, the network's performance is validated using a test set. Validation metrics include risk label prediction accuracy and feature clustering similarity. If the risk label prediction accuracy on the test set improves by ≥1% compared to the original network, and the feature clustering similarity of similar risk samples improves by ≥5%, then the incremental learning training is considered effective. An optimized deep spatiotemporal feature extraction network is generated, retaining all fine-tuned weights and parameter configurations, and a new network version file is generated. If validation fails, the learning rate is reduced to 0.00005, and 20 more iterations are added for retraining until validation succeeds.
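The validation gate and the retry rule can be sketched as a small control loop. The sketch assumes that the 1% and 5% improvements are absolute differences in the metric values and adds a cap on retry rounds so that the loop terminates; both are assumptions. `train_fn` is a hypothetical callable that trains for the given learning rate and iteration count and returns test-set metrics.

```python
MIN_ACCURACY_GAIN = 0.01     # assumed to be an absolute gain
MIN_SIMILARITY_GAIN = 0.05   # assumed to be an absolute gain

def validation_passed(baseline, metrics):
    """Check both improvement criteria against the original network."""
    return (metrics["accuracy"] - baseline["accuracy"] >= MIN_ACCURACY_GAIN
            and metrics["similarity"] - baseline["similarity"] >= MIN_SIMILARITY_GAIN)

def incremental_training_loop(train_fn, baseline, max_rounds=5):
    """Train, validate, and retry with a lower learning rate if validation fails."""
    lr, iterations = 1e-4, 50
    for round_no in range(1, max_rounds + 1):
        metrics = train_fn(lr, iterations)
        if validation_passed(baseline, metrics):
            return {"round": round_no, "lr": lr,
                    "iterations": iterations, "metrics": metrics}
        lr, iterations = 5e-5, 20           # retry: lower rate, 20 more iterations
    return None                             # cap reached without passing validation

# Hypothetical stand-in for real training: passes on the second round.
calls = []
def fake_train(lr, iterations):
    calls.append((lr, iterations))
    if len(calls) == 1:
        return {"accuracy": 0.905, "similarity": 0.70}
    return {"accuracy": 0.92, "similarity": 0.76}

result = incremental_training_loop(fake_train, {"accuracy": 0.90, "similarity": 0.70})
print(result)
```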
[0063] Finally, the optimized deep spatiotemporal feature extraction network is deployed to the production environment's model inference framework and configured in conjunction with the adaptive risk label mapping module. The adjusted strategy parameter set is loaded into the adaptive risk label mapping module, ensuring that the parameter versions of the two modules are consistent, thus achieving collaborative optimization of the model and strategy. Deployment uses a hot-update method, ensuring uninterrupted risk label configuration services for the current cycle, while retaining the original network's version files and parameter sets as backups. If the new network encounters an anomaly during operation, a rollback can be completed within 10 seconds, guaranteeing service stability. The optimized network will be directly used for the dynamic configuration of financial customer risk labels in the next cycle. Based on new strategy parameters and changes in customer risk characteristics, it can extract more accurate spatiotemporal risk features and generate risk embedding vectors that better meet business risk control needs.
[0064] Another embodiment of the present invention provides a deep learning-based dynamic configuration system for financial customer risk labels; see Figure 5. The system may include: The acquisition module 501 is used to collect multi-source heterogeneous data in real time, including customer transaction behavior time series data, credit record text data and external market dynamic numerical data, and to preprocess the multi-source heterogeneous data to generate a standardized customer comprehensive feature dataset. The generation module 502 is used to input the customer comprehensive feature dataset into a pre-built deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns in transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Through joint processing, a risk embedding vector containing individual behavioral features and group association features is generated. The output module 503 is used to input the risk embedding vector into the adaptive risk label mapping module, perform nonlinear transformation through a multilayer perceptron and calculate the probability distribution of each risk level, and combine the contrastive learning loss function to enhance the feature clustering effect of similar risk samples, and output dynamically updated customer risk labels. 
The learning module 504 is used to automatically adjust the risk monitoring strategy parameters of the adaptive risk label mapping module based on the comparison between the customer risk label and the preset business tolerance threshold, and feed the label change data and the adjusted strategy parameters back to the deep spatiotemporal feature extraction network for incremental learning, so as to generate an optimized deep spatiotemporal feature extraction network for the dynamic configuration of financial customer risk labels in the next cycle.
[0065] This invention also provides a storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when running.
[0066] This invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0067] The above description, based on the embodiments shown in the figures, details the structure, features, and effects of the present invention. The above description is only a preferred embodiment of the present invention, but the present invention is not limited to the scope of implementation shown in the figures. Any changes made in accordance with the concept of the present invention, or equivalent embodiments modified to have equivalent changes, that do not exceed the spirit covered by the specification and figures, should be within the protection scope of the present invention.
Claims
1. A method for dynamically configuring risk labels for financial clients based on deep learning, characterized in that, The method includes: Real-time collection of multi-source heterogeneous data, including time-series data of customer transaction behavior, text data of credit records, and numerical data of external market dynamics, and preprocessing of the multi-source heterogeneous data to generate a standardized comprehensive customer feature dataset; The customer comprehensive feature dataset is input into a pre-constructed deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns in transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Through joint processing, a risk embedding vector containing individual behavioral features and group association features is generated. The risk embedding vector is input into the adaptive risk label mapping module, which performs nonlinear transformation through a multilayer perceptron and calculates the probability distribution of each risk level. At the same time, it combines a contrastive learning loss function to enhance the feature clustering effect of similar risk samples and outputs dynamically updated customer risk labels. 
Based on the comparison between the customer risk label and the preset business tolerance threshold, the risk monitoring strategy parameters of the adaptive risk label mapping module are automatically adjusted, and the label change data and the adjusted strategy parameters are fed back to the deep spatiotemporal feature extraction network for incremental learning, generating an optimized deep spatiotemporal feature extraction network for dynamic configuration of financial customer risk labels in the next cycle.
2. The method according to claim 1, characterized in that, The real-time acquisition includes multi-source heterogeneous data such as time-series data of customer transaction behavior, textual data of credit records, and numerical data of external market dynamics. The multi-source heterogeneous data is then preprocessed to generate a standardized comprehensive customer feature dataset, including: Real-time collection of multi-source heterogeneous data, including time-series data of customer transaction behavior, text data of credit records, and numerical data of external market dynamics, to generate original multi-source heterogeneous datasets; Based on the original multi-source heterogeneous dataset, sliding window slicing and linear interpolation are used to fill missing values in the time series data of customer transaction behavior to generate a standardized transaction behavior time series matrix; BERT model encoding is used to encode credit record text to obtain fixed-length semantic vectors to generate credit record text embedding features; normalization and first-order differencing are performed on external market numerical data to generate market dynamic feature sequences. Based on the transaction behavior time series matrix, credit record text embedding features and market dynamic feature sequences, a multi-source data timestamp alignment algorithm is used to unify all data to the same time granularity, generating a time-aligned multimodal feature set; The time-aligned multimodal feature set is spliced together, and principal component analysis is applied for dimensionality reduction and redundancy removal to finally generate a standardized comprehensive customer feature dataset.
3. The method according to claim 2, characterized in that, The customer comprehensive feature dataset is input into a pre-constructed deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns in transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Through joint processing, a risk embedding vector containing individual behavioral features and group association features is generated, including: The transaction behavior time-series matrix is extracted from the customer comprehensive feature dataset and input into the convolutional long short-term memory network. The convolutional layer captures local fluctuation patterns and the long short-term memory unit captures long-term dependencies to generate transaction behavior time-series feature vectors. A customer relationship graph is extracted from the customer comprehensive feature dataset, a customer adjacency matrix and a node feature matrix are constructed and input into the graph attention network, and the weights of neighboring nodes are calculated and information is aggregated through a multi-head attention mechanism to generate customer group association feature vectors. The transaction behavior time-series feature vector and the customer group association feature vector are concatenated and input into the feature fusion gating unit. The contribution ratio of the two features is dynamically adjusted through learnable gating weights to generate preliminary fused features. The preliminary fused features are input into a fully connected layer for nonlinear transformation and dimensionality compression to obtain a risk embedding vector that includes individual behavioral features and group association features.
4. The method according to claim 3, characterized in that, The risk embedding vector is input into the adaptive risk label mapping module, which performs nonlinear transformation through a multilayer perceptron and calculates the probability distribution of each risk level. Simultaneously, a contrastive learning loss function is used to enhance the feature clustering effect of similar risk samples, outputting dynamically updated customer risk labels, including: The risk embedding vector is input into a multilayer perceptron and then nonlinearly mapped through multiple fully connected layers and a ReLU activation function to generate the initial probability distribution of each risk level. The cross-entropy loss is calculated based on the initial probability distribution. At the same time, positive and negative sample pairs are randomly selected from the current batch, and the contrastive learning loss is calculated to bring the feature distance of similar samples closer and push away dissimilar samples, thus obtaining the comprehensive loss function. The parameters of the multilayer perceptron are updated by backpropagation using the comprehensive loss function, and the smoothness of the probability distribution is adjusted by the temperature coefficient to generate an optimized risk level probability distribution. Based on the optimized risk level probability distribution, the risk level corresponding to the maximum probability is selected as the current customer's risk label, and the label confidence level is recorded. The dynamically updated customer risk label is then output.
5. The method according to claim 4, characterized in that, Based on the comparison between the customer risk label and the preset business tolerance threshold, the risk monitoring strategy parameters of the adaptive risk label mapping module are automatically adjusted, and the label change data and the adjusted strategy parameters are fed back to the deep spatiotemporal feature extraction network for incremental learning, generating an optimized deep spatiotemporal feature extraction network for dynamic configuration of financial customer risk labels in the next cycle, including: The dynamically updated customer risk labels are compared with the preset business tolerance thresholds, the degree of label deviation is calculated, and if the deviation exceeds the threshold, a strategy adjustment signal is triggered and a strategy adjustment instruction is generated. Based on the strategy adjustment instruction, the monitoring strategy parameters in the adaptive risk label mapping module are automatically adjusted, including the risk level threshold and the contrastive learning temperature parameter, to generate the adjusted strategy parameter set. The current period's customer risk label change data and the adjusted strategy parameter set are encapsulated and used as input data for incremental learning, which is then fed back to the deep spatiotemporal feature extraction network. Incremental learning is performed on the deep spatiotemporal feature extraction network to fine-tune the network weights using new data, generating an optimized deep spatiotemporal feature extraction network, which is then deployed for dynamic configuration of risk labels in the next cycle.
6. A deep learning-based dynamic configuration system for financial customer risk labels, characterized in that, The system includes: The data acquisition module is used to collect multi-source heterogeneous data in real time, including time-series data of customer transaction behavior, text data of credit records, and numerical data of external market dynamics, and to preprocess the multi-source heterogeneous data to generate a standardized customer comprehensive feature dataset. The generation module is used to input the customer comprehensive feature dataset into a pre-built deep spatiotemporal feature extraction network for processing. The deep spatiotemporal feature extraction network includes a convolutional long short-term memory network and a graph attention network. The convolutional long short-term memory network is used to extract local temporal dependency patterns in transaction behavior data, and the graph attention network is used to mine the associated risk transmission paths between customers. Through joint processing, a risk embedding vector containing individual behavioral features and group association features is generated. The output module is used to input the risk embedding vector into the adaptive risk label mapping module, perform nonlinear transformation through a multilayer perceptron and calculate the probability distribution of each risk level, and combine the contrastive learning loss function to enhance the feature clustering effect of similar risk samples, and output dynamically updated customer risk labels. 
The learning module is used to compare the customer risk label with a preset business tolerance threshold, automatically adjust the risk monitoring strategy parameters of the adaptive risk label mapping module, and feed the label change data and the adjusted strategy parameters back to the deep spatiotemporal feature extraction network for incremental learning, generating an optimized deep spatiotemporal feature extraction network for dynamic configuration of financial customer risk labels in the next cycle.
7. The system according to claim 6, characterized in that, The acquisition module is specifically used for: Real-time collection of multi-source heterogeneous data, including time-series data of customer transaction behavior, text data of credit records, and numerical data of external market dynamics, to generate original multi-source heterogeneous datasets; Based on the original multi-source heterogeneous dataset, sliding window slicing and linear interpolation are used to fill in missing values in the time series data of customer transaction behavior, generating a standardized transaction behavior time series matrix. Credit record text is encoded using the BERT model to obtain a fixed-length semantic vector, which is then used to generate credit record text embedding features. Normalize and perform first-order difference processing on external market numerical data to generate a market dynamic feature sequence; Based on the transaction behavior time series matrix, credit record text embedding features and market dynamic feature sequences, a multi-source data timestamp alignment algorithm is used to unify all data to the same time granularity, generating a time-aligned multimodal feature set; The time-aligned multimodal feature set is spliced together, and principal component analysis is applied for dimensionality reduction and redundancy removal to finally generate a standardized comprehensive customer feature dataset.
8. The system according to claim 7, characterized in that, The generation module is specifically used for: The transaction behavior time-series matrix is extracted from the customer comprehensive feature dataset and input into the convolutional long short-term memory network. The convolutional layer captures local fluctuation patterns and the long short-term memory unit captures long-term dependencies to generate transaction behavior time-series feature vectors. A customer relationship graph is extracted from the customer comprehensive feature dataset, a customer adjacency matrix and a node feature matrix are constructed and input into the graph attention network, and the weights of neighboring nodes are calculated and information is aggregated through a multi-head attention mechanism to generate customer group association feature vectors. The transaction behavior time-series feature vector and the customer group association feature vector are concatenated and input into the feature fusion gating unit. The contribution ratio of the two features is dynamically adjusted through learnable gating weights to generate preliminary fused features. The preliminary fused features are input into a fully connected layer for nonlinear transformation and dimensionality compression to obtain a risk embedding vector that includes individual behavioral features and group association features.
9. A storage medium, characterized in that, The storage medium stores a computer program, wherein the computer program is configured to execute the method of any one of claims 1-5 when it is run.
10. An electronic device comprising a memory and a processor, characterized in that, The memory stores a computer program, and the processor is configured to run the computer program to perform the method of any one of claims 1-5.