A method for generating a list of case priorities based on historical payback
By performing multi-stage feature engineering and multi-model integrated prediction on historical case data, a case priority list is generated, which solves the problem of unreasonable resource allocation in traditional sorting methods and achieves efficient asset recovery.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- FAZUIYUN (XIAMEN) TECH CO LTD
- Filing Date
- 2026-05-14
- Publication Date
- 2026-06-12
AI Technical Summary
In existing technologies, the prioritization of overdue debt cases relies on traditional manual experience or static rules, failing to deeply explore multi-dimensional dynamic characteristics, resulting in unreasonable resource allocation, insufficient accuracy in assessing repayment potential, and affecting asset recovery efficiency.
The method acquires historical case data, performs multi-stage feature engineering, constructs a dynamic multi-dimensional relationship graph, combines multi-model integrated prediction to generate a case priority list, and iteratively optimizes the model in real time to improve assessment accuracy and resource matching efficiency.
It significantly improves the accuracy of case recovery potential assessment, achieves optimal matching of resources and recovery potential, increases the recovery rate per unit of resources and the overall asset recovery rate, adapts to changes in data distribution, and maintains long-term disposal efficiency.
Figure CN122199138A_ABST
Abstract
Description
Technical Field
[0001] This invention relates to the field of financial data processing technology, specifically to a method for generating a priority list of cases based on historical repayment data.
Background Technology
[0002] In the fields of consumer finance, credit, and accounts receivable management, the handling of overdue debt cases is a crucial step in ensuring asset security and improving the efficiency of capital recovery. Since resources for mediation, collection, and other disposal methods (such as manpower, time, and material costs) are always limited, how to scientifically prioritize a massive number of overdue cases and accurately match limited resources to cases with higher recovery potential has become a core need commonly faced by the industry.
[0003] Currently, the mainstream case prioritization methods in the industry still rely primarily on traditional manual experience or simple static rules. Specifically, this means ranking cases based on only a few fixed and static indicators such as the amount owed, the length of overdue payment, and the debtor's basic credit score, without deeply exploring and utilizing the multi-dimensional dynamic characteristics behind the cases. For example, traditional methods ignore the temporal changes in debtors' repayment behavior, the impact of relationships between debtors on repayment, and key factors such as the actual difficulty and adjustability of the case. This leads to significant deviations in the assessment of the true repayment potential of each case, making it impossible to accurately distinguish between cases with high and low repayment potential.
[0004] This lack of accuracy in assessment directly leads to an unreasonable allocation of mediation resources: a large amount of high-quality resources are invested in cases with low recovery potential, while cases with high recovery potential fail to receive sufficient resource support, ultimately resulting in low disposal efficiency and poor asset recovery, failing to meet the industry's actual needs for efficient disposal of overdue assets.
Summary of the Invention
[0005] The purpose of this invention is to provide a method for generating a case priority list based on historical repayment data, in order to solve the problem that existing methods cannot accurately assess the repayment potential of cases, which results in inefficient allocation of mediation resources and low asset recovery efficiency.
[0006] To achieve the above objectives, the present invention adopts the following technical solution: A method for generating a case priority list based on historical repayment data, including the following steps:
S1. Obtain a historical case dataset;
S2. Perform multi-stage feature engineering on the historical case dataset, sequentially executing data reconstruction, relationship graph construction, and feature fusion and filtering steps to obtain a feature subset for repayment prediction;
S3. Perform multi-model integrated prediction based on the feature subset to calculate the comprehensive recovery potential score of each case;
S4. Based on the comprehensive recovery potential score, prioritize cases using dynamically determined grouping thresholds and generate a case priority list;
S5. Monitor the execution feedback data of the case priority list in real time, and iteratively optimize the prediction model and priority strategy based on the execution feedback data to form a closed-loop optimization.
[0007] Preferably, the historical case dataset includes basic case information, debtor information, repayment behavior timeline information, external multi-source credit information, correlation data information, and unstructured communication record information.
[0008] Preferably, step S2 specifically includes:
S21. Use a time series decomposition algorithm to extract the trend, periodicity and stability features of repayment behavior, and transform unstructured communication records into structured behavior encoding sequences;
S22. Construct explicit association graph layers and implicit association graph layers and merge them hierarchically to form a dynamic multi-dimensional relationship graph; capture the temporal evolution of association relationships through a time-series graph neural network and extract the evolution characteristics of debtor association strength;
S23. Perform cross-domain fusion of temporal features, graph features and traditional features, calculate the contribution weight of each feature to the repayment prediction target through an attention mechanism, sort the features according to the contribution weight, and select features with weights higher than the dynamic threshold to form the feature subset.
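[0009] Preferably, the formula for calculating the dynamic threshold is: $\theta = \alpha R + (1 - \alpha) S$, with $R = \frac{2}{N(N-1)} \sum_{i<j} |\rho(f_i, f_j)|$ and $S = \frac{1}{N} \sum_{k=1}^{N} \mathrm{Var}(w_k)$ (normalized), where $\theta$ is the dynamic threshold for feature selection; $\alpha$ is the redundancy weighting coefficient; $R$ is the normalized value of redundancy between features; $N$ is the total number of candidate features; $f_i$ is the $i$-th feature; $\rho$ is the Pearson correlation coefficient; $|\cdot|$ is the absolute value of the correlation coefficient; $S$ is the normalized value of the model stability index; $w_k$ is the weight of the $k$-th feature over 10 bootstrap iterations; and $\mathrm{Var}$ is the variance.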
[0010] Preferably, step S3 specifically includes:
S31. A gradient boosting decision tree model outputs the initial repayment probability prediction, a deep neural network model outputs the repayment amount range prediction, and a graph neural network model outputs the adjustability score;
S32. The initial repayment probability prediction, repayment amount range prediction, and adjustability score are normalized, and the initial comprehensive score is then calculated using a dynamic weighted fusion algorithm: $S_{\text{init}} = \omega_1 P_{\text{GBDT}} + \omega_2 A_{\text{DNN}} + \omega_3 M_{\text{GNN}}$, where $\omega_1$ is the dynamic weighting coefficient of the initial repayment probability prediction; $\omega_2$ is the dynamic weighting coefficient of the repayment amount range prediction; $\omega_3$ is the dynamic weighting coefficient of the adjustability score; $P_{\text{GBDT}}$ is the initial repayment probability prediction; $A_{\text{DNN}}$ is the normalized value of the repayment amount range prediction; and $M_{\text{GNN}}$ is the adjustability score;
S33. Platt scaling or isotonic (order-preserving) regression is used to perform probability calibration on the initial comprehensive score to obtain the final comprehensive repayment potential score.
[0011] Preferably, step S4 specifically includes:
S41. Based on the comprehensive repayment potential score and historical repayment data, construct a relationship curve between the score and the repayment probability. According to the current total amount of available mediation resources and the expected repayment target, select the optimal split point on the relationship curve as the grouping threshold;
S42. Divide the cases into different priority groups according to the grouping threshold, and sort them within each group according to multiple criteria;
S43. Analyze the weighted composition of the comprehensive recovery potential score, identify the feature dimension with the highest contribution, and generate targeted mediation strategy suggestions based on the identified feature dimension. For cases with scores below a certain threshold, automatically recommend alternative disposal plans and estimate the expected recovery rate of each plan;
S44. Combine intelligent mediation assistance to generate a priority list of cases with strategic suggestions.
[0012] Preferably, the optimal split point in step S41 aims to maximize the expected repayment amount per unit of mediation resources under the current resource constraints. The mathematical expression of the optimization objective is: $\max_{\tau} \frac{\sum_{i:\, S_i \ge \tau} P_i A_i}{C(\tau)}$. The constraint is: $C(\tau) = \sum_{i:\, S_i \ge \tau} c_i \le C_{\text{total}}$, where $\tau$ is the grouping threshold to be optimized; $i$ denotes the $i$-th case; $S_i$ is the comprehensive recovery potential score of the $i$-th case; $P_i$ is the actual recovery probability of the $i$-th case; $A_i$ is the total amount owed in the $i$-th case; $C(\tau)$ is the total resources required to process the cases with scores of at least $\tau$; $c_i$ is the resource consumption of processing the $i$-th case; and $C_{\text{total}}$ is the total amount of mediation resources currently available.
[0013] Preferably, in step S42, a multi-criteria decision-making method is used within each priority group to calculate the comprehensive utility value to achieve fine ranking. The formula for calculating the comprehensive utility value is: $U_i = w_1 S_i + w_2 \frac{A_i}{A_{\max}} + w_3 E_i$, where $U_i$ is the comprehensive utility value of the $i$-th case; $w_1$ is the weight of the comprehensive repayment potential score; $w_2$ is the case amount weight; $w_3$ is the urgency weight; $S_i$ is the comprehensive recovery potential score of the $i$-th case; $A_i$ is the total amount owed in the $i$-th case; $A_{\max}$ is the largest outstanding debt in the current case pool; $E_i$ is the normalized value of the urgency of the $i$-th case, computed from $T_i$; and $T_i$ is the number of days remaining for effective processing of the case.
[0014] Preferably, step S5 specifically includes:
S51. Monitor the execution feedback data of the case priority list in real time, and collect information on actual repayment, resource consumption and mediation results;
S52. Periodically retrain the prediction model and update the feature selection strategy based on the execution feedback data, performing cross-cycle knowledge transfer during model retraining;
S53. Model the adjustment of the grouping threshold and ranking criterion weights as a sequential decision problem. The total amount of repayments within a preset future period is used as the reward function, adjustments are triggered periodically based on real-time feedback data, and the decision strategy is optimized using a policy gradient algorithm. The expression for the reward function is: $R_T = \lambda_1 A_T - \lambda_2 C_T$, where $R_T$ is the reinforcement learning reward value within the preset cycle $T$, and $T$ can be set to 30 days or 60 days; $\lambda_1$ is the weighting coefficient for the amount received; $A_T$ is the actual total amount received within cycle $T$; $\lambda_2$ is the resource consumption penalty coefficient; and $C_T$ is the total mediation resource consumption within cycle $T$;
S54. Establish an early warning mechanism based on model performance degradation indicators to automatically trigger the model retraining process and form a closed loop of continuous iterative strategy optimization.
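The reward described in S53 trades the amount received in a cycle against the resources spent. A minimal sketch follows, assuming a linear form (amount weighted by one coefficient, minus resource consumption weighted by a penalty coefficient); the function name and the default coefficient values are illustrative and not taken from the source.

```python
def cycle_reward(amount_received: float,
                 resources_used: float,
                 amount_weight: float = 1.0,      # assumed default
                 resource_penalty: float = 0.1):  # assumed default
    """Reward for one preset cycle (e.g. 30 or 60 days).

    Assumed linear form: weighted amount received minus a penalty
    proportional to the mediation resources consumed in the cycle.
    """
    return amount_weight * amount_received - resource_penalty * resources_used


# Example: 1000 received, 200 resource units consumed, penalty 0.5 per unit
r = cycle_reward(1000.0, 200.0, amount_weight=1.0, resource_penalty=0.5)
```

A policy gradient learner would use this scalar as the per-cycle return when adjusting the grouping threshold and ranking weights.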
[0015] Preferably, the cross-cycle knowledge transfer in step S52 specifically refers to the following: the parameters of the gradient boosting decision tree model, deep neural network model, and graph neural network model from historical training cycles are used as the initialization parameters for training the corresponding models in the new cycle. During retraining, an adversarial domain adaptation method is used to reduce the data distribution difference between historical and new periods, thereby mitigating the impact of data distribution changes on model prediction performance. The total loss function of the model is: $L_{\text{total}} = L_{\text{task}} + \lambda L_{\text{domain}}$, with $L_{\text{domain}} = -\mathbb{E}\left[ d \log D(f) + (1 - d) \log\left(1 - D(f)\right) \right]$, where $L_{\text{total}}$ is the model's total loss function; $L_{\text{task}}$ is the task loss (cross-entropy loss for classification tasks, mean squared error loss for regression tasks); $\lambda$ is the domain adaptation weight coefficient; $L_{\text{domain}}$ is the domain discrimination loss; $D$ is the domain discriminator; $f$ is the feature vector output by the feature extractor; and $d$ is the domain label, with $d = 0$ for historical period samples and $d = 1$ for new period samples.
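The combination of task loss and domain loss can be illustrated numerically. The sketch below assumes the domain loss is the binary cross-entropy of a domain discriminator's output and that the two terms are combined additively with a weight; both forms are assumptions, and the function names are hypothetical. In adversarial training the feature extractor is usually updated to increase the domain loss (for example through a gradient reversal layer), which this sketch does not model; it only evaluates the combined value.

```python
import math


def domain_loss(domain_probs, domain_labels, eps=1e-12):
    """Binary cross-entropy of the domain discriminator.

    domain_probs[i]  : discriminator output for sample i (probability of "new period")
    domain_labels[i] : 0 for a historical-period sample, 1 for a new-period sample
    """
    total = 0.0
    for p, d in zip(domain_probs, domain_labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -(d * math.log(p) + (1 - d) * math.log(1.0 - p))
    return total / len(domain_probs)


def total_loss(task_loss, domain_probs, domain_labels, domain_weight=0.1):
    """Assumed additive combination of task loss and weighted domain loss."""
    return task_loss + domain_weight * domain_loss(domain_probs, domain_labels)
```

When the discriminator cannot tell the periods apart (outputs near 0.5), the domain loss approaches log 2, which is the state the adaptation step aims for.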
[0016] By adopting the above technical solution, the present invention has the following advantages compared with the prior art: 1. This invention provides a method for generating a case priority list based on historical repayment data. By performing multi-stage feature engineering on historical case data, it fully mines in-depth information such as repayment time sequence characteristics and debtor relationships. Compared with the traditional sorting method that relies solely on static indicators, it significantly improves the accuracy of case repayment potential assessment and can more accurately identify high-value repayment cases.
[0017] 2. This invention provides a method for generating a priority list of cases based on historical repayment data. It integrates multiple models for prediction and combines resource constraints to optimize grouping thresholds. This method can achieve optimal matching of resources and repayment potential under limited mediation resources, avoid resource waste, and effectively improve the repayment revenue per unit of resources and the overall asset recovery rate.
[0018] 3. This invention provides a method for generating a case priority list based on historical repayment data. It constructs a closed-loop iterative optimization mechanism based on actual case execution data, improves the cross-cycle stability of the model through cross-cycle knowledge transfer and adversarial domain adaptation, and continuously optimizes the priority strategy using reinforcement learning, enabling the system to adapt to changes in data distribution and maintain high processing efficiency in the long term.
Attached Figure Description
[0019] Figure 1 is a flowchart of the method of the present invention.
Detailed Implementation
[0020] To make the objectives, technical solutions, and advantages of this invention clearer, the invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and not intended to limit the invention.
[0021] Example: Referring to Figure 1, this invention discloses a method for generating a case priority list based on historical repayment data, comprising the following steps: S1. Obtain a historical case dataset, which includes, but is not limited to, basic case information, debtor information, repayment behavior timeline information, external multi-source credit information, related relationship data information, and unstructured communication record information.
[0022] Basic case information includes: unique case identifier, contract number, loan product type, overdue stage (e.g., M1, M2), collection date, principal balance, interest and penalty interest amount, etc.
[0023] Debtor information: such as the debtor's age, gender, occupation, education level, income range, historical credit score and other static profile information.
[0024] Repayment chronology information: a sequence of historical repayment records with a time granularity of days or weeks, detailing the date, amount, and method of each repayment (such as proactive repayment or repayment after collection).
[0025] External multi-source credit information: Third-party data obtained through compliant channels, such as summaries of the central bank's credit report (including debt and overdue records), loan records from other financial institutions, litigation information, administrative penalty information, etc.
[0026] Related party data: Clearly recorded debt-related information, such as a list of co-borrowers, guarantor information and their relationship with the guaranteed debtor, emergency contact information, etc.
[0027] Unstructured communication records: All interaction records with the debtor and their associates, including transcripts of recorded calls, text messages, online customer service chat logs, etc.
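The six data categories listed above can be pictured as one record per case. The following is a minimal sketch of such a record; the class and field names are hypothetical and chosen only to mirror the categories in the text, not a schema given in the source.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Repayment:
    paid_on: date
    amount: float
    method: str  # e.g. "proactive" or "after_collection"


@dataclass
class CaseRecord:
    # Basic case information
    case_id: str
    overdue_stage: str            # e.g. "M1", "M2"
    principal_balance: float
    interest_and_penalty: float
    # Debtor information (static profile)
    debtor_profile: dict = field(default_factory=dict)
    # Repayment behavior timeline
    repayments: list = field(default_factory=list)
    # External multi-source credit information
    external_credit: dict = field(default_factory=dict)
    # Related party data (co-borrowers, guarantors, emergency contacts)
    related_parties: list = field(default_factory=list)
    # Unstructured communication records (transcripts, messages, chat logs)
    communications: list = field(default_factory=list)

    def total_owed(self) -> float:
        """Principal plus interest and penalty interest."""
        return self.principal_balance + self.interest_and_penalty


case = CaseRecord("C-001", "M2", 1000.0, 50.0)
case.repayments.append(Repayment(date(2025, 1, 15), 100.0, "proactive"))
```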
[0028] S2. Perform multi-stage feature engineering on the historical case dataset, sequentially executing data reconstruction, relationship graph construction, feature fusion and filtering steps to obtain a feature subset for repayment prediction.
[0029] S21. Use time series decomposition algorithm to extract the trend, periodicity and stability features of repayment behavior, and transform unstructured communication records into structured behavior encoding sequences.
[0030] Temporal Feature Extraction: Time series decomposition algorithms are used to process the temporal information of repayment behavior. For example, the monthly repayment amount sequence of the past 12 months is decomposed into trend, periodic and residual terms, from which trend features (such as the slope of the repayment trend line in the last 6 months), periodic features (such as the identified quarterly or monthly repayment patterns) and stability features (such as the variance of the residual sequence) are extracted.
[0031] The time series decomposition algorithm uses the STL decomposition algorithm, and its specific steps are as follows:
1) Input: the repayment behavior time series $Y_t$, with time steps covering 365 days.
[0032] 2) Set decomposition parameters: the trend window length $n_t$ (default value 15, must be an odd number, adjusted according to the series length) and the period window length $n_p$ (default value 7, corresponding to the weekly cycle).
[0033] 3) Iterative decomposition:
a. Apply locally weighted regression (LOESS) smoothing to $Y_t$ to obtain the trend component $T_t$;
b. Remove $T_t$ from $Y_t$ to obtain the detrended sequence $D_t = Y_t - T_t$;
c. Apply periodic smoothing to $D_t$ to obtain the periodic component $S_t$;
d. Compute the residual component $R_t = Y_t - T_t - S_t$;
4) Output: trend features (slope of $T_t$), periodic features (standard deviation of $S_t$), and stability features (variance of $R_t$).
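The decomposition steps above can be sketched in plain Python. This is a simplified stand-in, not STL itself: it assumes a centered moving average in place of LOESS for the trend and per-phase means in place of periodic smoothing, and it runs a single pass rather than iterating. The function names and feature keys are illustrative.

```python
from statistics import mean, pstdev, pvariance


def moving_average(y, window):
    """Centered moving average; the window shrinks near the edges."""
    half = window // 2
    return [mean(y[max(0, t - half):min(len(y), t + half + 1)])
            for t in range(len(y))]


def decompose(y, trend_window=15, period=7):
    """Additive split y = trend + seasonal + residual (requires len(y) >= period)."""
    trend = moving_average(y, trend_window)                # stand-in for LOESS
    detrended = [a - b for a, b in zip(y, trend)]
    phase_mean = [mean(detrended[p::period]) for p in range(period)]
    seasonal = [phase_mean[t % period] for t in range(len(y))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual


def slope(values):
    """Least-squares slope of values against their index."""
    n = len(values)
    x_bar, v_bar = (n - 1) / 2, mean(values)
    num = sum((x - x_bar) * (v - v_bar) for x, v in enumerate(values))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


def decomposition_features(y, trend_window=15, period=7):
    trend, seasonal, residual = decompose(y, trend_window, period)
    return {
        "trend_slope": slope(trend),          # trend feature
        "seasonal_std": pstdev(seasonal),     # periodic feature
        "residual_var": pvariance(residual),  # stability feature
    }
```

For production use, a LOESS-based STL implementation from a statistics library would replace `decompose`; the feature extraction on the three components stays the same.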
[0034] Behavioral coding sequence generation: Natural language processing is applied to unstructured communication records. First, a set of standard behavioral tags is defined (e.g., initial contact, promise to repay, expressing hardship, refusal to communicate, negotiation of solutions, etc.). Then, each communication record is assigned the corresponding tag according to its content, and a structured behavioral coding sequence is generated in chronological order. Quantitative features such as the actual fulfillment rate after a promise, the transition probability between different behavioral states, and the frequency of specific behavioral patterns can be extracted from the behavioral coding sequence.
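Once records are tagged, the quantitative features mentioned here reduce to counting. The sketch below assumes the tagging has already been done and that each promise has been marked as kept or not by some matching rule left to the caller; the tag strings and function names are illustrative.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Empirical probability of moving from one behavior tag to the next."""
    counts = defaultdict(Counter)
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for state, nexts in counts.items()
    }


def tag_frequencies(sequence):
    """Relative frequency of each behavior tag in the sequence."""
    counts = Counter(sequence)
    return {tag: c / len(sequence) for tag, c in counts.items()}


def promise_fulfillment_rate(promises_kept):
    """Share of promises followed by an actual repayment.

    promises_kept: one boolean per "promise_to_repay" event; how a promise
    is matched to a later repayment is an assumption left to the caller.
    """
    return sum(promises_kept) / len(promises_kept) if promises_kept else 0.0


seq = ["initial_contact", "promise_to_repay", "refusal",
       "promise_to_repay", "negotiation"]
probs = transition_probabilities(seq)
```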
[0035] S22. Construct explicit association graph layers and implicit association graph layers and merge them hierarchically to form a dynamic multi-dimensional relationship graph. Capture the temporal evolution of association relationships through a time-series graph neural network and extract the evolution characteristics of debtor association strength.
[0036] Construct an explicit relationship graph layer: Based on strong relationship data with legal or contractual basis, such as guarantee relationships and joint loan relationships, construct the first layer graph. Nodes represent debtors or related parties, edges represent explicit relationships, and edge attributes can include relationship type, related debt amount, etc.
[0037] Construct an implicit association graph layer: Based on weak association signals such as communication network overlap (e.g., the number and proportion of overlapping contacts in the address books of two debtors) and address history overlap (e.g., whether they have used the same home or work address), a second-layer graph is constructed. The existence and strength of edges are determined by calculating similarity indices (e.g., the Jaccard similarity coefficient), and weak associations below a preset threshold are filtered out.
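The edge construction for this second layer can be sketched with the Jaccard coefficient mentioned in the text. The threshold value of 0.2 and the function names are assumptions for illustration.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def implicit_edges(contacts: dict, threshold: float = 0.2):
    """Build weighted edges between debtors whose contact sets overlap.

    contacts: debtor id -> set of contact identifiers (e.g. address book entries)
    Pairs with similarity below the threshold are filtered out.
    """
    ids = sorted(contacts)
    edges = []
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            weight = jaccard(contacts[u], contacts[v])
            if weight >= threshold:
                edges.append((u, v, weight))
    return edges


contacts = {
    "d1": {"a", "b", "c"},
    "d2": {"b", "c", "d"},
    "d3": {"x"},
}
edges = implicit_edges(contacts)  # d1 and d2 share 2 of 4 distinct contacts
```

The pairwise loop is quadratic in the number of debtors; a large case pool would need blocking or an inverted index over contacts, which is outside this sketch.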
[0038] Hierarchical Fusion and Dynamic Feature Extraction: The explicit and implicit relationship graph layers are hierarchically fused to form a unified, dynamic, multi-dimensional relationship graph. A temporal graph neural network is then used to process this graph. This network effectively handles the addition, deletion, and changes in nodes and edges over time (e.g., the termination of guarantee relationships or weakening of communication connections). Through the learning of the temporal graph neural network, the evolutionary features of the relationship strength of each debtor node are extracted (e.g., the trend of the number of core related nodes over the past 90 days, the moving average of the average credit scores of related parties, etc.).
[0039] S23. Cross-domain fusion of time-series features, graph features and traditional features, calculate the contribution weight of each feature to the repayment prediction target through the attention mechanism, sort the features according to the contribution weight, and select features with weights higher than the dynamic threshold to form a feature subset.
[0040] Construct a cross-domain feature pool: gather traditional static features from case basic information and debtor information, temporal and behavioral features extracted by S21, and graph structured features extracted by S22 (such as degree centrality of nodes, modularity of the community to which they belong, and shortest path length to a specific type of node) into a high-dimensional feature pool.
[0041] Dynamic feature selection based on attention mechanism: A cross-domain feature pool is input into a feature selection model based on attention mechanism. Specifically: The attention-based feature selection model includes a multi-head attention layer. The features themselves serve as inputs to the query, key, and value. Through attention calculation, the contribution weight of each feature to the predefined repayment prediction task (binary label classification) is obtained. The weight reflects the strength of the correlation between the feature and the target.
[0042] All features are sorted in descending order of contribution weight. A dynamic threshold is set for filtering. This dynamic threshold is not a fixed value but is determined through a joint optimization process: it comprehensively considers the redundancy between features (e.g., the average Pearson correlation coefficient between the selected feature subset and candidate features; high redundancy increases the threshold to filter redundant information) and model stability indicators (e.g., the variance of feature weights calculated multiple times using the bootstrap sampling method; large variance indicates poor stability, requiring threshold adjustment). By weighting these two indicators, a dynamic threshold that makes the feature subset both effective and stable is determined. The formula for calculating the dynamic threshold is: $\theta = \alpha R + (1 - \alpha) S$, with $R = \frac{2}{N(N-1)} \sum_{i<j} |\rho(f_i, f_j)|$ and $S = \frac{1}{N} \sum_{k=1}^{N} \mathrm{Var}(w_k)$ (normalized), where $\theta$ is the dynamic threshold for feature selection, with a value range of $[0, 1]$; $\alpha$ is the redundancy weighting coefficient, with a value range of $[0, 1]$, determined through cross-validation (default value 0.4, can be adjusted according to data characteristics); $R$ is the normalized value of redundancy between features, ranging from 0 to 1; $N$ is the total number of candidate features; $f_i$ is the $i$-th feature; $\rho$ is the Pearson correlation coefficient; $|\cdot|$ is the absolute value of the correlation coefficient; $S$ is the normalized value of the model stability index, with a value range of $[0, 1]$; $w_k$ is the weight of the $k$-th feature over 10 bootstrap iterations; and $\mathrm{Var}$ is the variance.
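The interaction of redundancy and stability in the threshold can be sketched as follows. The combination rule (a convex blend controlled by the redundancy coefficient), the use of mean absolute pairwise Pearson correlation as redundancy, and the `v / (1 + v)` squashing of the bootstrap variance are all assumptions made for illustration; only the two ingredients and the default coefficient 0.4 come from the text.

```python
from itertools import combinations
from statistics import mean, pvariance


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def redundancy(columns):
    """Mean absolute pairwise Pearson correlation (needs at least 2 features)."""
    return mean(abs(pearson(x, y)) for x, y in combinations(columns, 2))


def instability(bootstrap_weights):
    """bootstrap_weights[b][k]: weight of feature k in bootstrap round b."""
    v = mean(pvariance(ws) for ws in zip(*bootstrap_weights))
    return v / (1.0 + v)  # assumed normalization into [0, 1)


def dynamic_threshold(columns, bootstrap_weights, alpha=0.4):
    return alpha * redundancy(columns) + (1 - alpha) * instability(bootstrap_weights)


def select_features(weights, threshold):
    """Indices of features whose weight exceeds the threshold, best first."""
    ranked = sorted(enumerate(weights), key=lambda kw: kw[1], reverse=True)
    return [k for k, w in ranked if w > threshold]
```

With perfectly correlated candidate features and perfectly stable bootstrap weights, the threshold equals the redundancy coefficient itself, so only features with a contribution weight above 0.4 survive under the default setting.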
[0043] Finally, features with contribution weights greater than the dynamic threshold are selected to form the feature subset used in this model training.
[0044] S3. Perform multi-model training and ensemble prediction tasks, and output the initial repayment probability prediction, repayment amount range prediction and adjustability score of the case in parallel. Then, perform weighted fusion and probability calibration on the initial repayment probability prediction, repayment amount range prediction and adjustability score to calculate the comprehensive repayment potential score.
[0045] S31. The gradient boosting decision tree model is used to output the initial repayment probability prediction value, the deep neural network model is used to output the repayment amount range prediction value, and the graph neural network model is used to output the adjustability score. The gradient boosting decision tree model (such as LightGBM) uses the feature subset selected in step S2 as input features and is trained in a supervised manner, using whether a historical case made a payment within a subsequent observation period (e.g., the next 30 days) as the binary classification label. The gradient boosting decision tree model outputs an initial payment probability prediction for each case to be predicted.
[0046] The deep neural network model uses the same subset of features as input and the actual repayment amount of historical cases during the observation period as continuous value labels for regression training. The deep neural network model outputs a range prediction of the repayment amount for each case. For example, it also outputs the P10, P50 (median), and P90 quantiles of the predicted amount to represent the distribution of the repayment amount.
[0047] The graph neural network model uses the dynamic multi-dimensional relationship graph constructed in step S2 as the input structure, with node features including some attributes of the debtor. Historical cases where settlements were reached and funds were recovered through negotiation or mediation are used as positive samples to train the model to assess adjustability. The graph neural network model aggregates neighbor information through a message passing mechanism, analyzes the debtor's position, connectivity, and associated node attributes within the relationship network, and ultimately outputs an adjustability score. A higher score indicates a greater likelihood of recovery through mediation.
[0048] S32. Normalize the initial repayment probability prediction, repayment amount range prediction, and adjustability score to make them comparable. Then, use a dynamic weighted fusion algorithm to calculate the initial comprehensive score: $S_{\text{init}} = \omega_1 P_{\text{GBDT}} + \omega_2 A_{\text{DNN}} + \omega_3 M_{\text{GNN}}$, where:
$\omega_1$ is the dynamic weighting coefficient of the initial repayment probability prediction, with a value range of $[0, 1]$. Its magnitude is positively correlated with the AUC value of the gradient boosting decision tree model on the latest validation set;
$\omega_2$ is the dynamic weighting coefficient of the repayment amount range prediction, with a value range of $[0, 1]$. Its magnitude is exponentially positively correlated with the negative mean squared error (MSE) of the deep neural network model on the latest validation set;
$\omega_3$ is the dynamic weighting coefficient of the adjustability score, with a value range of $[0, 1]$. Its magnitude is positively correlated with the AUC value of the graph neural network model on the latest validation set. The weights satisfy $\omega_1 + \omega_2 + \omega_3 = 1$;
$P_{\text{GBDT}}$ is the initial repayment probability prediction output by the gradient boosting decision tree model (such as LightGBM), with a value range of $[0, 1]$. It indicates the probability that a payment will be made within a preset observation period (e.g., 30 days);
$A_{\text{DNN}}$ is the normalized value of the repayment amount range prediction, with a value range of $[0, 1]$. The original values are output by the deep neural network model (such as the distribution of the repayment amount represented by the P10, P50, and P90 quantiles) and, after normalization, become a comparable dimensionless indicator that reflects the relative level of the expected repayment amount of the case;
$M_{\text{GNN}}$ is the adjustability score output by the graph neural network model, with a value range of $[0, 1]$. It indicates the likelihood of recovering funds through mediation; a higher score indicates a higher expected success rate for mediation.
[0049] Each weight is dynamically adjusted based on the real-time performance metrics (AUC value or mean squared error) of the corresponding model. The better the performance (higher AUC value and smaller mean squared error), the higher the weight ratio, ensuring that the model with higher prediction accuracy contributes more to the overall score.
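The performance-driven weighting can be sketched as follows. The text states only the direction of each dependency (higher AUC or lower mean squared error gives a larger weight, with an exponential dependence on negative MSE); the specific mapping below, raw values of AUC and `exp(-MSE)` rescaled to sum to 1, is an assumption, as are the function names.

```python
import math


def fusion_weights(auc_gbdt: float, mse_dnn: float, auc_gnn: float):
    """Weights for (repayment probability, amount, adjustability).

    Assumed mapping: raw weight = AUC for the two classifiers and
    exp(-MSE) for the regressor, then normalized so the weights sum to 1.
    """
    raw = [auc_gbdt, math.exp(-mse_dnn), auc_gnn]
    total = sum(raw)
    return [r / total for r in raw]


def fused_score(p_repay: float, amount_norm: float, adjustability: float, weights):
    """Weighted sum of the three normalized model outputs (each in [0, 1])."""
    w1, w2, w3 = weights
    return w1 * p_repay + w2 * amount_norm + w3 * adjustability


weights = fusion_weights(auc_gbdt=0.8, mse_dnn=0.0, auc_gnn=0.7)
score = fused_score(0.5, 0.0, 1.0, weights)
```

Because the weights are recomputed from the latest validation metrics, a model whose validation performance degrades automatically loses influence on the fused score at the next refresh.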
[0050] S33. Because the initial comprehensive score obtained after fusion may deviate from the actual probability of repayment, it is calibrated using Platt scaling (logistic regression calibration) or isotonic (order-preserving) regression. The final score obtained after calibration is the comprehensive repayment potential score, which can be interpreted more accurately as the expected probability that the case will receive a repayment in the future.
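Platt scaling fits a one-dimensional logistic regression that maps the fused score to a probability. The sketch below is a simplified variant fitted by plain gradient descent, without the target smoothing used in Platt's original formulation; the learning rate, epoch count, and function names are illustrative choices.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.5, epochs=2000):
    """Fit p = sigmoid(a * score + b) by minimizing log loss.

    scores: fused scores on a held-out calibration set
    labels: 1 if the case actually repaid in the observation period, else 0
    """
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Calibrated repayment probability for one fused score."""
    return sigmoid(a * score + b)
```

The calibration set should be separate from the data used to train the three base models, otherwise the fitted mapping inherits their overconfidence.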
[0051] S4. Based on the comprehensive recovery potential score, the cases are divided into different priority groups through dynamically determined grouping thresholds, and within each group, they are sorted according to multiple criteria including score value, case amount and case urgency, generating a case priority list with dynamic mediation strategy suggestions.
[0052] S41. Based on the comprehensive repayment potential score and historical repayment data, construct a curve showing the relationship between the score and the repayment probability. Based on the total amount of available mediation resources and the expected repayment target, select the optimal split point on the curve as the grouping threshold;
S42. Divide cases into different priority groups according to grouping thresholds, and sort them within each group according to multiple criteria;
S43. Analyze the weighted composition of the comprehensive recovery potential score, identify the feature dimension with the highest contribution, and generate targeted mediation strategy suggestions based on the identified feature dimension. For cases with scores below a certain threshold, automatically recommend alternative disposal plans and estimate the expected recovery rate of each plan;
S44. Combine intelligent mediation assistance to generate a priority list of cases with strategic suggestions.
[0053] In step S41, a monotonic curve relating the comprehensive repayment potential score to the actual recovery rate is constructed from historical data. Let the total amount of mediation resources currently available (such as manpower and working hours) be $R_{\text{total}}$. The goal is to find one or more optimal split points (thresholds) $\theta$ on the curve that divide the cases into high-, medium-, and low-priority groups. The optimal split point maximizes the expected recovery amount per unit of mediation resources under the current resource constraint. The optimization objective is:

$$\max_{\theta} \; \frac{\sum_{i:\, s_i \ge \theta} p_i A_i}{C(\theta)}$$

[0054] The constraint is:

$$C(\theta) = \sum_{i:\, s_i \ge \theta} c_i \le R_{\text{total}}$$

where $\theta$ is the grouping threshold to be optimized; $i$ denotes the $i$-th case; $s_i$ is the comprehensive repayment potential score of the $i$-th case; $p_i$ is the actual recovery probability of the $i$-th case, obtained from historical statistics as the actual recovery rate of cases in the same score range; $A_i$ is the total amount owed in the $i$-th case (principal + interest + penalty interest); $C(\theta)$ is the total resources (such as manpower and working hours) required to process the cases scoring at or above $\theta$; $c_i$ is the unit resource consumption of processing the $i$-th case, determined by statistical analysis of historical execution data; and $R_{\text{total}}$ is the total amount of mediation resources currently available, such as the total number of collection hours available each month.
[0055] In the specific solution, a grid search traverses candidate values of $\theta$ (step size 0.01) and selects, as the optimal grouping threshold $\theta^{*}$, the value that satisfies the constraint $C(\theta) \le R_{\text{total}}$ and yields the largest objective function value.
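The grid search described above can be sketched as follows. This is a minimal illustration and not the implementation of the invention: the `Case` record, its field names, the function `best_threshold`, and the assumption that scores lie in [0, 1] are all hypothetical, and only a single threshold is searched even though the method allows several split points.

```python
from dataclasses import dataclass


@dataclass
class Case:
    score: float          # comprehensive repayment potential score (assumed in [0, 1])
    recovery_prob: float  # historical recovery rate of the case's score range
    amount: float         # total amount owed
    cost: float           # unit resource consumption, e.g. working hours


def best_threshold(cases, total_resources, step=0.01):
    """Return (threshold, objective) maximizing expected recovery per unit
    of resource, or None if no threshold satisfies the resource constraint."""
    best = None
    n_steps = round(1 / step)
    for k in range(n_steps + 1):
        theta = k / n_steps  # avoids accumulating floating-point error
        selected = [c for c in cases if c.score >= theta]
        cost = sum(c.cost for c in selected)
        if not selected or cost <= 0 or cost > total_resources:
            continue  # empty group or resource constraint violated
        objective = sum(c.recovery_prob * c.amount for c in selected) / cost
        if best is None or objective > best[1]:
            best = (theta, objective)
    return best


# Illustrative data only; these figures are not taken from the patent.
cases = [
    Case(0.9, 0.80, 10000, 5),
    Case(0.7, 0.50, 8000, 4),
    Case(0.4, 0.20, 20000, 6),
    Case(0.2, 0.05, 5000, 3),
]
result = best_threshold(cases, total_resources=10)
```

With these illustrative figures, any threshold low enough to admit the third case exceeds the 10-unit budget, so the search only compares thresholds that admit the first one or two cases.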
[0056] In step S42, within each priority group, a multi-criteria decision-making method is used for fine-grained ranking. An initial weight is assigned to each criterion, and the ranking is determined by the comprehensive utility value of each case, calculated as:

$$U_i = w_1 s_i + w_2 \frac{A_i}{A_{\max}} + w_3 E_i$$

where $U_i$ is the comprehensive utility value of the $i$-th case (the higher the value, the higher the ranking); $w_1$ is the weight of the comprehensive repayment potential score, default value 0.5, which can be optimized through reinforcement learning; $w_2$ is the case amount weight, default value 0.3; $w_3$ is the urgency weight, default value 0.2; the weights satisfy $w_1 + w_2 + w_3 = 1$; $s_i$ is the comprehensive repayment potential score of the $i$-th case; $A_i$ is the total amount owed in the $i$-th case; $A_{\max}$ is the largest outstanding amount in the current case pool; $E_i$ is the normalized urgency of the $i$-th case, determined from $D_i$; and $D_i$ is the number of effective days remaining for processing the case, such as the number of days until the statute of limitations expires.
[0057] The core criteria of the multi-criteria decision-making method in this embodiment are: 1) the comprehensive repayment potential score (the higher the better); 2) the total amount involved in the case (the higher the better); 3) the urgency of the case (for example, cases nearing the statute of limitations or involving major complaints are handled first).
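The intra-group ranking can be illustrated with a short sketch. It assumes the utility is a weighted sum of the score, the amount owed divided by the largest amount in the case pool, and a normalized urgency value supplied as an input; the function names, dictionary keys, and sample figures are hypothetical, and how urgency is derived from the remaining days is not modeled here.

```python
def utility(score, amount, max_amount, urgency, w1=0.5, w2=0.3, w3=0.2):
    """Weighted-sum utility with the default weights from the embodiment."""
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * score + w2 * (amount / max_amount) + w3 * urgency


def rank_group(group, max_amount):
    """Sort one priority group by descending utility.

    max_amount is the largest outstanding amount in the whole case pool,
    not just in this group.
    """
    return sorted(
        group,
        key=lambda c: utility(c["score"], c["amount"], max_amount, c["urgency"]),
        reverse=True,
    )


# Illustrative cases only; urgency is assumed to be pre-normalized.
group = [
    {"id": "A", "score": 0.9, "amount": 10000, "urgency": 0.1},
    {"id": "B", "score": 0.7, "amount": 50000, "urgency": 0.9},
    {"id": "C", "score": 0.8, "amount": 20000, "urgency": 0.5},
]
ranked_ids = [c["id"] for c in rank_group(group, max_amount=50000)]
```

In this example case B ranks first despite having the lowest score, because its amount and urgency terms outweigh the score difference under the default weights.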
[0058] In step S43, for each case, the system determines which model or feature dimension contributed the highest weight in the fusion calculation of the comprehensive repayment potential score. For example, if the adjustability score contributes the most and its key underlying feature is a close relationship between the debtor and relatives with good credit, the suggested strategy is to persuade and pressure the debtor indirectly through those relatives.
[0059] For cases with scores below a certain threshold (such as the low-priority group), the system automatically matches and recommends alternative solutions. For example, if a case simultaneously features a debtor who is unreachable and multiple enforcement records shown in external credit reports, the system recommends: directly initiating legal proceedings, with an estimated recovery rate of approximately 15%-25%. Simultaneously, the system can estimate the expected recovery rate range for each recommended solution based on the outcomes of similar historical cases.
[0060] Step S4 also includes providing intelligent assistance in the mediation process based on the generated case priority list, specifically: Script and strategy generation: Based on the feature vectors in the case feature subset, the system retrieves successful scripts and strategies from similar historical cases from the preset mediation knowledge base, or uses a natural language generation model to generate communication points that fit the characteristics of the current case.
[0061] Real-time interactive analysis: During the communication between the mediator and the debtor, the system can access the voice-to-text or online chat text in real time, use the sentiment analysis model to judge the debtor's emotions (such as anger, anxiety, cooperation), use the intent recognition model to judge the debtor's true intentions (such as delay, sincere negotiation, inability to repay), and output strategy adjustment parameters in real time (such as suggesting pressure, suggesting appeasement, suggesting installment plan, etc.).
[0062] Knowledge Accumulation and Closed Loop: The complete mediation interaction process, the strategies adopted, and the final results (such as repayment amount and repayment date) are structured and stored in the mediation knowledge base. This fresh data with result labels will serve as high-quality incremental training samples for subsequent model updates.
[0063] S5. Monitor the execution feedback data of the case priority list in real time, and periodically retrain the prediction model and update the feature selection strategy based on the feedback data. At the same time, optimize the grouping threshold and the ranking-criterion weights through a reinforcement learning algorithm, forming a continuously iterating strategy optimization loop.
[0064] S51. Monitor the execution feedback data of the case priority list in real time, and collect information on actual repayment, resource consumption and mediation results, including actual contact results, implementation of mediation strategies, whether repayment has been made, repayment amount, resource consumption, etc.
[0065] S52. Based on the execution feedback data, periodically retrain the prediction model and update the feature selection strategy, and perform cross-cycle knowledge transfer during the model retraining process.
[0066] The cross-cycle knowledge transfer in step S52 specifically involves: Regular model retraining: At fixed intervals (e.g., weekly or monthly), new feedback data is used as incremental data to re-execute steps S2 and S3, updating the parameters of the gradient boosting decision tree model, deep neural network model, and graph neural network model. At the start of retraining, a cross-cycle knowledge transfer strategy is adopted, using the model parameters obtained in the previous training cycle as the initialization parameters for this training cycle to accelerate convergence and maintain knowledge continuity.
[0067] Adversarial domain adaptation: During retraining, a domain discriminator and a gradient reversal layer are introduced. By jointly optimizing the task loss and the domain discrimination loss, the features learned by the feature extractor are made as indistinguishable as possible between samples from historical periods and samples from the new period. This reduces the impact on model performance of data distribution shifts caused by changes in the market environment and policies. The total loss function of the model is:

$$L_{\text{total}} = L_{\text{task}} + \lambda L_{\text{domain}}$$

$$L_{\text{domain}} = -\mathbb{E}\left[ d \log D(f) + (1 - d) \log\left(1 - D(f)\right) \right]$$

where $L_{\text{total}}$ is the total loss function of the model; $L_{\text{task}}$ is the task loss (cross-entropy loss for classification tasks and mean squared error loss for regression tasks); $\lambda$ is the domain adaptation weight coefficient, default value 0.3, tuned on the validation set; $L_{\text{domain}}$ is the domain discrimination loss; $D$ is the domain discriminator; $f$ is the feature vector output by the feature extractor; and $d$ is the domain label, with $d = 0$ for historical-period samples and $d = 1$ for new-period samples. During backpropagation, the gradient reversal layer reverses the sign of the gradient of $L_{\text{domain}}$ with respect to $f$, which ensures that the feature extractor learns domain-independent features.
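The loss combination and the gradient reversal can be illustrated without a deep learning framework. The sketch below assumes the domain discrimination loss is a binary cross-entropy over the discriminator's output probabilities and that the reversal layer is an identity in the forward pass that negates gradients in the backward pass; the function names are hypothetical, and a real system would implement the same idea inside an automatic differentiation framework.

```python
import math


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, d in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(d * math.log(p) + (1 - d) * math.log(1 - p))
    return total / len(probs)


def total_loss(task_loss, domain_probs, domain_labels, lam=0.3):
    """Task loss plus lam times the domain discrimination loss."""
    return task_loss + lam * binary_cross_entropy(domain_probs, domain_labels)


def grl_forward(features):
    """Gradient reversal layer, forward pass: identity."""
    return list(features)


def grl_backward(upstream_grad):
    """Gradient reversal layer, backward pass: flip the sign of the gradient."""
    return [-g for g in upstream_grad]


# Domain labels: 0 = historical-period sample, 1 = new-period sample.
# A discriminator that outputs 0.5 everywhere cannot tell the domains apart.
confused = total_loss(0.4, domain_probs=[0.5, 0.5], domain_labels=[0, 1])
reversed_grad = grl_backward([1.0, -2.0])
```

When the discriminator outputs 0.5 for every sample, the domain term equals the natural logarithm of 2, which is the value the adversarial training pushes toward as the features become domain-independent.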
[0068] S53. The adjustment of the grouping threshold and of the weight vector of the intra-group ranking criteria is modeled as a sequential decision problem. The case pool characteristics and resource status are the system state, the threshold and weight adjustments are the actions, and the actual return within a preset future period is the reward. Using a policy gradient algorithm (such as REINFORCE or PPO), the decision policy network is continuously optimized on long-term accumulated sequences of states, actions, and rewards. This achieves automated, dynamic adjustment of the grouping threshold and ranking weights, enabling the entire system to adapt to business changes. The reward function is:

$$r_T = \alpha M_T - \beta C_T$$

where $r_T$ is the reinforcement learning reward value within the preset period $T$; $T$ can be set to 30 days or 60 days and adjusted according to the business scenario; $\alpha$ is the weighting coefficient for the repayment amount, default value 1.0; $M_T$ is the actual total amount received within period $T$; $\beta$ is the resource consumption penalty coefficient, default value 0.1, adjustable according to resource cost; and $C_T$ is the total mediation resources consumed within period $T$.
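As a rough sketch of how such a reward could drive a policy gradient update, the code below computes the reward as the weighted amount received minus the weighted resource consumption and applies one REINFORCE step to a softmax policy. The discrete action set (for example, a few candidate threshold settings), the baseline, the learning rate, and all function names are assumptions made for illustration; the source describes a policy network over states, which is not modeled here.

```python
import math


def reward(received, consumed, alpha=1.0, beta=0.1):
    """Reward for one period: weighted repayment minus weighted resource use."""
    return alpha * received - beta * consumed


def softmax(prefs):
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(prefs, action, r, baseline, lr=0.01):
    """One REINFORCE update for a softmax policy over discrete actions.

    The gradient of log pi(action) with respect to preference i is
    (1 if i == action else 0) - pi(i).
    """
    probs = softmax(prefs)
    advantage = r - baseline
    return [
        p + lr * advantage * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]


period_reward = reward(received=120000.0, consumed=800.0)
# Rewards are assumed to be rescaled before the update to keep steps small.
new_prefs = reinforce_step([0.0, 0.0, 0.0], action=1, r=2.0, baseline=1.0, lr=0.1)
```

An action whose reward exceeds the baseline has its preference raised and the others lowered, so over many periods the policy shifts toward threshold and weight settings that yield higher net returns.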
[0069] S54. Establish an early warning mechanism based on model performance degradation indicators to automatically trigger the model retraining process and form a closed loop of continuous iterative strategy optimization.
[0070] The system monitors the model's core performance metrics (such as AUC) on the latest validation set. When performance degradation exceeds a preset threshold (such as an AUC drop of more than 0.03), an early warning mechanism is triggered, and emergency procedures are automatically executed, such as adjusting the optimizer learning rate, introducing more incremental samples, or starting a completely new training process in extreme cases.
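The degradation check can be sketched as follows, assuming AUC is the monitored metric and the trigger is a drop of more than 0.03 relative to a stored baseline. The function names and sample data are hypothetical, and the pairwise AUC computation is chosen for clarity rather than efficiency.

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a positive case outscores a negative
    one, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def needs_retraining(baseline_auc, current_auc, max_drop=0.03):
    """True when the AUC drop exceeds the preset threshold."""
    return (baseline_auc - current_auc) > max_drop


# Illustrative validation batch: 1 = repaid, 0 = not repaid.
current = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
alarm = needs_retraining(baseline_auc=0.80, current_auc=current)
```

In this example three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75, which is 0.05 below the assumed baseline and therefore triggers the warning.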
[0071] The above are merely preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the scope of the technology disclosed in the present invention should be included within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the scope of the claims.
Claims
1. A method for generating a case priority list based on historical payment records, characterized in that it includes the following steps:
S1. Obtain a historical case dataset;
S2. Perform multi-stage feature engineering on the historical case dataset, sequentially executing data reconstruction, relationship graph construction, and feature fusion and filtering, to obtain a feature subset for repayment prediction;
S3. Perform multi-model integrated prediction based on the feature subset to calculate the comprehensive recovery potential score of each case;
S4. Based on the comprehensive recovery potential score, prioritize the cases using dynamically determined grouping thresholds and generate a case priority list;
S5. Monitor the execution feedback data of the case priority list in real time, and iteratively optimize the prediction model and the priority strategy based on the execution feedback data to form a closed optimization loop.
2. The method for generating a case priority list based on historical payment records as described in claim 1, characterized in that the historical case dataset includes basic case information, debtor information, repayment behavior timeline information, external multi-source credit information, association relationship data, and unstructured communication records.
3. The method for generating a case priority list based on historical payment records as described in claim 2, characterized in that step S2 is as follows:
S21. Use a time series decomposition algorithm to extract the trend, periodicity, and stability features of repayment behavior, and transform unstructured communication records into structured behavior encoding sequences;
S22. Construct explicit association graph layers and implicit association graph layers and merge them hierarchically to form a dynamic multi-dimensional relationship graph; capture the temporal evolution of the association relationships through a time-series graph neural network and extract the evolution characteristics of debtor association strength;
S23. Perform cross-domain fusion of temporal features, graph features, and traditional features; calculate the contribution weight of each feature to the repayment prediction target through an attention mechanism; sort the features by contribution weight; and select the features whose weights exceed the dynamic threshold to form the feature subset.
4. The method for generating a case priority list based on historical payment records as described in claim 3, characterized in that, The formula for calculating the dynamic threshold is: , , ,in, Dynamic threshold for feature selection Redundancy weighting coefficient; R: Normalized value of redundancy between features. Total number of candidate features , : No. One characteristic, Pearson correlation coefficient; : Absolute value of correlation coefficient Normalized value of model stability index The weight of the k-th feature after 10 bootstrapping iterations. :variance.
5. The method for generating a case priority list based on historical payment records as described in claim 2, characterized in that step S3 is as follows:
S31. A gradient boosting decision tree model outputs the initial repayment probability prediction, a deep neural network model outputs the repayment amount range prediction, and a graph neural network model outputs the adjustability score;
S32. The initial repayment probability prediction, the repayment amount range prediction, and the adjustability score are normalized, and the initial comprehensive score is then calculated using a dynamic weighted fusion algorithm:

$$S_0 = \omega_1 P + \omega_2 V + \omega_3 G$$

where $S_0$ is the initial comprehensive score; $\omega_1$ is the dynamic weighting coefficient of the initial repayment probability prediction; $\omega_2$ is the dynamic weighting coefficient of the repayment amount range prediction; $\omega_3$ is the dynamic weighting coefficient of the adjustability score; $P$ is the initial repayment probability prediction; $V$ is the normalized repayment amount range prediction; and $G$ is the adjustability score;
S33. Use the Platt scaling method or an ordinal regression method to perform probability calibration on the initial comprehensive score to obtain the final comprehensive repayment potential score.
6. The method for generating a case priority list based on historical payment records as described in claim 2, characterized in that step S4 specifically includes:
S41. Based on the comprehensive repayment potential score and historical repayment data, construct a curve relating the score to the repayment probability; according to the current total amount of available mediation resources and the expected repayment target, select the optimal split point on the curve as the grouping threshold;
S42. Divide the cases into priority groups according to the grouping threshold, and rank the cases within each group according to multiple criteria;
S43. Analyze the weighted composition of the comprehensive repayment potential score, identify the feature dimension with the highest contribution, and generate targeted mediation strategy suggestions based on that dimension; for cases with scores below a certain threshold, automatically recommend alternative disposal plans and estimate the expected recovery rate of each plan;
S44. Combine intelligent mediation assistance to generate a case priority list with strategy suggestions.
7. The method for generating a case priority list based on historical payment records as described in claim 6, characterized in that the optimal split point in step S41 maximizes the expected recovery amount per unit of mediation resources under the current resource constraint. The optimization objective is:

$$\max_{\theta} \; \frac{\sum_{i:\, s_i \ge \theta} p_i A_i}{C(\theta)}$$

The constraint is:

$$C(\theta) = \sum_{i:\, s_i \ge \theta} c_i \le R_{\text{total}}$$

where $\theta$ is the grouping threshold to be optimized; $i$ denotes the $i$-th case; $s_i$ is the comprehensive repayment potential score of the $i$-th case; $p_i$ is the actual recovery probability of the $i$-th case; $A_i$ is the total amount owed in the $i$-th case; $C(\theta)$ is the total resources required to process the cases scoring at or above $\theta$; $c_i$ is the unit resource consumption of processing the $i$-th case; and $R_{\text{total}}$ is the total amount of mediation resources currently available.
8. The method for generating a case priority list based on historical payment records as described in claim 6, characterized in that in step S42, a multi-criteria decision-making method is used within each priority group to calculate the comprehensive utility value for fine-grained ranking. The comprehensive utility value is calculated as:

$$U_i = w_1 s_i + w_2 \frac{A_i}{A_{\max}} + w_3 E_i$$

where $U_i$ is the comprehensive utility value of the $i$-th case; $w_1$ is the weight of the comprehensive repayment potential score; $w_2$ is the case amount weight; $w_3$ is the urgency weight; $s_i$ is the comprehensive repayment potential score of the $i$-th case; $A_i$ is the total amount owed in the $i$-th case; $A_{\max}$ is the largest outstanding amount in the current case pool; $E_i$ is the normalized urgency of the $i$-th case, determined from $D_i$; and $D_i$ is the number of effective days remaining for processing the case.
9. The method for generating a case priority list based on historical payment records as described in claim 2, characterized in that step S5 is as follows:
S51. Monitor the execution feedback data of the case priority list in real time, and collect information on actual repayment, resource consumption, and mediation results;
S52. Based on the execution feedback data, periodically retrain the prediction model and update the feature selection strategy, performing cross-cycle knowledge transfer during model retraining;
S53. Model the adjustment of the grouping threshold and the ranking-criterion weights as a sequential decision problem, with the total amount of repayments within a preset future period used as the reward; trigger adjustments periodically based on real-time feedback data, and optimize the decision policy using a policy gradient algorithm. The reward function is:

$$r_T = \alpha M_T - \beta C_T$$

where $r_T$ is the reinforcement learning reward value within the preset period $T$; $T$ can be set to 30 days or 60 days; $\alpha$ is the weighting coefficient for the repayment amount; $M_T$ is the actual total amount received within period $T$; $\beta$ is the resource consumption penalty coefficient; and $C_T$ is the total mediation resources consumed within period $T$;
S54. Establish an early warning mechanism based on model performance degradation indicators to automatically trigger the model retraining process, forming a closed loop of continuous iterative strategy optimization.
10. The method for generating a case priority list based on historical payment records as described in claim 9, characterized in that the cross-cycle knowledge transfer in step S52 specifically refers to: the parameters of the gradient boosting decision tree model, the deep neural network model, and the graph neural network model from the historical training cycle are used as the initialization parameters for training the corresponding models in the new cycle. During retraining, an adversarial domain adaptation method is used to reduce the data distribution difference between the historical and new periods, thereby mitigating the impact of data distribution changes on model prediction performance. The total loss function of the model is:

$$L_{\text{total}} = L_{\text{task}} + \lambda L_{\text{domain}}$$

$$L_{\text{domain}} = -\mathbb{E}\left[ d \log D(f) + (1 - d) \log\left(1 - D(f)\right) \right]$$

where $L_{\text{total}}$ is the total loss function of the model; $L_{\text{task}}$ is the task loss (cross-entropy loss for classification tasks and mean squared error loss for regression tasks); $\lambda$ is the domain adaptation weight coefficient; $L_{\text{domain}}$ is the domain discrimination loss; $D$ is the domain discriminator; $f$ is the feature vector output by the feature extractor; and $d$ is the domain label, with $d = 0$ for historical-period samples and $d = 1$ for new-period samples.