A self-adaptive optimization method and system for financial data object label management
By optimizing the tag management framework for financial data objects using reinforcement learning algorithms, this approach solves the problem that existing tag management methods struggle to adapt to business changes, achieving efficient matching between tags and business requirements.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- DERKEE INFORMATION CO LTD
- Filing Date
- 2026-03-13
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for managing financial data object labels are ill-suited to adapt to business changes, resulting in a mismatch between label granularity and business needs. They also lack adaptive learning capabilities and cannot dynamically adjust label weights and classification logic.
The algorithm employs reinforcement learning and is based on an initial label management framework. It generates the initial label management framework by receiving user-defined configurable label structure parameters and rule parameters, collects business feedback data to construct a reinforcement learning state space, and optimizes label weights and classification boundaries.
It has achieved adaptive optimization of the financial data object labeling system, improving the accuracy and timeliness of matching labels with actual business needs.
Figure CN122241468A_ABST
Description
Technical Field
[0001] This invention belongs to the field of financial technology, and specifically relates to an adaptive optimization method and system for financial data object tag management.
Background Technology
[0002] With the deepening of digital transformation in financial services, massive amounts of financial data objects (such as transaction records, wealth management products, and customer information) urgently need to be managed and used efficiently through a tagging system. Current tag management methods in the financial sector largely rely on static rules or expert experience: tag classifications and judgment rules are predefined manually, and data is then tagged in batches. However, financial services are highly dynamic and scenario-dependent; the same data object may require different tag interpretations in different business contexts (such as risk monitoring, targeted marketing, and compliance auditing). Existing static tagging systems struggle to respond promptly to business changes, leading to a mismatch between tag granularity and business needs. Furthermore, hard-coded rules lack adaptive learning capabilities and cannot dynamically correct tag boundaries using feedback data generated during business applications. Therefore, how to give a tagging system for financial data objects adaptive optimization capabilities, so that it continuously adjusts tag weights and classification logic based on actual business feedback, has become a pressing technical problem in the field of financial data governance.
Summary of the Invention
[0003] The purpose of this invention is to provide an adaptive optimization method and system for financial data object tag management, in order to overcome the shortcomings of the existing technology, realize the adaptive evolution of the financial data object tag system, and improve the accuracy and timeliness of matching tags with actual business needs.
[0004] One embodiment of this application provides an adaptive optimization method for financial data object tag management, the method comprising:
receiving user-defined configurable tag structure parameters and tag rule parameters, and generating an initial financial data object tag management framework based on the configurable tag structure parameters and tag rule parameters;
tagging the financial data objects to be processed based on the initial financial data object tag management framework, to generate a set of financial data objects with initial tags;
collecting business feedback data generated by the financial data objects with initial tags in corresponding business applications, and constructing a reinforcement learning state space for optimizing the tag system;
using a reinforcement learning algorithm to adaptively adjust and optimize the tag weights and classification boundaries based on the reinforcement learning state space, to generate an optimized tag rule set.
[0005] Another embodiment of this application provides an adaptive optimization system for financial data object tag management, the system comprising:
a receiving module, used to receive user-defined configurable tag structure parameters and tag rule parameters, and to generate an initial financial data object tag management framework based on the configurable tag structure parameters and tag rule parameters;
a processing module, used to tag the financial data objects to be processed based on the initial financial data object tag management framework, and to generate a set of financial data objects with initial tags;
a further module, used to collect business feedback data generated by the financial data objects with initial tags in corresponding business applications, and to construct a reinforcement learning state space for optimizing the tag system;
an optimization module, used to adaptively adjust and optimize the tag weights and classification boundaries based on the reinforcement learning state space using a reinforcement learning algorithm, thereby generating an optimized tag rule set.
[0006] Another embodiment of this application provides a storage medium storing a computer program, wherein the computer program is configured to execute the method described in any of the preceding claims when running.
[0007] Another embodiment of this application provides an electronic device including a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the method described in any of the preceding claims.
[0008] Compared with existing technologies, the adaptive optimization method for financial data object tag management provided by this invention can realize the adaptive evolution of the financial data object tag system and improve the accuracy and timeliness of matching tags with actual business needs.
Attached Figure Description
[0009] Figure 1 is a hardware structure block diagram of a computer terminal for an adaptive optimization method for financial data object tag management provided in an embodiment of the present invention;
Figure 2 is a flowchart of an adaptive optimization method for financial data object tag management provided in an embodiment of the present invention;
Figure 3 is a flowchart of another adaptive optimization method for financial data object tag management provided in an embodiment of the present invention;
Figure 4 is a flowchart of an adaptive optimization method for financial data object tag management provided in an embodiment of the present invention;
Figure 5 is a schematic diagram of the structure of an adaptive optimization system for financial data object tag management provided in an embodiment of the present invention.
Detailed Implementation
[0010] The embodiments described below with reference to the accompanying drawings are exemplary and are only used to explain the present invention, and should not be construed as limiting the present invention.
[0011] This invention first provides an adaptive optimization method for the management of financial data object tags. This method can be applied to electronic devices, such as computer terminals, specifically ordinary computers.
[0012] The following detailed explanation uses a computer terminal as an example. Figure 1 is a hardware structure block diagram of a computer terminal for an adaptive optimization method for financial data object tag management provided in an embodiment of the present invention. As shown in Figure 1, the computer device includes a processor, memory, and network interface connected via a system bus, wherein the memory may include non-volatile storage media and internal memory.
[0013] Referring to Figure 2, the present invention provides an adaptive optimization method for financial data object tag management, which may include the following steps:
S201, Receive user-defined configurable tag structure parameters and tag rule parameters, and generate an initial financial data object tag management framework based on the configurable tag structure parameters and tag rule parameters;
Specifically, the system can receive user-inputted tag structure parameters through a visual configuration interface, including tag hierarchy depth, tag category name, tag attribute fields and their data types, and generate the original tag structure parameter set. The core of this step is to give users an input point for tag structure parameters through a visual interactive interface, enabling visual collection and preliminary structured storage of the underlying structural parameters of the financial data object tag system. The specific implementation method is as follows: The visual configuration interface is a dedicated interactive interface for financial data tag management. The interface is divided into functional input modules according to parameter type. Each module provides visual input boxes, drop-down selection boxes, hierarchical editing trees, and other interactive components. Users complete parameter configuration by clicking, typing, and dragging, without writing code, which suits the working habits of financial business personnel. The tag hierarchy depth is the number of vertical levels in the tag system, represented by a positive integer. Users set a specific value through the numerical input box. This parameter determines the granularity of the vertical division of the tag system.
In the example, the user sets the tag hierarchy depth to 3, representing a tag system divided into three vertical levels: Level 1 (core categories), Level 2 (sub-categories), and Level 3 (specific tag items). The tag category name serves as the naming identifier for each level of tags. Users enter a unique name for each level node through the hierarchical editing tree. The first-level tag can be configured with three major categories: customer, transaction, and asset. The second-level tag further subdivides customer qualifications and customer behavior under the customer category, transaction frequency and transaction size under the transaction category, and asset type and asset holdings under the asset category. The third-level tag further subdivides credit rating and asset size under customer qualifications. Each level of name must match the business characteristics of the financial data object to ensure that the tag category can accurately represent the core attributes of the financial data.
[0014] The tag attribute fields are specific information fields describing the characteristics of each tag node. Users configure exclusive attribute fields for each third-level specific tag item. In the example, the credit rating tag is configured with fields for customer credit score, number of overdue payments, and credit limit; the transaction frequency tag is configured with fields for monthly transaction count, weekly transaction count, and annual total transaction count; and the asset size tag is configured with fields for total asset amount, disposable asset amount, and financial asset amount. The data type is the numeric type corresponding to each attribute field. Users select each attribute field through a drop-down selection box. The available types include numeric, string, boolean, and date. Numeric types are suitable for quantifiable data such as credit scores and asset amounts; string types are suitable for textual data such as asset type and customer industry; boolean types are suitable for binary data such as whether overdue payments have been made or whether premium services have been activated; and date types are suitable for time-based data such as first transaction time and asset holding start time. Each attribute field can only match one data type to ensure the uniqueness of data representation.
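The tag structure parameters described above (hierarchy depth, category names, attribute fields, and one data type per field) can be pictured as a small tree. The following is a minimal sketch only; the class names `TagNode` and `FieldType` and the helper `max_depth` are hypothetical and do not come from the patent. Storing fields in a dictionary keyed by field name is one simple way to guarantee that each attribute field maps to exactly one data type.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldType(Enum):
    """The four data types offered in the drop-down selection box."""
    NUMERIC = "numeric"
    STRING = "string"
    BOOLEAN = "boolean"
    DATE = "date"


@dataclass
class TagNode:
    """Hypothetical in-memory form of one node in the tag hierarchy."""
    name: str
    level: int
    # A dict keyed by field name means a field can hold only one data type.
    fields: dict[str, FieldType] = field(default_factory=dict)
    children: list["TagNode"] = field(default_factory=list)

    def add_child(self, child: "TagNode") -> "TagNode":
        self.children.append(child)
        return child


def max_depth(node: TagNode) -> int:
    """Deepest level reached anywhere below (and including) this node."""
    return max([node.level] + [max_depth(c) for c in node.children])


# Example values taken from the text: customer -> customer qualifications -> credit rating.
customer = TagNode("customer", level=1)
qualification = customer.add_child(TagNode("customer qualifications", level=2))
credit = qualification.add_child(TagNode(
    "credit rating",
    level=3,
    fields={
        "customer credit score": FieldType.NUMERIC,
        "number of overdue payments": FieldType.NUMERIC,
        "credit limit": FieldType.NUMERIC,
    },
))
print(max_depth(customer))
```

In this sketch the configured tag hierarchy depth of 3 corresponds to the deepest `level` value in the tree.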
[0015] The interface performs real-time preliminary validation during user input, checking the format validity of the parameters, such as whether the tag hierarchy depth is a positive integer and whether the data type matches the field name. Incorrectly formatted parameters are immediately flagged with pop-up prompts, and the user can only complete the input after correction. Once the user has configured all tag structure parameters and confirmed submission, the system structurally integrates all entered parameters according to hierarchical relationships and field associations. Each parameter is labeled with its associated tag node, parameter type, and input identifier, generating a raw tag structure parameter set indexed by hierarchy and defined by field. This parameter set is the raw data collection without logical validation and forms the basis for subsequent standardization processing. Its storage format is consistent with the hierarchical structure of the visual configuration interface, preserving complete parameter configuration traceability information.
[0016] The original tag structure parameter set is subjected to conflict detection and format standardization, the logical consistency between tag levels is checked and the naming rules of attribute fields are unified, and a standardized tag structure definition is generated. The core of this step is to identify logical contradictions and standardize the format of the original tag structure parameter set, eliminate invalid information and logical conflicts in the parameter set, establish a tag structure standard that conforms to financial data management specifications, and generate a standardized tag structure definition. The specific implementation method is as follows: Conflict detection is divided into two parts: tag hierarchy logical consistency check and attribute field conflict check. It is the core link to ensure the logical rationality of the tag system. The tag hierarchy logical consistency check mainly verifies the affiliation relationship, naming uniqueness, and hierarchy depth matching of tag nodes at each level. The affiliation relationship check verifies whether the business connotation of the child tag node matches that of the parent tag node. For example, it is forbidden to assign the transaction frequency tag to the customer qualification parent node. If an affiliation error occurs, it is marked as a logical conflict. The naming uniqueness check verifies whether there are duplicate names for tag nodes at the same level. For example, two nodes named "asset scale" cannot appear under the same first or second level tag. Nodes with the same name are marked as naming conflicts. The hierarchy depth matching check verifies whether the actual configured tag node hierarchy is consistent with the tag hierarchy depth set by the user. For example, when the depth is set to 3, all tag nodes must be distributed in the first, second, and third levels. Fourth level nodes are prohibited. Hierarchical mismatch is marked as depth conflict. 
The attribute field conflict check mainly verifies, for each tag node, whether field names are duplicated, whether data types conflict, and whether the number of fields is reasonable. Fields with duplicate names are marked as field conflicts. Fields configured with multiple data types are marked as type conflicts. Empty fields without actual business meaning are marked as invalid field conflicts. All detected conflicts are collected in a conflict report that specifies the conflict location, the conflict type, and correction suggestions. Users can make corrections online based on the report. After each correction, the system re-runs the checks until no conflicts remain.
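The mechanical parts of the conflict detection above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the node representation (dictionaries with `name`, `level`, `parent`, and `fields` keys) and the function name `detect_conflicts` are hypothetical, and a repeated field name is treated as a type conflict when the declared types differ and as a field conflict otherwise. The affiliation check, which depends on business meaning, is left out.

```python
from collections import Counter


def detect_conflicts(nodes, configured_depth):
    """Return (conflict type, location) pairs for a flat list of tag nodes.

    Each node is a dict with keys: name, level, parent, and fields,
    where fields is a list of (field_name, data_type) pairs.
    """
    conflicts = []

    # Naming uniqueness: sibling nodes under one parent must not share a name.
    sibling_names = Counter((n["parent"], n["name"]) for n in nodes)
    for (_parent, name), count in sibling_names.items():
        if count > 1:
            conflicts.append(("naming conflict", name))

    for node in nodes:
        # Depth matching: no node may sit below the configured depth.
        if node["level"] > configured_depth:
            conflicts.append(("depth conflict", node["name"]))

        seen_types = {}
        for field_name, data_type in node["fields"]:
            if not field_name.strip():
                conflicts.append(("invalid field conflict", node["name"]))
            elif field_name in seen_types:
                same_type = seen_types[field_name] == data_type
                kind = "field conflict" if same_type else "type conflict"
                conflicts.append((kind, field_name))
            else:
                seen_types[field_name] = data_type
    return conflicts


nodes = [
    {"name": "asset scale", "level": 3, "parent": "customer qualifications",
     "fields": [("total asset amount", "numeric"), ("total asset amount", "string")]},
    {"name": "asset scale", "level": 3, "parent": "customer qualifications",
     "fields": []},
    {"name": "extra", "level": 4, "parent": "asset scale",
     "fields": [("", "string")]},
]
report = detect_conflicts(nodes, configured_depth=3)
for kind, location in report:
    print(kind, "->", location)
```

An empty result would correspond to the "no conflicts" state that ends the re-check loop.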
[0017] The format standardization is based on the general specifications for data management in the financial industry. Its core is to unify the naming rules for attribute fields, while also standardizing the format of tag category names and hierarchical codes. Attribute fields are named using Upper CamelCase, where the first letter of each core term is capitalized and the remaining letters are lowercase, with no spaces or special characters. In the example, "Customer Credit Score" is standardized as "CustomerCreditScore", "Monthly Transaction Count" as "MonthlyTransactionCount", and "Is Overdue" as "IsOverdue". This consistent naming convention ensures the uniqueness and compatibility of fields within the system. Tag category names use Lower CamelCase, where the first term is lowercase and the first letter of each subsequent core term is capitalized. For example, "Customer Qualification" is standardized as "customerQualification", and "Transaction Scale" as "transactionScale". Each tag node is assigned a unique level code, composed of the numeric level followed by a node number. In the example, the first-level customer tag code is 1-01, the second-level customer qualification tag code is 2-01, and the third-level credit rating tag code is 3-01. The level code is bound to the tag name, giving each tag node a unique digital identifier.
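The naming rules above are mechanical enough to sketch in a few lines. The function names `upper_camel`, `lower_camel`, and `level_code` are hypothetical, and the sketch assumes the input label is written as separate words (an already-joined name such as a single CamelCase token would have its inner capitals lowered).

```python
import re


def _words(label: str) -> list[str]:
    """Split a label into alphanumeric terms, dropping spaces and special characters."""
    return re.findall(r"[A-Za-z0-9]+", label)


def upper_camel(label: str) -> str:
    """Attribute-field style: every term capitalized, rest lowercase."""
    return "".join(w[:1].upper() + w[1:].lower() for w in _words(label))


def lower_camel(label: str) -> str:
    """Tag-category style: like upper_camel, but the first letter is lowercase."""
    name = upper_camel(label)
    return name[:1].lower() + name[1:]


def level_code(level: int, node_number: int) -> str:
    """Level code: numeric level, a hyphen, and a two-digit node number."""
    return f"{level}-{node_number:02d}"


print(upper_camel("Customer Credit Score"))
print(lower_camel("Customer Qualification"))
print(level_code(3, 1))
```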
[0018] After completing conflict detection and correction and format standardization, the system reconstructs the processed parameter set according to the logical hierarchy of "first-level label - second-level label - third-level label - attribute field - data type - hierarchical encoding". Standardized metadata descriptions are added to each label node and attribute field, including business meaning, data source, applicable financial data object type, etc. At the same time, the parent-child relationship of each label node and the attribution relationship between fields and labels are clarified, forming a complete, standardized, and logically conflict-free label system architecture standard. Finally, a standardized label structure definition is generated, which is the core structural basis for the labeling of financial data objects and has business rationality and system compatibility.
[0019] Receive user-inputted label rule parameters, including initial label weight values, classification boundary thresholds, and label application condition expressions, and associate them with standardized label structure definitions to generate a label rule configuration table; The core of this step is to collect the operational rule parameters of the tag system and establish a precise association between them and the standardized tag structure definition, thereby binding the tag structure with the operational rules and generating a structured tag rule configuration table. The specific implementation method is as follows: Based on the standardized label structure definition, the system provides a dedicated input interface for rule parameters. The interface is linked with the label structure hierarchy tree, allowing users to locate specific third-level label items to enter rule parameters. This ensures accurate matching between rule parameters and label items. The received rule parameters include three categories: initial label weight values, classification boundary thresholds, and label application condition expressions. These are all core operating parameters for label processing. The initial weight of each tag is an importance coefficient of each tag item in the financial data object tag evaluation, ranging from 0 to 1. Users set the initial value for each third-level tag item according to the needs of the financial business scenario. The sum of the initial weights of all third-level tag items under the same second-level tag is 1. In the example, under the customer qualification second-level tag, the initial weight of the credit rating tag is set to 0.4, the asset size tag is set to 0.3, and the customer industry tag is set to 0.3. The larger the weight value, the higher the importance of the tag item in the qualification evaluation. The classification boundary threshold is the critical value for classifying numerical tag items. 
Users set it according to the classification standards of financial business indicators. In the example, the classification boundary thresholds for the asset size tag are: total assets ≥ 1 million yuan is high net worth, 500,000 yuan ≤ total assets < 1 million yuan is medium net worth, and total assets < 500,000 yuan is low net worth. For the transaction frequency tag, a monthly transaction count ≥ 10 is high frequency, 3 ≤ monthly transaction count < 10 is medium frequency, and a monthly transaction count < 3 is low frequency. The threshold settings must be consistent with the actual evaluation standards of the financial business. The tag application condition expression is the business condition under which a tag item takes effect. Users write it in a standardized SQL-like syntax that supports logical operators such as AND, OR, and NOT, and comparison operators such as >, <, ≥, ≤, and =. In the example, the application condition expression for the high-end transaction tag is "MonthlyTransactionCount ≥ 5 AND SingleTransactionAmount ≥ 10000 AND AssetScale = high net worth". This means the tag item takes effect only for financial data objects with at least 5 transactions per month, at least 10,000 yuan per transaction, and a high net worth classification. The system performs syntax validation on the written expression. If there is a syntax error, the user is prompted immediately and submission is blocked.
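To make the threshold and condition semantics concrete, the sketch below classifies a value against ascending boundaries and then checks the high-end transaction condition. It is an assumption-laden illustration: `classify` and `high_end_transaction` are hypothetical names, the key `TotalAssets` is introduced for the example, and the condition is hand-written as a Python predicate rather than parsed from the SQL-like syntax, which a real system would need to do.

```python
from bisect import bisect_right


def classify(value, boundaries, labels):
    """Map a numeric value to a label.

    boundaries must be ascending; labels has one more entry than boundaries.
    A value equal to a boundary falls into the upper class, matching
    the ">=" wording of the thresholds in the text.
    """
    return labels[bisect_right(boundaries, value)]


# Asset size thresholds from the example, in yuan.
ASSET_BOUNDARIES = [500_000, 1_000_000]
ASSET_LABELS = ["low net worth", "medium net worth", "high net worth"]


def high_end_transaction(obj) -> bool:
    """Hand-translated form of the example condition expression."""
    return (
        obj["MonthlyTransactionCount"] >= 5
        and obj["SingleTransactionAmount"] >= 10000
        and obj["AssetScale"] == "high net worth"
    )


customer = {
    "MonthlyTransactionCount": 6,
    "SingleTransactionAmount": 12000.0,
    "TotalAssets": 1_200_000,
}
customer["AssetScale"] = classify(customer["TotalAssets"], ASSET_BOUNDARIES, ASSET_LABELS)
print(customer["AssetScale"], high_end_transaction(customer))
```

Using `bisect_right` keeps the boundary handling in one place, so adding a fourth class only requires one more boundary and one more label.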
[0020] Association mapping establishes a one-to-one binding relationship between the entered rule parameters and specific tag items in the standardized tag structure definition. The mapping is based on the hierarchical encoding of tag nodes. The system uses hierarchical encoding to accurately bind the initial value of tag weight, classification boundary threshold, and tag applicable condition expression to the corresponding third-level tag item. At the same time, it binds the weight aggregation rules of all third-level tags under each second-level tag item, and binds the weight integration rules of second-level tags to each first-level tag item, ensuring that rule parameters can be aggregated upwards along the tag hierarchy. For non-numerical tag items that do not require setting classification boundary thresholds, such as string tag items like asset type and customer industry, the system marks their threshold fields as inapplicable. For tag items without special effective conditions, their applicable condition expressions are set to always true, ensuring that all tag items have corresponding rule parameter configurations.
[0021] After completing the input and mapping of all rule parameters, the system will structure the mapping results according to the column dimensions of "hierarchical coding - tag category name - tag attribute field - tag weight initial value - classification boundary threshold - tag applicable condition expression - applicable data type". Each configuration record will be assigned a unique configuration number, and a tag rule configuration table will be generated. This configuration table is the core basis for the operation of the tag matching engine. It realizes the deep integration of rule parameters and tag structure. Each record corresponds one-to-one with the specific node defined in the standardized tag structure, with no mismatch or omission.
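The association mapping and the resulting configuration table can be sketched as a list of records keyed by hierarchical code. Everything named here is hypothetical (`RuleRecord`, `build_rule_table`, the `ALWAYS_TRUE` marker, and the codes 3-02 and 3-03, which the text does not give). The sketch shows only the two defaults described above: non-numeric tag items get no thresholds, and tag items without a special condition get an always-true expression.

```python
from dataclasses import dataclass
from typing import Optional

ALWAYS_TRUE = "TRUE"  # assumed textual form of an always-true condition


@dataclass(frozen=True)
class RuleRecord:
    """One row of the tag rule configuration table (hypothetical layout)."""
    config_id: int
    level_code: str
    tag_name: str
    attribute_field: str
    initial_weight: float
    boundary_thresholds: Optional[tuple]  # None marks "inapplicable"
    condition: str
    data_type: str


def build_rule_table(entries):
    """Bind rule parameters to tag items and number the records from 1."""
    table = []
    for config_id, entry in enumerate(entries, start=1):
        numeric = entry["data_type"] == "numeric"
        thresholds = entry.get("thresholds")
        table.append(RuleRecord(
            config_id=config_id,
            level_code=entry["level_code"],
            tag_name=entry["tag_name"],
            attribute_field=entry["attribute_field"],
            initial_weight=entry["initial_weight"],
            boundary_thresholds=tuple(thresholds) if numeric and thresholds else None,
            condition=entry.get("condition") or ALWAYS_TRUE,
            data_type=entry["data_type"],
        ))
    return table


entries = [
    {"level_code": "3-02", "tag_name": "assetSize", "attribute_field": "TotalAssetAmount",
     "initial_weight": 0.3, "thresholds": [500_000, 1_000_000], "data_type": "numeric"},
    {"level_code": "3-03", "tag_name": "customerIndustry", "attribute_field": "CustomerIndustry",
     "initial_weight": 0.3, "data_type": "string"},
]
table = build_rule_table(entries)
for record in table:
    print(record.config_id, record.level_code, record.boundary_thresholds, record.condition)
```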
[0022] The standardized tag structure definition and tag rule configuration table are integrated and encapsulated to generate an initial tag management framework configuration file containing tag metadata and rule logic.
[0023] The core of this step is to integrate and standardize the standardized tag structure definition and tag rule configuration table into a configuration file that can be directly loaded, parsed, and run by the financial data tag management system, thus building the initial financial data object tag management framework. The specific implementation method is as follows: The integration process uses hierarchical tag encoding as the core index, deeply integrating standardized tag structure definitions with tag rule configuration tables to achieve seamless data exchange between the two types. During integration, the system matches each record in the tag rule configuration table to the corresponding node in the standardized tag structure definition according to its hierarchical encoding. It adds unique rule parameter attributes to each tag node, ensuring that the tag structure definition not only includes metadata information such as hierarchy, fields, and data types, but also integrates rule logic information such as weights, thresholds, and expressions. A bidirectional index is also established for the integrated data, allowing users to query corresponding rule parameters through tag nodes and trace corresponding tag nodes through rule parameters, ensuring data relevance and queryability. The integrated data is divided into modules: a tag metadata module and a rule logic module. The tag metadata module contains all structure-related information such as tag hierarchy depth, hierarchy encoding, tag category name, attribute fields, data types, and business meanings. The rule logic module contains all runtime-related information such as initial tag weight values, classification boundary thresholds, applicable condition expressions, and weight aggregation rules. The two modules achieve data interoperability through hierarchical encoding, forming a complete tag management framework data body.
[0024] The encapsulation process transforms the integrated tag management framework data into a standardized configuration file format recognizable by the financial data tag management system. The encapsulation process adheres to the principles of lightweight, parsable, and scalable design. The configuration file content uses a structured hierarchical description, preserving the vertical hierarchical relationships and horizontal field associations of the tag system. Complete file header information is added, including the configuration file version number, generation time, user configuration identifier, applicable financial business scenarios, and data object type. The version number is set in the format "major version number.minor version number," with the initial version number being 1.0 in the example. The generation time is the system time after configuration completion. The user configuration identifier is bound to the user's account to ensure the traceability of the configuration file. The encapsulated configuration file undergoes integrity and validity verification. Verification includes checking the completeness of tag metadata, whether the rule logic matches the metadata, whether the hierarchical encoding is unique, and whether the weight configuration conforms to the summation rules. If verification fails, a verification report is generated, indicating the location of the problem and requiring re-integration. Final encapsulation is completed after successful verification.
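As an illustration of the encapsulation and verification steps, the sketch below wraps the two modules in a header and checks code uniqueness, rule-to-metadata matching, and the rule that third-level weights under one second-level tag sum to 1. The patent does not name a file format, so JSON is an assumption here, as are the function names `encapsulate` and `verify`, the key names, and the codes 3-02 and 3-03. The completeness check on metadata is not covered.

```python
import json
import math
from datetime import datetime, timezone


def encapsulate(metadata, rules, user_id, scenario, data_object_type):
    """Combine the tag metadata module and rule logic module under a file header."""
    return {
        "header": {
            "version": "1.0",  # "major.minor", initial version from the example
            "generated_at": datetime.now(timezone.utc).isoformat(),
            "user_config_id": user_id,
            "scenario": scenario,
            "data_object_type": data_object_type,
        },
        "tag_metadata": metadata,
        "rule_logic": rules,
    }


def verify(config):
    """Return a list of problems; an empty list means verification passed."""
    problems = []
    parent_of = {}
    for node in config["tag_metadata"]:
        code = node["level_code"]
        if code in parent_of:
            problems.append(f"duplicate level code {code}")
        parent_of[code] = node["parent"]

    totals = {}
    for rule in config["rule_logic"]:
        code = rule["level_code"]
        if code not in parent_of:
            problems.append(f"rule {code} has no metadata node")
            continue
        parent = parent_of[code]
        totals[parent] = totals.get(parent, 0.0) + rule["initial_weight"]

    # Compare with a tolerance because the weights are floating-point values.
    for parent, total in totals.items():
        if not math.isclose(total, 1.0, abs_tol=1e-9):
            problems.append(f"weights under {parent} sum to {total:.2f}, expected 1")
    return problems


metadata = [
    {"level_code": "2-01", "name": "customerQualification", "parent": "1-01"},
    {"level_code": "3-01", "name": "creditRating", "parent": "2-01"},
    {"level_code": "3-02", "name": "assetSize", "parent": "2-01"},
    {"level_code": "3-03", "name": "customerIndustry", "parent": "2-01"},
]
rules = [
    {"level_code": "3-01", "initial_weight": 0.4},
    {"level_code": "3-02", "initial_weight": 0.3},
    {"level_code": "3-03", "initial_weight": 0.3},
]
config = encapsulate(metadata, rules, "user-001", "retail", "customer")
print(verify(config))
serialized = json.dumps(config, indent=2)
```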
[0025] The final generated initial tag management framework configuration file is the digital carrier of the entire financial data object tag management framework. It contains all the metadata and rule logic of the tag system and can be directly loaded by the financial data tag management system. By parsing the configuration file, the system can quickly build a runnable initial tag management framework, providing a complete structural basis and operating rules for subsequent financial data object tagging processing. At the same time, the configuration file supports offline storage and online modification, providing a scalable foundation for subsequent tag system optimization.
[0026] S202, Based on the initial financial data object tag management framework, perform tagging processing on the financial data objects to be processed to generate a set of financial data objects with initial tags; Specifically, it may include: S2021, extracting financial data objects to be processed from the financial data warehouse, including transaction records, customer profiles, and asset holding records, and generating a raw data object set; The core of this step is to rely on the layered architecture of the financial data warehouse to accurately extract various types of financial data objects according to business needs. After formatting and labeling, a structured set of raw data objects is formed, providing a complete raw data foundation for subsequent tagging processing. The specific implementation method is as follows: The financial data warehouse serves as a unified data storage carrier for financial institutions. It adopts a thematic layered design. This data extraction is based on the detailed layer data of the data warehouse. The extraction method supports full extraction and incremental extraction. The extraction strategy is determined by the business tagging requirements. Full extraction is suitable for the initial tagging process, while incremental extraction is suitable for daily update scenarios. The time granularity of incremental extraction can be set to days to ensure the timeliness of the data. The extracted financial data to be processed focuses on three core types: transaction records, customer profiles, and asset holding records. These three types of data are the foundational data for financial business, covering three core dimensions: customers, transactions, and assets. Transaction records contain core information such as unique customer identifiers, transaction time, transaction amount, transaction type, transaction channel, single transaction fee, and counterparty information. 
Transaction types are further subdivided into transfers, wealth management purchases, loan repayments, and consumer payments. Customer profiles contain core information such as unique customer identifiers, basic identity information, customer industry attributes, credit score, number of overdue payments, credit limit, account opening time, and customer level. The unique customer identifier is a 64-bit character code and is the core linking field across all three types of data. Asset holding records contain core information such as unique customer identifiers, type of held products, amount held, start time of holding, expected product return, changes in market value of held products, and asset liquidity level. Types of held products are further subdivided into demand deposits, time deposits, wealth management products, funds, and stocks.
[0027] During data extraction, precise extraction and filtering conditions must be set. These conditions can be defined according to business scenarios, data time ranges, and customer group types. In this example, for the tagging of retail financial customers, the extraction conditions are set as retail customers with accounts opened for more than 3 months and with transaction records in the past year. The extraction process includes their transaction history for the past 3 years, complete customer profiles, and current valid asset holding records, filtering out invalid test data, null values, and abnormal transaction data. After extraction, the three types of data are formatted, converting heterogeneous data from different data sources into a unified structured data format. The date and time format is standardized to year-month-day hour:minute:second, the monetary unit is standardized to yuan, and the precision of numerical data is standardized to two decimal places. Simultaneously, using the customer's unique identifier as the core linking field, the three types of data are merged to ensure that each customer's transaction history, customer profile, and asset holding records form a coherent data unit, with no isolated data objects. Finally, all merged data units are indexed by customer unique identifiers to generate a set of original data objects with customers as the basic unit. This dataset retains the complete original information of the three types of financial data objects, and all data units have unique index identifiers, providing structured and related original data for subsequent feature extraction.
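The formatting and merging described in this paragraph might look like the sketch below. The helper names, the `customer_id` key, and the sample records are hypothetical. Dropping units that have no customer profile is one possible reading of "no isolated data objects", not something the text specifies, and the half-up rounding mode is likewise an assumption.

```python
from collections import defaultdict
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP


def normalize_time(text: str, source_format: str) -> str:
    """Convert a source timestamp to year-month-day hour:minute:second."""
    return datetime.strptime(text, source_format).strftime("%Y-%m-%d %H:%M:%S")


def normalize_amount(value) -> Decimal:
    """Round a monetary amount (in yuan) to two decimal places."""
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def merge_by_customer(transactions, profiles, holdings):
    """Group the three record types into one data unit per customer."""
    units = defaultdict(lambda: {"profile": None, "transactions": [], "holdings": []})
    for profile in profiles:
        units[profile["customer_id"]]["profile"] = profile
    for tx in transactions:
        units[tx["customer_id"]]["transactions"].append(tx)
    for holding in holdings:
        units[holding["customer_id"]]["holdings"].append(holding)
    # Assumed reading of "no isolated data objects": keep only units with a profile.
    return {cid: unit for cid, unit in units.items() if unit["profile"] is not None}


profiles = [{"customer_id": "C001", "credit_score": 710}]
transactions = [
    {"customer_id": "C001", "amount": normalize_amount(1234.5),
     "time": normalize_time("13/03/2026 09:05", "%d/%m/%Y %H:%M")},
    {"customer_id": "C999", "amount": normalize_amount(10), "time": "2026-03-13 10:00:00"},
]
holdings = [{"customer_id": "C001", "product_type": "time deposit",
             "amount": normalize_amount("50000")}]

merged = merge_by_customer(transactions, profiles, holdings)
print(sorted(merged))
```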
[0028] S2022, based on the tag structure definition in the initial tag management framework, perform feature extraction on the original data object set, filter key fields related to tags and calculate derived indicators to generate a feature vector set; The core of this step is to use the tag structure definition of the initial tag management framework as the sole basis to filter related fields from the original data object set and calculate derived indicators that fit the tag requirements. After standardization, a feature vector is constructed for each data object, generating a feature vector set, thus realizing the transformation from raw data to tagged feature data. The specific implementation method is as follows: The first step in feature extraction is key field screening. The screening rule strictly matches all attribute fields defined in the initial tag management framework, eliminating redundant fields irrelevant to the tag system and retaining core, relevant fields. The screening process follows the tag hierarchy, matching the attribute fields corresponding to each level of tags one by one. In the example, the customer qualification tag in the tag structure definition corresponds to fields such as credit score, number of overdue payments, and credit limit; the transaction frequency tag corresponds to fields such as monthly and weekly transaction counts; and the asset size tag corresponds to fields such as total holdings and financial asset value. These fields are precisely screened from the original data set, eliminating fields irrelevant to tag matching, such as customer ID number and account opening location, ensuring that the screened fields provide data support for tag assignment. After key field screening, missing values are imputed. 
For missing values in numerical fields, the mean value of the same customer group is used for imputation; for missing values in categorical fields, the most frequent value is used for imputation, avoiding interference from missing values in subsequent processing.
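As an illustrative sketch only, the imputation rule above could be expressed as follows. The field names `group`, `credit_score`, and `industry` and the helper `impute` are hypothetical, and taking the most frequent value across all records (rather than per customer group) is an assumption, since the description does not fix that scope.

```python
from collections import Counter, defaultdict
from statistics import mean

def impute(records, group_key, numeric_fields, categorical_fields):
    """Return copies of the records with None values filled in."""
    # Collect non-missing numeric values per customer group.
    group_values = defaultdict(lambda: defaultdict(list))
    for r in records:
        for f in numeric_fields:
            if r[f] is not None:
                group_values[r[group_key]][f].append(r[f])
    # Most frequent value per categorical field (assumption: over all records).
    modes = {}
    for f in categorical_fields:
        counts = Counter(r[f] for r in records if r[f] is not None)
        modes[f] = counts.most_common(1)[0][0] if counts else None
    filled = []
    for r in records:
        out = dict(r)
        for f in numeric_fields:
            if out[f] is None:
                vals = group_values[r[group_key]][f]
                out[f] = round(mean(vals), 2) if vals else None
        for f in categorical_fields:
            if out[f] is None:
                out[f] = modes[f]
        filled.append(out)
    return filled

rows = [
    {"group": "retail", "credit_score": 700, "industry": "IT"},
    {"group": "retail", "credit_score": 800, "industry": "IT"},
    {"group": "retail", "credit_score": None, "industry": None},
]
result = impute(rows, "group", ["credit_score"], ["industry"])
```

In this toy input the third record receives the group mean credit score and the most frequent industry value; the input records themselves are left unmodified.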
[0029] Based on the filtered key fields, derived indicators are calculated. These derived indicators are composite indicators that fit the label assessment requirements and cannot be directly obtained from the raw data. The calculation is based on the label assessment logic of financial business, and the calculation process adopts the general calculation method of financial indicators to ensure the business rationality of the indicators. In the example, for the transaction frequency label, the monthly transaction count is calculated as the total number of valid transactions of the statistical data object in the past 30 days, and the quarterly transaction count is the total number of valid transactions in the past 90 days. For the customer contribution label, the comprehensive customer contribution value in the past 6 months is calculated as the transaction fee income + credit interest contribution + wealth management income sharing in the past 6 months. For the asset and liability label, the asset and liability ratio is calculated as the ratio of the customer's total liabilities to total assets. The total assets are the sum of the total holdings in the asset holding record and bank deposits, and the total liabilities are the sum of the customer's outstanding credit and other payable amounts. For the transaction activity label, the average transaction interval is calculated as the arithmetic mean of the time interval between two consecutive valid transactions. The calculation results of the derived indicators have uniform numerical precision. Numerical derived indicators are retained to two decimal places, and proportional derived indicators are converted to percentage form and retained to one decimal place.
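A minimal sketch of three of the derived indicators described above is given below. The function names are hypothetical, and expressing the average transaction interval in days is an assumption, as the description does not state the unit.

```python
from datetime import datetime, timedelta

def count_recent(timestamps, now, days):
    """Number of valid transactions in the `days` days up to and including `now`."""
    start = now - timedelta(days=days)
    return sum(1 for t in timestamps if start < t <= now)

def debt_to_asset_ratio(total_liabilities, total_assets):
    """Total liabilities / total assets, as a percentage with one decimal place."""
    if total_assets == 0:
        return None  # assumption: an undefined ratio is treated as missing
    return round(100.0 * total_liabilities / total_assets, 1)

def average_interval_days(timestamps):
    """Arithmetic mean of the gaps between consecutive transactions, in days."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None
    gaps = [(b - a).total_seconds() / 86400 for a, b in zip(ordered, ordered[1:])]
    return round(sum(gaps) / len(gaps), 2)

now = datetime(2026, 3, 13)
txns = [datetime(2026, 1, 1), datetime(2026, 2, 20),
        datetime(2026, 3, 2), datetime(2026, 3, 12)]
monthly = count_recent(txns, now, 30)      # transactions in the past 30 days
quarterly = count_recent(txns, now, 90)    # transactions in the past 90 days
ratio = debt_to_asset_ratio(250_000, 1_000_000)
interval = average_interval_days(txns)
```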
[0030] After completing the key field screening and derived index calculation, all feature data are standardized. The purpose of standardization is to eliminate the differences in the units of measurement of different features and to prevent features with excessively large numerical ranges from having an excessive impact on label matching. For numerical features, the min-max normalization method is used to map the feature value to the interval 0-1. The calculation formula is X_norm = (X - X_min) / (X_max - X_min), where X_norm is the standardized feature value, X is the original feature value, X_min is the minimum value of the feature in the original data set, and X_max is the maximum value of the feature. In the example, the credit score interval of 600-900 is mapped to 0-1, with a score of 900 corresponding to 1 and a score of 600 corresponding to 0. For categorical features, one-hot encoding is used to convert non-numerical categorical information into computable numerical features. Subsequently, a feature vector is constructed for each data object. The number of dimensions of the feature vector equals the total number of filtered key fields and calculated derived indicators. The dimensions of the vector are ordered according to the dimensions of the label system, and each dimension value is the standardized numerical value of the corresponding feature. Each feature vector is bound to the unique customer identifier of the data object, ensuring that each feature vector can be traced back to the corresponding original data object. Finally, the feature vectors of all data objects are integrated into a feature vector set. This set is a structured numerical vector dataset, which serves as the core input data for subsequent label matching.
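The standardization step can be sketched as follows. The helper names and the industry category list are hypothetical; returning 0.0 for a constant feature (X_max equal to X_min) is an assumption added to avoid division by zero.

```python
def min_max(x, x_min, x_max):
    """Map x into the interval 0-1 using X_norm = (X - X_min) / (X_max - X_min)."""
    if x_max == x_min:
        return 0.0  # assumption: a constant feature carries no information
    return (x - x_min) / (x_max - x_min)

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 list over a fixed category order."""
    return [1 if value == c else 0 for c in categories]

INDUSTRIES = ["manufacturing", "retail", "finance"]  # hypothetical categories

# Credit score interval 600-900 from the example.
low = min_max(600, 600, 900)
mid = min_max(750, 600, 900)
high = min_max(900, 600, 900)

# One feature vector: a normalized numeric feature followed by a one-hot block.
feature_vector = [mid] + one_hot("retail", INDUSTRIES)
```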
[0031] S2023, based on the label rule configuration table in the initial label management framework, inputs the feature vector set into the label matching engine, assigns an initial label to each data object through rule calculation, and generates the initial label assignment result; The core of this step is to rely on the label matching engine to load the core parameters of the label rule configuration table, perform rule verification and calculation on a vector-by-vector basis on the feature vector set, assign labels at various levels that fit its features to each data object, and generate traceable initial label assignment results. The specific implementation method is as follows: The tag matching engine is the core execution module for the tagging of financial data objects. Before tag matching, the engine first loads the tag rule configuration table in the initial tag management framework. It loads core parameters such as the initial value of the tag weight, the classification boundary threshold, and the tag application condition expression into the engine's rule calculation module. At the same time, it establishes an association mapping between the parameters and the tags at all levels in the tag structure definition to ensure that the rule parameters can accurately match the corresponding tag items. The engine also performs validity checks on the loaded parameters, checking whether the weight summation conforms to the rules, whether the classification boundary threshold has no overlap, and whether the application condition expression is syntactically compliant. After the checks are passed, the formal tag matching process begins.
[0032] The feature vector set is traversed one by one according to the unique identifier of the data object and input into the label matching engine. For each feature vector, three rule calculations are performed in sequence: applicability condition verification, classification boundary matching, and weighted score calculation, to complete the assignment of labels at all levels. The first step is applicability condition verification. The engine uses the tag applicability condition expressions in the tag rule configuration table to logically judge the corresponding dimension values of the feature vector, evaluating the logical and comparison operators in each expression. Only if the applicability condition expression is true will the tag item take effect on the current data object. If the expression is false, the matching of the tag item is skipped. In the example, the applicability condition expression for the "high-end wealth management client" tag is "financial assets ≥ 500,000 and wealth management transactions ≥ 10 in the past 6 months". If the corresponding dimension value in the feature vector of a data object does not meet this condition, the tag item will not take effect. The second step is classification boundary matching. For tag items that pass the applicability condition verification, the engine compares the corresponding dimension values of the feature vector with the classification boundary thresholds in the tag rule configuration table to match the corresponding tag sub-items. In the example, the classification boundary thresholds for the "asset size" tag are: total assets ≥ 1,000,000 for high-net-worth clients, 500,000 ≤ total assets < 1,000,000 for medium-net-worth clients, and total assets < 500,000 for low-net-worth clients.
The engine first inverts the standardization to recover the original total asset value from its standardized value in the feature vector, and then matches the corresponding sub-tag. The third step is weighted score calculation. For second- and first-level tags, the engine calculates the weighted sum of the feature values of the matched third-level tag sub-items based on the initial tag weight values in the tag rule configuration table. This yields the comprehensive score for the second- or first-level tag. The weighted summation formula is S = Σ(W_i × X_i), where S is the comprehensive tag score, W_i is the initial weight value of a third-level tag, and X_i is the standardized value of the corresponding feature for that third-level tag. The sum of all W_i under the same parent tag is 1. In the example, the credit rating weight under the second-level customer qualification tag is 0.4, the asset size weight is 0.3, and the customer industry weight is 0.3. After calculating the comprehensive customer qualification score using this formula, the engine matches customer qualification tags such as "high-quality," "ordinary," and "basic" based on preset score classification boundary thresholds.
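The three rule calculations can be sketched as below. This is a simplified illustration rather than the engine itself: the feature keys, the normalization range passed to `restore`, and the 0.7 / 0.4 score thresholds for the qualification tags are hypothetical, since the description only says such thresholds are preset.

```python
def applies_high_end_wealth(features):
    """Step 1: applicability condition from the example."""
    return features["financial_assets"] >= 500_000 and features["wealth_txn_6m"] >= 10

def restore(x_norm, x_min, x_max):
    """Invert min-max normalization to recover the original value."""
    return x_norm * (x_max - x_min) + x_min

def asset_size_tag(total_assets):
    """Step 2: classification boundary matching for the asset size tag."""
    if total_assets >= 1_000_000:
        return "high-net-worth"
    if total_assets >= 500_000:
        return "medium-net-worth"
    return "low-net-worth"

def weighted_score(weights, values):
    """Step 3: S = sum(W_i * X_i); weights under one parent must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights under the same parent tag must sum to 1")
    return sum(w * values[name] for name, w in weights.items())

def qualification_tag(score, high=0.7, basic=0.4):  # hypothetical thresholds
    if score >= high:
        return "high-quality"
    return "ordinary" if score >= basic else "basic"

weights = {"credit_rating": 0.4, "asset_size": 0.3, "customer_industry": 0.3}
values = {"credit_rating": 0.5, "asset_size": 1.0, "customer_industry": 0.0}
score = weighted_score(weights, values)
tag = qualification_tag(score)
size = asset_size_tag(restore(0.4, 0, 2_000_000))  # assumed range 0 to 2,000,000
```

A tag item is only scored when its applicability check returns true; otherwise the engine skips it, as described above.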
[0033] After completing rule calculations for all tags, the engine assigns a complete tag system from level three to level one to each data object. Tag assignment follows a hierarchical relationship principle: level three tags are basic sub-tags, level two tags are generated by weighted matching of their subordinate level three tags, and level one tags are generated by comprehensive matching of their subordinate level two tags. Finally, an initial tag assignment result is generated. This result uses the unique identifier of the data object as an index and includes information such as the tag names at each level for each data object, the core feature values of tag matching, the comprehensive tag score, and the basis for tag matching. This ensures that the assignment of each tag is supported by clear rule calculations, possesses complete traceability, and eliminates any tag assignments without a basis.
[0034] S2024 binds the initial label assignment results to the original data objects for storage, constructs a set of financial data objects with initial labels, and outputs a labeled dataset.
[0035] The core of this step is to use the unique identifier of the data object as the core association basis, deeply bind the initial label assignment results with the original data object, construct a structured set of labeled financial data objects after integrity verification, and output a standardized labeled dataset according to business needs, realizing the integrated storage and output of label information and original financial data. The specific implementation method is as follows: The binding of tags to the original data is based solely on the unique identifier of the data object. Deep binding is achieved through field association. Information such as the names of tags at all levels, the overall tag score, and the tag matching criteria from the initial tag allocation results are added as new fields to the corresponding data units in the original data object set. This ensures a one-to-one correspondence between the original financial data fields and the tag information fields for each data unit. All information from the original data object is preserved during the binding process without any deletions. A unique field prefix is added to the tag information fields to distinguish them from the original data fields. In the example, "tag_" is used as the prefix, and fields such as tag_first-level tag, tag_second-level tag, tag_third-level tag, and tag_tag score are set to ensure the uniqueness of the field identifiers. During the binding process, all data units are traversed and validated one by one to check for unique identifier mismatches, missing or incorrectly bound tag information. Any data units with abnormal binding are manually corrected to ensure the accuracy and completeness of the binding.
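A minimal sketch of the binding and verification step follows. The dictionary layout, the field names, and the function `bind_tags` are hypothetical; only the "tag_" prefix and the use of the unique identifier as the sole join key come from the description.

```python
def bind_tags(originals, tag_results, prefix="tag_"):
    """Attach tag fields to original records keyed by the unique identifier.

    Returns the bound records and a list of identifiers that need manual
    correction (missing on either side).
    """
    bound, anomalies = {}, []
    for object_id, record in originals.items():
        tags = tag_results.get(object_id)
        if tags is None:
            anomalies.append(object_id)  # original record without tag information
            continue
        merged = dict(record)            # keep every original field unchanged
        merged.update({prefix + name: value for name, value in tags.items()})
        bound[object_id] = merged
    # Tag results whose identifier matches no original record.
    anomalies.extend(oid for oid in tag_results if oid not in originals)
    return bound, anomalies

originals = {"C001": {"credit_score": 780}, "C002": {"credit_score": 640}}
tag_results = {
    "C001": {"first_level": "Credit Customer", "score": 0.82},
    "C003": {"first_level": "Credit Customer", "score": 0.41},
}
bound, anomalies = bind_tags(originals, tag_results)
```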
[0036] The bound data is stored using a columnar database. This storage method significantly improves the query efficiency of tag fields and adapts to the high-frequency data filtering needs by tags in subsequent business applications. The storage structure uses the unique identifier of the data object as the primary key. Data columns are divided into two main categories: raw data columns and tag information columns. The raw data columns contain all core fields of transaction records, customer profiles, and asset holding records. The tag information columns contain information such as tag names at all levels, tag comprehensive scores, feature vector values, and tag matching criteria. Tag indexes and feature indexes are also established for the stored dataset to further improve the efficiency of tag- and feature-based queries. Based on the stored bound data, a collection of financial data objects with initial tags is constructed. This collection uses data objects as the basic unit, and each unit contains an integrated data body containing raw financial data, feature vectors, and complete tag information. All data units in the collection have undergone validity and integrity checks, with no abnormal or isolated data, directly supporting subsequent business applications and tag system optimization.
[0037] Finally, a standardized labeled dataset is output based on the actual needs of financial business. The dataset supports multiple mainstream structured data formats, which can be selected according to the adaptability of the business application system, including Parquet, CSV, and ORC formats. Parquet format is suitable for storing and analyzing large amounts of data, while CSV format is suitable for quick retrieval in small business scenarios. Flexible data filtering and sampling functions are provided during the output process. The financial data object set can be filtered according to business scenarios, label types, customer groups, etc., extracting the required subset datasets for output. In the example, for the label application requirements of credit business, all data units with the primary label "Credit Customer" are filtered to generate a label dataset specifically for credit business; for high-net-worth customer mining business, all data units with the tertiary label "High-Net-Worth Customer" are filtered to generate a label dataset specifically for high-net-worth customers. The output labeled dataset comes with complete metadata descriptions, including the dataset's generation time, data volume, time range covered, label system description, and feature indicator explanation, ensuring that business application systems can quickly understand and use the dataset.
[0038] S203, Collect business feedback data generated by the financial data object with initial label in the corresponding business application, and construct a reinforcement learning state space for optimizing the label system; Specifically, this may include: S2031, deploying data tracking interfaces in business application systems to collect user behavior and processing results of financial data objects with initial tags in business scenarios in real time, and generating raw feedback data streams; The core of this step is to deploy a lightweight data tracking interface at the core business nodes of the financial business application system. This enables real-time collection of feedback data across the entire business chain of financial data objects with initial tags, integrating discrete business behaviors and results into a continuous stream of raw feedback data. This provides a solid business feedback basis for subsequent tagging system optimization. The specific implementation method is as follows: The data collection interface is a non-intrusive collection interface adapted to financial business systems. It is deployed at key business nodes of core business application systems such as credit approval, customer precision marketing, financial risk monitoring, and asset allocation recommendation, including business touchpoint nodes, user operation nodes, business processing nodes, and result feedback nodes. The deployment process adopts an interface docking method without modifying the original code of the business system, thus avoiding affecting the normal operation of the business system. The collection frequency of the interface can be set according to the needs of the business scenario. The collection frequency for high-frequency trading and real-time risk control scenarios is set to millisecond level, while that for ordinary customer marketing scenarios is set to second level, ensuring the real-time nature of the collected data. 
The data collection objects of the tracking interface are two types of core data generated by financial data objects with initial tags in business scenarios. The first is user behavior data, which is the operation behavior of financial business participants on the tagged data objects. The focus of behavior data varies in different business scenarios. In the credit approval scenario, the data collection is the tag viewing, data verification, and approval decision-making operations of the approval personnel. In the customer marketing scenario, the data collection is the tag push out to customers, page browsing, product clicks, and application submission operations. In the risk monitoring scenario, the data collection is the tag warning viewing, risk verification, and handling operations of the risk control personnel. The second is business processing result data, which is the final result after the tagged data objects have been processed by the business process. In the credit approval scenario, the data collection includes results such as approval, approval rejection, and credit limit adjustment. In the customer marketing scenario, the data collection includes results such as successful conversion, failed conversion, and temporary non-conversion. In the risk monitoring scenario, the data collection includes results such as accurate warnings, false alarms, and missed risk warnings.
[0039] During the data collection process, the tracking interface binds a unique collection identifier to each piece of collected data and associates it with core traceability information, including the unique identifier of the tagged financial data object, initial tag information (tag level, tag name, tag score), business scenario type code, business processing node, collection timestamp, and operator identifier (if any). The collection timestamp uses a millisecond-level time format to ensure the timeliness of the data. The business scenario type code assigns a unique numerical code to different business scenarios, such as 01 for credit approval, 02 for customer marketing, and 03 for risk monitoring, realizing the digital identification of business scenarios. The interface performs real-time structured encapsulation of the collected user behavior and business processing result data, converting unstructured operation logs into structured data in key-value pair format, unifying the naming rules and data types of data fields. Subsequently, this structured data is continuously output in the order of collection timestamps, forming a continuous raw feedback data stream with complete traceability information. This data stream retains all feedback information of the initially tagged financial data object in the business scenario, with no data omissions or tampering, and serves as the original foundation for subsequent feedback data processing.
[0040] S2032 cleans, denoises, and aligns the original feedback data stream according to time sequence, and statistically analyzes key indicators including tag hit rate, business conversion rate, and risk false alarm rate according to business dimensions to generate a set of feedback feature indicators. The core of this step is to preprocess the raw feedback data stream and perform statistical analysis on business-dimensional indicators. This eliminates invalid information in the data, unifies the time-series benchmark of the data, transforms discrete feedback data into quantitative indicators that can characterize the effectiveness of labeled business operations, and generates a structured set of feedback feature indicators. The specific implementation method is as follows: First, the raw feedback data stream undergoes cleaning and noise reduction, a process comprised of three steps: deduplication, outlier removal, and missing data marking, ensuring data validity and rationality. Deduplication uses the collection identifier and the unique identifier of the financial data object as a combined primary key, eliminating redundant data with duplicate primary keys to avoid duplicate statistics. Outlier removal employs the 3σ principle, calculating the mean μ and standard deviation σ for numerical fields in the data stream. Values exceeding the range of μ-3σ to μ+3σ are identified as outliers. Simultaneously, data that clearly does not conform to business logic is removed based on financial business rules, such as negative conversion rates in customer marketing scenarios or approval times exceeding reasonable limits in credit approval scenarios. All removed outliers are recorded in an outlier log for easy tracing and verification. Missing data marking addresses missing core traceability information and business data fields in the data stream. 
Missing fields are uniformly marked with missing identifiers, preventing direct removal of missing data and preserving valid portions to ensure data integrity.
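The three cleaning steps can be sketched as follows. The record keys, the `MISSING` marker, and the use of the population standard deviation are assumptions made for illustration; the business-rule checks mentioned above are omitted.

```python
from statistics import mean, pstdev

MISSING = "MISSING"  # hypothetical missing-value identifier

def clean_feedback(stream, numeric_field, required_fields):
    # Step 1: deduplicate on the combined key (collection id, object id).
    seen, unique = set(), []
    for rec in stream:
        key = (rec["collection_id"], rec["object_id"])
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))
    # Step 2: 3-sigma rule on one numeric field; removed records are logged.
    values = [r[numeric_field] for r in unique if r.get(numeric_field) is not None]
    mu = mean(values) if values else 0.0
    sigma = pstdev(values) if values else 0.0
    kept, outlier_log = [], []
    for rec in unique:
        v = rec.get(numeric_field)
        if v is not None and sigma > 0 and abs(v - mu) > 3 * sigma:
            outlier_log.append(rec)
        else:
            kept.append(rec)
    # Step 3: mark missing fields instead of dropping the record.
    for rec in kept:
        for f in required_fields:
            if rec.get(f) is None:
                rec[f] = MISSING
    return kept, outlier_log

stream = [{"collection_id": i, "object_id": "c1", "scenario": "01",
           "approval_minutes": 10} for i in range(20)]
stream.append(dict(stream[0]))  # exact duplicate of the first record
stream.append({"collection_id": 20, "object_id": "c1", "scenario": "01",
               "approval_minutes": 1000})  # extreme value
stream.append({"collection_id": 21, "object_id": "c1", "scenario": None,
               "approval_minutes": 10})    # missing scenario code
kept, outlier_log = clean_feedback(stream, "approval_minutes", ["scenario"])
```

Note that the 3σ rule only has discriminating power on reasonably large samples; with very few records no value can lie more than three standard deviations from the mean.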
[0041] After cleaning and denoising, time-series alignment is performed. The core of time-series alignment is to unify feedback data from different collection frequencies and business nodes onto the same time base, eliminating the impact of time-series deviations on statistical indicators. First, a unified time granularity is set, which is selected according to the characteristics of the business scenario. For high-frequency risk control scenarios, it is set to 1 hour, and for customer marketing and credit approval scenarios, it is set to 1 day. Time windows are divided using this time granularity. Then, the cleaned feedback data is mapped to the corresponding time window according to the collection timestamp. For business behaviors and results data that cross time windows, they are assigned to the corresponding time window according to the business completion time. At the same time, business scenario type codes and tag information are bound to each time window to achieve unified alignment of feedback data from different business scenarios and different tags in the time dimension.
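A sketch of the time-window mapping is shown below. The scenario codes follow the example codes given earlier (01 credit approval, 02 customer marketing, 03 risk monitoring); the record keys `collected_at`, `completed_at`, and `tag`, and the choice of a fixed epoch to anchor windows, are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Granularity per scenario code: 1 day for 01 and 02, 1 hour for 03.
GRANULARITY = {"01": timedelta(days=1), "02": timedelta(days=1),
               "03": timedelta(hours=1)}
EPOCH = datetime(1970, 1, 1)  # assumed anchor for window boundaries

def window_start(ts, granularity):
    """Start of the fixed-size window that contains ts."""
    return EPOCH + ((ts - EPOCH) // granularity) * granularity

def align(records):
    """Group records by (scenario code, tag name, window start)."""
    windows = defaultdict(list)
    for rec in records:
        # Records that cross a window boundary go by business completion time.
        ts = rec.get("completed_at") or rec["collected_at"]
        start = window_start(ts, GRANULARITY[rec["scenario"]])
        windows[(rec["scenario"], rec["tag"], start)].append(rec)
    return windows

records = [
    {"scenario": "03", "tag": "high-risk",
     "collected_at": datetime(2026, 3, 13, 14, 37, 5)},
    {"scenario": "02", "tag": "high-net-worth",
     "collected_at": datetime(2026, 3, 12, 23, 50),
     "completed_at": datetime(2026, 3, 13, 0, 10)},
]
windows = align(records)
```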
[0042] Based on time-series aligned feedback data, key metrics are statistically calculated according to business dimensions. These dimensions include business scenario type, tag level, and tag name, ensuring that the effectiveness of each tag in different business scenarios can be quantified individually. Core metrics include tag hit rate, business conversion rate, and risk false alarm rate. Additional metrics can be added based on business scenario needs. The calculation of each core metric follows the performance evaluation logic of financial business. The tag hit rate formula is: Tag Hit Rate = Number of valid business transactions that hit the tag in actual business processing / Total number of business processes involving the tag. This metric represents the degree of matching between the tag and actual business needs; a higher value indicates stronger practicality of the tag. The business conversion rate is calculated as: Business Conversion Rate = Number of business conversions completed after matching the tag / Total number of business outreaches after matching the tag. This metric is mainly applicable to conversion-oriented business scenarios such as customer marketing; a higher value indicates a more significant supporting role of the tag in business conversion. The risk false alarm rate is calculated as: Risk False Alarm Rate = Number of times the tag triggered a risk warning but was verified as a false alarm / Total number of risk warnings triggered by the tag. This metric is mainly applicable to risk monitoring business scenarios; a lower value indicates higher accuracy in risk identification by the tag. All metric calculation results are uniformly retained to two decimal places. Proportional indicators are converted to percentage form. Indicators without statistical significance (such as business conversion rates in non-conversion scenarios) are marked as not applicable.
Finally, the statistical results of all business dimensions are integrated according to the structure of "business scenario code - tag level - tag name - statistical time window - tag hit rate - business conversion rate - risk false alarm rate - extended indicators". Each indicator record is assigned a unique indicator identifier, generating a structured set of feedback feature indicators. This set of indicators realizes the quantitative representation of the tag business effect and is the core data for subsequent tag effectiveness evaluation.
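The three core metrics can be computed per business dimension roughly as follows. The count keys and function names are hypothetical, and representing "not applicable" as `None` is an assumption.

```python
def rate(numerator, denominator):
    """Percentage with two decimal places, or None when not applicable."""
    if not denominator:
        return None
    return round(100.0 * numerator / denominator, 2)

def feedback_indicators(scenario, level, tag, window, counts):
    """Build one record of the feedback feature indicator set."""
    return {
        "scenario": scenario,
        "tag_level": level,
        "tag_name": tag,
        "window": window,
        "tag_hit_rate": rate(counts.get("valid_hits", 0), counts.get("processes", 0)),
        "conversion_rate": rate(counts.get("conversions", 0), counts.get("outreaches", 0)),
        "false_alarm_rate": rate(counts.get("false_alarms", 0), counts.get("warnings", 0)),
    }

# Customer marketing scenario (code 02): no risk warnings are triggered,
# so the false alarm rate is not applicable.
marketing = feedback_indicators(
    "02", 3, "high-net-worth", "2026-03-13",
    {"valid_hits": 45, "processes": 60, "conversions": 12, "outreaches": 48},
)
```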
[0043] S2033, perform correlation analysis between the feedback feature index set and the initial label information in the labeled dataset, calculate the effectiveness score of each label in different business scenarios, and generate a label effect evaluation vector; The core of this step is to establish a precise correlation between feedback feature indicators and initial label information. By weighted fusion of multiple indicators, the effectiveness score of the label is calculated, and the score is structured into a label effectiveness evaluation vector according to the label dimension. This enables a comprehensive quantitative evaluation of the business effectiveness of each label. The specific implementation method is as follows: The association analysis uses the core common fields of the feedback feature indicator set and the labeled dataset as the association key to establish a precise bidirectional association between the two datasets. The core association key is the unique identifier of the financial data object, the business scenario code, the label level, and the label name. This association key is used to match the quantitative indicators in the feedback feature indicator set with the initial label information (initial label weight, classification boundary threshold, label application conditions, and label score) in the labeled dataset. Simultaneously, it associates the financial data object feature information corresponding to the associated labels, ensuring that the feedback indicators for each label in different business scenarios can be accurately traced back to the corresponding initial label attributes and data object features. During the association analysis process, consistency checks are performed to verify whether the association keys of the two datasets match, whether the label information is consistent, and whether the statistical range of the indicators matches the scope of business participation for the labels. 
Data with inconsistent associations is manually checked and corrected to ensure the accuracy of the association results and prevent mismatches or omissions.
[0044] Based on the associated dataset, the effectiveness score of each tag in different business scenarios is calculated. The effectiveness score is a weighted comprehensive score of multiple indicators, aiming to comprehensively represent the actual effect of the tag in the business scenario. Before the score is calculated, differentiated weight coefficients need to be assigned to each statistical indicator according to the core objectives of the business scenario. The value range of the weight coefficient is 0-1. The sum of the weight coefficients of all indicators participating in the calculation in the same business scenario is 1. The weight coefficient is set according to the priority of the business scenario. In the customer marketing scenario, the business conversion rate is the core indicator, with a weight coefficient of 0.5, the tag hit rate is an auxiliary indicator, with a weight coefficient of 0.3, and other extended indicators are assigned the remaining 0.2. In the risk monitoring scenario, the risk false alarm rate is the core indicator. Its reverse impact is eliminated by the inverse transformation applied after normalization, so it enters the score with a weight coefficient of 0.6, and the tag hit rate is assigned a weight coefficient of 0.4. In the credit approval scenario, the tag hit rate is the core indicator, with a weight coefficient of 0.7, and other extended indicators are assigned a weight coefficient of 0.3. The general formula for calculating the effectiveness score is S = Σ(W_i × X_i), where S is the effectiveness score of the tag in a certain business scenario, W_i is the weight coefficient of the i-th indicator, and X_i is the normalized value of the i-th indicator. Indicator normalization uses the min-max normalization method, mapping indicator values to the range of 0-1 to eliminate dimensional differences between different indicators.
For inverse indicators such as the false positive rate, an inverse 1-X transformation is performed after normalization to ensure a positive correlation between the indicator value and the effectiveness score. The score calculation result is retained to three decimal places, with a score range of 0-1. A larger value indicates higher tag effectiveness in the business scenario, while a score closer to 0 indicates lower tag effectiveness.
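The effectiveness score can be sketched as below. The indicator keys are hypothetical. The sketch assumes that the risk false alarm rate enters the sum as 1-X with a positive weight of 0.6, which keeps every weight within 0-1, the weights of each scenario summing to 1, and the score within 0-1; this is one reading of the description, not a confirmed detail.

```python
# Weights per scenario; each set sums to 1. The 0.6 for the false alarm
# rate is applied to the inverted (1 - X) value (assumption, see lead-in).
SCENARIO_WEIGHTS = {
    "customer_marketing": {"conversion_rate": 0.5, "hit_rate": 0.3, "extended": 0.2},
    "risk_monitoring": {"false_alarm_rate": 0.6, "hit_rate": 0.4},
    "credit_approval": {"hit_rate": 0.7, "extended": 0.3},
}
INVERSE_INDICATORS = {"false_alarm_rate"}

def effectiveness(scenario, normalized):
    """S = sum(W_i * X_i) over indicators already normalized to 0-1."""
    total = 0.0
    for name, weight in SCENARIO_WEIGHTS[scenario].items():
        x = normalized[name]
        if name in INVERSE_INDICATORS:
            x = 1.0 - x  # lower false alarm rate means higher effectiveness
        total += weight * x
    return round(total, 3)

marketing_score = effectiveness(
    "customer_marketing", {"conversion_rate": 0.8, "hit_rate": 0.5, "extended": 0.5})
risk_score = effectiveness(
    "risk_monitoring", {"false_alarm_rate": 0.2, "hit_rate": 0.9})
```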
[0045] After calculating the effectiveness scores of all tags in different business scenarios, a tag effectiveness evaluation vector is constructed with the tags as the core dimension. First, all tags within the evaluation scope are uniquely encoded, and a fixed vector dimension is assigned to each tag. The total number of dimensions in the vector is consistent with the total number of tags being evaluated. Then, the effectiveness scores of each tag in each business scenario are weighted and fused according to the priority of the business scenario to obtain the comprehensive effectiveness score of each tag. This score is assigned to the corresponding vector dimension of the tag to form the tag effectiveness evaluation vector. Each dimension value in the vector is the comprehensive effectiveness score of the corresponding tag, and the dimension of the tag with no score is assigned a value of 0. Taking the three major business scenarios of credit approval, customer marketing, and risk monitoring as examples, if the total number of tags being evaluated is 20, then the tag effectiveness evaluation vector is a 20-dimensional vector. Each dimension corresponds to the comprehensive effectiveness score of a tag in turn. For example, the 3rd dimension corresponds to the score of 0.892 for the "high-net-worth customer" tag, and the 7th dimension corresponds to the score of 0.756 for the "high-frequency trading customer" tag. This vector realizes the structured and quantitative representation of the effectiveness of all tags and is the core input for constructing the reinforcement learning state space.
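Constructing the evaluation vector can be sketched as follows. The scenario priority weights and the tag index are hypothetical, and renormalizing the priorities over only those scenarios in which a tag has a score is an assumption, since the description does not say how partially scored tags are fused.

```python
def build_effect_vector(tag_index, scenario_scores, scenario_priority):
    """One dimension per tag; tags with no score keep the value 0."""
    vector = [0.0] * len(tag_index)
    for tag, position in tag_index.items():
        scores = scenario_scores.get(tag, {})
        if not scores:
            continue
        # Priority-weighted fusion over the scenarios that produced a score.
        weight_sum = sum(scenario_priority[s] for s in scores)
        fused = sum(scenario_priority[s] * v for s, v in scores.items()) / weight_sum
        vector[position] = round(fused, 3)
    return vector

tag_index = {"high-net-worth customer": 0, "high-frequency trading customer": 1,
             "dormant customer": 2}
scenario_priority = {"01": 0.4, "02": 0.3, "03": 0.3}  # hypothetical priorities
scenario_scores = {
    "high-net-worth customer": {"01": 0.9, "02": 0.8},
    "high-frequency trading customer": {"03": 0.756},
}
vector = build_effect_vector(tag_index, scenario_scores, scenario_priority)
```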
[0046] S2034. Normalize and encode the label effect evaluation vector, construct a reinforcement learning state space containing label state, business feedback, and context features, and output the reinforcement learning state space representation.
[0047] The core of this step is to standardize the label performance evaluation vector and integrate three core types of information (label state, business feedback, and contextual features) to construct a high-dimensional reinforcement learning state space. This transforms the operational state of the label system and the business feedback into numerical representations that the reinforcement learning algorithm can process, providing a state basis for subsequent label rule optimization. The specific implementation method is as follows: First, the label performance evaluation vector is normalized and encoded. Z-score normalization is used to remove differences in scale between the scores of different dimensions, so that the normalized values have zero mean and unit standard deviation. The formula is Z = (S - S_mean) / S_std, where Z is the normalized dimension value, S is the original comprehensive effectiveness score of a given dimension in the label performance evaluation vector, S_mean is the mean of the original scores across all dimensions of the vector, and S_std is the standard deviation of the original scores across all dimensions of the vector. The normalized dimension values are retained to three decimal places and mostly fall between -3 and 3, which facilitates computation and convergence of the reinforcement learning algorithm. The encoding process converts the non-numerical state information of the labels, mainly label hierarchy information and applicable business scenario information, into computable numerical codes. One-hot encoding is used to convert the label hierarchy (level 1, level 2, level 3) and the applicable business scenario (credit approval, customer marketing, risk monitoring) into binary numerical codes. The encoded values are then concatenated with the normalized label performance evaluation vector dimensions to form the basic feature vector.
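As a rough sketch of the normalization and encoding just described, the following applies Z = (S - S_mean) / S_std and one-hot encoding, then concatenates the results. The function names are hypothetical, and the use of the population standard deviation is an assumption, because the text does not say whether S_std is the population or sample statistic.

```python
import statistics

LEVELS = ["level_1", "level_2", "level_3"]
SCENARIOS = ["credit_approval", "customer_marketing", "risk_monitoring"]


def z_score(vector):
    """Z-score each dimension against the mean and std of the whole vector."""
    mean = statistics.fmean(vector)
    std = statistics.pstdev(vector)  # assumption: population standard deviation
    if std == 0:
        return [0.0 for _ in vector]  # all scores equal; avoid division by zero
    return [round((s - mean) / std, 3) for s in vector]


def one_hot(value, categories):
    """Binary code with a single 1.0 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def basic_feature_vector(scores, level, scenario):
    """Concatenate normalized scores with the one-hot codes."""
    return z_score(scores) + one_hot(level, LEVELS) + one_hot(scenario, SCENARIOS)


features = basic_feature_vector([0.2, 0.4, 0.6], "level_2", "risk_monitoring")
```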
[0048] Based on the basic feature vector, a reinforcement learning state space is constructed by integrating three types of core features: label state features, business feedback features, and context features. These three types of features together constitute the complete dimension of the state space, comprehensively representing the operating state of the label system and the business application environment. The label state features are the operating parameters of the label system itself, including the initial values of label weights, classification boundary thresholds, and numerical results of label applicability conditions. The initial values of label weights and classification boundary thresholds are directly normalized to 0-1 and then added to the state space dimension. The syntactic complexity of the label applicability condition expression is converted into a numerical indicator and then added to the dimension. The business feedback features are the core quantitative indicators in the feedback feature indicator set, including the normalized values of label hit rate, business conversion rate, and risk false alarm rate. They are added to the state space according to the business scenario dimension to ensure that the feedback of different business scenarios can be represented. The context features are the business environment features of the label system operation, including business scenario type, time context, and customer group features. The business scenario type is a numerical value after one-hot encoding. The time context is the time feature of the statistical time window (such as monthly, quarterly, annual) and is numerically encoded. The customer group features are the group attributes of the tagged data objects (such as retail customers, corporate customers, high-net-worth customers) and are numerically encoded after one-hot encoding. All context features are converted into numerical form and then added to the state space dimension.
[0049] The reinforcement learning state space is a high-dimensional numerical vector space. Its number of dimensions is the sum of the dimensions of the normalized label performance evaluation vector, the label state features, the business feedback features, and the context features. Each point in the space is a multi-dimensional state vector that uniquely represents the operational state of the label system, the business feedback effect, and the characteristics of the business environment at a given moment. Each dimension value of the state vector is a normalized or encoded numerical value, ranging from 0 to 1 or from -3 to 3, so that the reinforcement learning algorithm learns each feature dimension on a comparable scale. Finally, the constructed reinforcement learning state space is standardized by dividing and labeling the multi-dimensional state vectors according to feature categories. Each dimension is labeled with a feature name, feature type, and numerical range. The state vectors are stored in a tensor form recognizable by the reinforcement learning algorithm, and the reinforcement learning state space representation is output. This representation can be input directly into the reinforcement learning agent model, providing a complete state basis for subsequent adaptive optimization of label weights and classification boundaries.
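One possible way to assemble and label such a state vector is sketched below. The segment layout, the `build_state` name, and the dictionary schema are illustrative assumptions; the text only requires that each dimension carry a feature name, feature type, and numerical range.

```python
def build_state(segments):
    """Flatten feature segments into one state vector plus a per-dimension schema.

    segments: list of (feature_type, names, values, (low, high)) tuples.
    """
    state, schema = [], []
    for feature_type, names, values, value_range in segments:
        if len(names) != len(values):
            raise ValueError(f"{feature_type}: names and values differ in length")
        low, high = value_range
        for name, value in zip(names, values):
            if not low <= value <= high:
                raise ValueError(f"{name}={value} outside [{low}, {high}]")
            state.append(float(value))
            schema.append({"name": name, "type": feature_type, "range": value_range})
    return state, schema


state, schema = build_state([
    ("label_effect", ["tag_3_z"], [1.2], (-3.0, 3.0)),
    ("label_state", ["tag_3_weight"], [0.35], (0.0, 1.0)),
    ("business_feedback", ["marketing_hit_rate"], [0.72], (0.0, 1.0)),
    ("context", ["scenario_customer_marketing"], [1.0], (0.0, 1.0)),
])
```

The total dimension of `state` is the sum of the segment lengths, matching the dimension count described above.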
[0050] S204. Using a reinforcement learning algorithm, the label weights and classification boundaries are adaptively adjusted and optimized based on the reinforcement learning state space to generate an optimized label rule set.
[0051] Specifically, a reinforcement learning agent model can be initialized, the action space can be set to the label weight adjustment value and the classification boundary offset, the reward function can be defined as the comprehensive business benefit score, and a reinforcement learning training configuration can be generated. The core of this step is to build the basic architecture of the reinforcement learning agent model and configure its core training parameters, clarify the model's action dimensions and reward criteria, and form a standardized reinforcement learning training configuration that can be used directly for training. This lays the algorithmic foundation for the adaptive optimization of label rules. The specific implementation method is as follows: The reinforcement learning agent model is based on a deep neural network architecture whose number of layers is adapted to the dimensionality of the reinforcement learning state space. The number of neurons in the input layer equals the dimensionality of the state space, ensuring that the state feature vectors can be fully received. Two to three fully connected hidden layers are used, each with between 2 and 4 times as many neurons as the input layer, and the count decreases from layer to layer to achieve deep feature extraction. The number of neurons in the output layer matches the dimensionality of the action space and is used to output the numerical results of action decisions. The network weights are randomly initialized from a normal distribution with a mean of 0 and a standard deviation of 0.01, and the bias terms are initialized to 0, ensuring the initial stability of model training. A gradient clipping mechanism is also configured for the model, with a clipping threshold of 1.0, to avoid gradient explosion during training and keep parameter updates smooth.
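The architecture and initialization above might be set up roughly as follows, using only the standard library. The hidden-layer multipliers (4 and 2), the helper names, and the interpretation of the 1.0 threshold as a limit on the global gradient norm (rather than a per-element limit) are assumptions for illustration.

```python
import math
import random


def layer_sizes(state_dim, action_dim, multipliers=(4, 2)):
    """Input layer = state dimension, shrinking hidden layers, output = action dimension."""
    return [state_dim] + [m * state_dim for m in multipliers] + [action_dim]


def init_params(sizes, std=0.01, seed=0):
    """Weights drawn from N(0, std^2); biases set to 0."""
    rng = random.Random(seed)
    params = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]
        biases = [0.0] * fan_out
        params.append((weights, biases))
    return params


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat gradient list so its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


sizes = layer_sizes(state_dim=10, action_dim=2)
params = init_params(sizes)
clipped = clip_by_global_norm([3.0, 4.0])
```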
[0052] The action space is set up around the core dimension of label rule optimization, defining only two action dimensions: label weight adjustment value and classification boundary offset. This ensures the action space's relevance and simplicity, avoiding invalid actions that could interfere with the optimization results. The label weight adjustment value is a fine-tuning of the initial label weight value, with a range of [-0.1, 0.1]. This range ensures flexibility while preventing sudden weight changes that could cause the label system to fail. The adjustment value retains two decimal places, and it is stipulated that the sum of all adjusted label weights at the same level remains 1. If a single label weight exceeds a reasonable range after adjustment, the model will automatically fine-tune other label weights at the same level proportionally to meet the constraints. The classification boundary offset is the offset value from the original classification boundary threshold, with a range set at 10% of the original threshold. For example, if the original asset size threshold is 1 million yuan, the offset range is [-100,000 yuan, 100,000 yuan]. The offset precision for numerical thresholds is consistent with the original threshold, while no offset is set for categorical thresholds. The offset is represented by ΔT, with positive values indicating an upward adjustment and negative values indicating a downward adjustment. Each dimension of the action space is a continuous numerical value, and the action result output by the model is a combination of specific weight adjustment values and classification boundary offsets.
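The action constraints can be illustrated with the sketch below. It is a simplification under stated assumptions: after clamping each adjustment to [-0.1, 0.1], it rescales all weights at the same level so that they sum to 1, whereas the text describes proportionally fine-tuning only the other weights; the function names are hypothetical.

```python
def apply_weight_deltas(weights, deltas, max_delta=0.1):
    """Clamp each adjustment to [-max_delta, max_delta], then renormalize to sum 1."""
    adjusted = []
    for w, d in zip(weights, deltas):
        d = max(-max_delta, min(max_delta, round(d, 2)))  # two decimal places
        adjusted.append(max(0.0, w + d))  # keep weights non-negative
    total = sum(adjusted)
    if total == 0:
        raise ValueError("all adjusted weights are zero")
    return [a / total for a in adjusted]  # assumption: rescale every weight


def apply_boundary_offset(threshold, delta_t, max_fraction=0.10):
    """Shift a numerical threshold by delta_t, limited to +/-10% of its original value."""
    limit = abs(threshold) * max_fraction
    delta_t = max(-limit, min(limit, delta_t))
    return threshold + delta_t


new_weights = apply_weight_deltas([0.5, 0.3, 0.2], [0.06, -0.2, 0.0])
new_threshold = apply_boundary_offset(1_000_000, 150_000)
```

In this example the requested offset of 150,000 yuan exceeds the allowed range and is limited to 100,000 yuan.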
[0053] The core metric for the reward function is defined as the comprehensive business benefit score. This score is a comprehensive quantitative value that integrates multiple dimensions of financial business indicators, directly representing the actual benefits of adjusting the label rules in business applications. The calculation formula for the reward function is R = W_1 × H + W_2 × C - W_3 × F + W_4 × P, where R is the comprehensive business benefit score, with a value range of [0,1]. The larger the value, the better the business benefit after the action is executed; W_1, W_2, W_3, and W_4 are the weight coefficients of each business indicator, and W_1 + W_2 + W_3 + W_4 =1. The weighting coefficients are set differently based on the core objectives of different business scenarios. In the customer marketing scenario, W_1=0.2, W_2=0.5, W_3=0.2, W_4=0.1; in the risk monitoring scenario, W_1=0.3, W_2=0, W_3=0.5, W_4=0.2. H is the normalized value of tag hit rate, C is the normalized value of business conversion rate, F is the normalized value of risk false alarm rate, and P is the normalized value of business processing cost reduction rate. The risk false alarm rate is given a negative weight because it is an inverse indicator; the higher the value, the worse the business performance. All business indicators have been normalized to 0-1 to eliminate the impact of dimensional differences on the score.
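A small worked example of the reward formula R = W_1 × H + W_2 × C - W_3 × F + W_4 × P, using the scenario weights given above, is shown here as a sketch. The function name and the input validation are assumptions. Note that, evaluated literally, the expression can be negative (for example H = 0, F = 1 in the risk monitoring scenario), so keeping R within [0,1] would need an additional clipping or rescaling step that the text does not specify; none is applied here.

```python
SCENARIO_WEIGHTS = {
    # (W_1, W_2, W_3, W_4) as stated in the description
    "customer_marketing": (0.2, 0.5, 0.2, 0.1),
    "risk_monitoring": (0.3, 0.0, 0.5, 0.2),
}


def business_benefit(scenario, hit_rate, conversion, false_alarm, cost_reduction):
    """R = W_1*H + W_2*C - W_3*F + W_4*P with all indicators normalized to [0, 1]."""
    for value in (hit_rate, conversion, false_alarm, cost_reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("indicators must be normalized to [0, 1]")
    w1, w2, w3, w4 = SCENARIO_WEIGHTS[scenario]
    return w1 * hit_rate + w2 * conversion - w3 * false_alarm + w4 * cost_reduction


# 0.2*0.8 + 0.5*0.6 - 0.2*0.1 + 0.1*0.5 = 0.49
r_marketing = business_benefit("customer_marketing", 0.8, 0.6, 0.1, 0.5)
```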
[0054] After defining the model architecture, action space, and reward function, a reinforcement learning training configuration is generated. This configuration includes core training hyperparameters: the learning rate is set to 0.001 to control the update step size of the model parameters; the discount factor γ is set to 0.95, which is the decay coefficient for future rewards, representing the model's emphasis on long-term business benefits; the exploration rate ε is initially set to 0.9, which decays linearly to 0.1 with the number of iterations, used to balance the model's exploratory and exploitative aspects, increasing random exploration in the early stages of training to find better actions and reducing exploration in the later stages to utilize the learned optimal strategy; the maximum number of iterations is set to 1000 rounds, and the batch training size is set to 32. All parameters are clearly labeled with their meanings and the basis for their values, forming a complete and standardized reinforcement learning training configuration.
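The training configuration and the linear decay of ε could be captured as in the following sketch. The class and function names are hypothetical, and spreading the decay over the full 1000 iterations is an assumption, since the text does not state over how many iterations ε reaches 0.1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    learning_rate: float = 0.001   # update step size
    gamma: float = 0.95            # discount factor for future rewards
    epsilon_start: float = 0.9     # initial exploration rate
    epsilon_end: float = 0.1       # final exploration rate
    max_iterations: int = 1000
    batch_size: int = 32


def epsilon_at(iteration, cfg):
    """Linearly interpolate epsilon from epsilon_start to epsilon_end."""
    progress = min(max(iteration, 0), cfg.max_iterations) / cfg.max_iterations
    return cfg.epsilon_start + progress * (cfg.epsilon_end - cfg.epsilon_start)


cfg = TrainingConfig()
schedule = [epsilon_at(i, cfg) for i in (0, 500, 1000)]
```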
[0055] Input the current reinforcement learning state space into the reinforcement learning agent, calculate the optimal action through a deep Q-network or policy gradient algorithm, and output the label weight adjustment and classification boundary offset. The core of this step is to input the reinforcement learning state space, which represents the operational state of the label system, into the agent model, perform action decision calculations through a deep Q-network or policy gradient algorithm, select the optimal action that maximizes the overall business benefit, and output the specific label weight adjustment amount and classification boundary offset. The specific implementation method is as follows: Before input, the reinforcement learning state space representation is converted into a tensor form that the model can process. The tensor dimension exactly matches the number of neurons in the input layer of the agent model, the data type is floating point, and values are retained to three decimal places. The normalization of the tensor is also verified to ensure that the values of all dimensions lie within the range expected for model training. If the verification passes, the tensor is input into the reinforcement learning agent model in batches. If it fails, the state space is normalized again.
[0056] The model supports two algorithms: Deep Q-Network (DQN) and Policy Gradient. The choice depends on the specific business scenario: DQN is used for non-real-time scenarios such as customer marketing and asset allocation, while Policy Gradient is used for real-time scenarios such as risk monitoring and credit approval. The core of the DQN algorithm is to determine the optimal action by calculating the Q-value of each action. The Q-value represents the expected cumulative reward for performing an action in a given state. The model uses the feature tensor of the current state and a deep neural network to calculate the Q-values of all possible actions, selecting the action with the highest Q-value as the optimal action. This algorithm also introduces a target network and an experience replay mechanism. The target network serves as a benchmark for fixing the Q-value, and its parameter synchronization frequency with the main network is set to once every 10 iterations to avoid Q-value estimation bias. The core of the Policy Gradient algorithm is to output the probability distribution of actions through a policy network. The policy network is a probabilistic model based on a deep neural network. After inputting the state tensor, it outputs the execution probability of each action in the action space. The model samples and calculates the expected reward of different actions, selecting the action with the highest expected reward as the optimal action. This algorithm is an online learning algorithm, requiring no storage of experience samples, making it suitable for business scenarios with high real-time requirements.
[0057] Taking a customer marketing scenario as an example, when using the deep Q-network algorithm, after inputting the tensor representing the current state of the tag system into the model, the model calculates multiple sets of weight adjustment values and classification boundary offsets. Among them, the action combination of adjusting the weight of the "high-net-worth customer" tag by +0.06 and raising the classification boundary threshold from 1 million yuan to 1.08 million yuan has the largest Q value, and this combination is the optimal action. After the model outputs the specific value of the optimal action, it performs a validity check. The check includes whether the tag weight adjustment is non-negative, whether the sum of the weights of tags at the same level is 1, whether the thresholds after the classification boundary offset do not overlap, and whether the thresholds after the offset conform to the business logic. If the check fails, the model will recalculate the action corresponding to the second-best Q value until a valid action result is output. If the check passes, the final tag weight adjustment amount and classification boundary offset are associated with the tag's unique code to generate a structured action output result, which clearly defines the adjustment value corresponding to each tag and the offset value corresponding to each classification boundary.
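The check-and-fall-back logic in this example can be sketched as below. The candidate format and function names are hypothetical. Two interpretations are assumed: the non-negativity check is applied to the adjusted weights, and "thresholds do not overlap" is read as the thresholds remaining in strictly increasing order. The business-logic check is omitted because the text does not define it.

```python
def is_valid(weights, thresholds, tol=1e-6):
    """Validity check on the rule parameters that would result from an action."""
    if any(w < 0 for w in weights):
        return False
    if abs(sum(weights) - 1.0) > tol:
        return False
    # Assumption: non-overlap means strictly increasing thresholds.
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def select_action(candidates):
    """Return the valid candidate with the highest Q value, or None.

    candidates: list of (q_value, adjusted_weights, adjusted_thresholds).
    """
    for q, weights, thresholds in sorted(candidates, key=lambda c: c[0], reverse=True):
        if is_valid(weights, thresholds):
            return q, weights, thresholds
    return None


candidates = [
    (0.91, [0.7, 0.4], [1_080_000, 5_000_000]),   # weights sum to 1.1: rejected
    (0.87, [0.56, 0.44], [1_080_000, 5_000_000]), # valid
    (0.80, [0.5, 0.5], [1_000_000, 5_000_000]),
]
best = select_action(candidates)
```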
[0058] Update the label weights and classification boundaries in the label rule configuration table based on the output adjustment amount, relabel some data objects and evaluate the reward value under the new state to generate experience samples; The core of this step is to implement the optimal action results output by the model into the label rule configuration table, verify the adjustment effect by relabeling some data objects, calculate the comprehensive business benefit score under the new state, and generate a complete experience sample containing state, action, reward, and new state to provide data support for subsequent model training. The specific implementation method is as follows: First, the label rule configuration table is updated. Based on the structured action results output by the model, the initial values of the corresponding label weights in the configuration table are precisely modified according to the unique label code, and the corresponding classification boundary thresholds are modified according to the unique identifier of the classification boundary. During the modification process, the original parameter records are retained for easy rollback later. After the modification is completed, a full rule verification is performed on the configuration table. The verification includes weight constraints, threshold non-overlap constraints, and the matching of applicable condition expressions with new parameters. If the verification passes, the updated label rule configuration table is saved. If it fails, the original parameters are restored and the model's action calculation is retried.
[0059] To balance validation efficiency and result accuracy, a stratified random sampling method was used to select a portion of data objects from the labeled dataset for relabeling. The sampling ratio was set at 10%, and the sampling was stratified according to label type, business scenario, and customer group to ensure sample representativeness and avoid distortion of the effect evaluation due to sampling bias. The extracted sample data was input into the label matching engine, which loaded the updated label rule configuration table and reassigned labels to the sample data objects according to the new weights and classification boundaries, generating a new labeled sample dataset. The relabeling process strictly followed the core logic of label matching to ensure the accuracy of the label assignment results.
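A minimal sketch of the 10% stratified random sampling follows. The `stratified_sample` name, the record layout, and the rule of drawing at least one object from every stratum are assumptions; the text only fixes the sampling ratio and the stratification dimensions.

```python
import random
from collections import defaultdict


def stratified_sample(objects, strata_key, ratio=0.10, seed=0):
    """Draw `ratio` of the objects from each stratum independently."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for obj in objects:
        groups[strata_key(obj)].append(obj)
    sample = []
    for key in sorted(groups):
        members = groups[key]
        k = max(1, round(len(members) * ratio))  # assumption: at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


objects = (
    [{"id": i, "scenario": "customer_marketing", "group": "retail"} for i in range(100)]
    + [{"id": 100 + i, "scenario": "risk_monitoring", "group": "corporate"} for i in range(50)]
)
sample = stratified_sample(objects, lambda o: (o["scenario"], o["group"]))
```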
[0060] The new labeled sample dataset is applied to the corresponding financial business application scenario. Feedback data generated by this sample set in the business scenario is collected. According to the calculation formula of the reward function in step one, the normalized values of each business indicator are calculated and weighted summed to obtain the comprehensive business benefit score in the new state. This score is the actual reward value R_new obtained after the execution of this action. At the same time, based on the new labeling rules and business feedback data, the reinforcement learning state space is reconstructed to represent the new state S_new after the labeling rules are adjusted.
[0061] Finally, reinforcement learning experience samples are generated. These experience samples adopt the classic four-tuple format (S, A, R_new, S_new), where S is the original reinforcement learning state space before the action is executed, A is the optimal action output by the model (a combination of label weight adjustment and classification boundary offset), R_new is the actual reward value in the new state, and S_new is the new reinforcement learning state space after the action is executed. Each experience sample is associated with a specific business scenario and label type, and is labeled with traceability information such as the sample generation time and sampling batch to ensure the integrity and traceability of the samples. Each experience sample is a structured data unit containing all the core elements of the four-tuple.
[0062] Experience samples are stored in a replay buffer, and reinforcement learning agent parameters are periodically trained and updated. The optimization is iterated until the reward converges, and finally the optimized label rule set is output.
[0063] The core of this step is to achieve efficient storage and utilization of experience samples through a replay buffer, periodically batch train the reinforcement learning agent model, continuously update the model parameters until the reward value reaches a convergent state, extract the final label rule parameters to form an optimized label rule set, and achieve adaptive optimization of label weights and classification boundaries. The specific implementation method is as follows: First, a replay buffer is initialized, employing a first-in, first-out (FIFO) storage strategy with a capacity of 10,000 experience samples. When the buffer reaches its capacity, the oldest sample is automatically removed to make room for new samples. The generated experience samples are written to the replay buffer in chronological order of their insertion. During storage, duplicate samples are deduplicated, removing redundant samples with completely identical quadruplets to ensure sample diversity in the buffer. The core function of the replay buffer is to break the temporal correlation of experience samples, preventing overfitting of parameter updates due to the similarity of consecutive samples during model training, thus improving the stability and generalization ability of model training.
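The replay buffer behaviour described above (FIFO eviction, a capacity of 10,000, and removal of identical four-tuples) might look like the following sketch. The class name and the use of an ordered dictionary for deduplication are implementation assumptions.

```python
import random
from collections import OrderedDict


class ReplayBuffer:
    """FIFO buffer of (S, A, R_new, S_new) samples without exact duplicates."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._items = OrderedDict()  # keys keep insertion order

    def add(self, state, action, reward, next_state):
        key = (tuple(state), tuple(action), reward, tuple(next_state))
        if key in self._items:
            return False  # identical four-tuple already stored
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # evict the oldest sample
        self._items[key] = None
        return True

    def sample(self, batch_size=32, rng=random):
        """Uniform random batch; raises ValueError if the buffer is too small."""
        return rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


buf = ReplayBuffer(capacity=2)
first_added = buf.add([0.1], [0.01], 0.5, [0.2])
duplicate_added = buf.add([0.1], [0.01], 0.5, [0.2])
buf.add([0.2], [0.0], 0.4, [0.3])
buf.add([0.3], [0.0], 0.6, [0.4])  # evicts the first sample
```

Sampling uniformly at random from the buffer is what breaks the temporal correlation between consecutive samples.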
[0064] The model is periodically trained using the replay buffer. Training is triggered each time 100 new samples have been added to the buffer. During each training session, 32 experience samples are randomly selected from the buffer to form a training batch, avoiding the sample bias caused by sequential sampling. The experience samples in the training batch are input into the reinforcement learning agent model, and stochastic gradient descent is used to update the model's network parameters. During the update, the step size is controlled by the learning rate set in the training configuration of step one, and the gradient clipping mechanism is used to prevent gradient explosion. If the deep Q-network algorithm is used, the parameters of the main network are copied to the target network at the set synchronization frequency to update the target network's evaluation benchmark. If the policy gradient algorithm is used, the parameters of the policy network are updated directly through the gradient of the reward value, so that the model iterates towards maximizing the expected reward. During training, the exploration rate ε decreases linearly with the number of iterations from its initial value of 0.9 to 0.1, gradually reducing random exploration and increasing use of the learned optimal policy, which ensures stable decisions in the later stages of training.
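The two periodic triggers in this paragraph (training after every 100 new samples, target-network synchronization every 10 training iterations) can be tracked with a small helper such as the one below. The `TrainingScheduler` class is hypothetical and only counts events; the actual gradient update is outside the scope of this sketch.

```python
class TrainingScheduler:
    """Counts new samples and training iterations to decide when to act."""

    def __init__(self, trigger_every=100, sync_every=10):
        self.trigger_every = trigger_every
        self.sync_every = sync_every
        self.new_samples = 0
        self.iterations = 0

    def on_sample_added(self):
        """Return True when enough new samples have accumulated to train."""
        self.new_samples += 1
        if self.new_samples >= self.trigger_every:
            self.new_samples = 0
            return True
        return False

    def on_training_step(self):
        """Return True when the target network should be synchronized (DQN only)."""
        self.iterations += 1
        return self.iterations % self.sync_every == 0


scheduler = TrainingScheduler()
train_flags = [scheduler.on_sample_added() for _ in range(250)]
sync_flags = [scheduler.on_training_step() for _ in range(20)]
```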
[0065] The termination condition for model iterative optimization is that the reward value reaches a convergent state. First, a convergence judgment threshold ΔR=0.001 is set. That is, when the fluctuation range of the reward value in 50 consecutive iterations is less than this threshold and the reward value no longer shows a significant upward trend, it is judged that the reward has converged, indicating that the model has learned the optimal label rule adjustment strategy. If the number of iterations reaches the set upper limit of 1000 rounds and the reward value still has not converged, the hyperparameters such as the learning rate and discount factor in the training configuration are adjusted, the replay buffer is cleared, and training is restarted.
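The convergence test can be sketched as follows, under the assumption that the "fluctuation range" over 50 consecutive iterations means the difference between the largest and smallest reward in that window. The function name is hypothetical, and the separate check for an upward trend is not modelled.

```python
def has_converged(rewards, window=50, threshold=0.001):
    """True if the last `window` rewards vary by less than `threshold`."""
    if len(rewards) < window:
        return False
    recent = rewards[-window:]
    return max(recent) - min(recent) < threshold


rising = [0.01 * i for i in range(60)]           # still improving
settled = rising + [0.7 + 0.0001 * (i % 3) for i in range(50)]
```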
[0066] Once the reward value reaches convergence, all parameters in the updated label rule configuration table are extracted, including optimized label weights and classification boundary thresholds. Unadjusted parameters such as label application condition expressions and label hierarchy structures are retained. The extracted parameters are then standardized in format, unifying precision, naming rules, and storage format. Rule rationality is validated according to the actual needs of financial business, ensuring that the optimized rules conform to business logic and practical application scenarios. Finally, the standardized label rule parameters are integrated into a structured rule set, generating an optimized label rule set. This rule set can directly replace the original initial label rule configuration table and be loaded into the label matching engine for the labeling of financial data objects. This enables adaptive optimization of the financial data object label management system, and the rule set supports continuous iterative optimization based on business feedback.
[0067] Another embodiment of the present invention provides an adaptive optimization system for financial data object tag management; see Figure 5. The system may include: The receiving module 501 is used to receive user-defined configurable tag structure parameters and tag rule parameters, and generate an initial financial data object tag management framework based on the configurable tag structure parameters and tag rule parameters. The processing module 502 is used to perform tagging processing on the financial data objects to be processed based on the initial financial data object tag management framework, and generate a set of financial data objects with initial tags. The construction module 503 is used to collect business feedback data generated by the financial data objects with initial labels in the corresponding business applications, and to construct a reinforcement learning state space for optimizing the label system. The optimization module 504 is used to adaptively adjust and optimize the label weights and classification boundaries based on the reinforcement learning state space using a reinforcement learning algorithm, thereby generating an optimized label rule set.
[0068] This invention also provides a storage medium storing a computer program, wherein the computer program is configured to execute the steps in any of the above method embodiments when running.
[0069] This invention also provides an electronic device, including a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any of the above method embodiments.
[0070] The above description, based on the embodiments shown in the figures, details the structure, features, and effects of the present invention. The above description is only a preferred embodiment of the present invention, but the present invention is not limited to the scope of implementation shown in the figures. Any changes made in accordance with the concept of the present invention, or equivalent embodiments modified to have equivalent changes, that do not exceed the spirit covered by the specification and figures, should be within the protection scope of the present invention.
Claims
1. An adaptive optimization method for managing labels on financial data objects, characterized in that, The method includes: Receive user-defined configurable tag structure parameters and tag rule parameters, and generate an initial financial data object tag management framework based on the configurable tag structure parameters and tag rule parameters; Based on the initial financial data object tag management framework, the financial data objects to be processed are tagged to generate a set of financial data objects with initial tags. Collect business feedback data generated by the financial data objects with initial labels in corresponding business applications, and construct a reinforcement learning state space for optimizing the label system; Using reinforcement learning algorithms, the label weights and classification boundaries are adaptively adjusted and optimized based on the reinforcement learning state space to generate an optimized label rule set.
2. The method according to claim 1, characterized in that, The process involves receiving user-defined configurable tag structure parameters and tag rule parameters, and generating an initial financial data object tag management framework based on these parameters, including: The system receives user-inputted tag structure parameters through a visual configuration interface, including tag hierarchy depth, tag category name, tag attribute fields and their data types, and generates the original tag structure parameter set. The original tag structure parameter set is subjected to conflict detection and format standardization, the logical consistency between tag levels is checked and the naming rules of attribute fields are unified, and a standardized tag structure definition is generated. Receive user-inputted label rule parameters, including initial label weight values, classification boundary thresholds, and label application condition expressions, and associate them with standardized label structure definitions to generate a label rule configuration table; The standardized tag structure definition and tag rule configuration table are integrated and encapsulated to generate an initial tag management framework configuration file containing tag metadata and rule logic.
3. The method according to claim 2, characterized in that, The initial financial data object tagging management framework performs tagging processing on the financial data objects to be processed, generating a set of financial data objects with initial tags, including: Extract financial data objects to be processed from the financial data warehouse, including transaction records, customer profiles, and asset holding records, to generate a raw data object set; Based on the tag structure definition in the initial tag management framework, feature extraction is performed on the original data object set, key fields related to tags are selected and derived indicators are calculated to generate a feature vector set; Based on the label rule configuration table in the initial label management framework, the feature vector set is input into the label matching engine, and an initial label is assigned to each data object through rule calculation, generating the initial label assignment result; The initial label assignment results are bound and stored with the original data objects to construct a collection of financial data objects with initial labels, and the labeled dataset is output.
4. The method according to claim 3, characterized in that, The process of collecting business feedback data generated by the financial data objects with initial labels in corresponding business applications, and constructing a reinforcement learning state space for optimizing the labeling system, includes: Deploy data tracking interfaces in business application systems to collect user behavior and processing results of financial data objects with initial tags in business scenarios in real time, and generate raw feedback data streams; The raw feedback data stream is cleaned, denoised, and time-series aligned. Key indicators including tag hit rate, business conversion rate, and risk false alarm rate are statistically analyzed according to business dimensions to generate a set of feedback feature indicators. The feedback feature index set is correlated with the initial label information in the labeled dataset. The effectiveness score of each label in different business scenarios is calculated, and a label effect evaluation vector is generated. The label performance evaluation vector is normalized and encoded to construct a reinforcement learning state space that includes label status, business feedback, and contextual features, and the reinforcement learning state space representation is output.
5. The method according to claim 4, characterized in that using a reinforcement learning algorithm to adaptively adjust and optimize the tag weights and classification boundaries based on the reinforcement learning state space, to generate an optimized tag rule set, comprises: initializing a reinforcement learning agent model, setting the action space to tag weight adjustment values and classification boundary offsets, and defining the reward function as a comprehensive business benefit score, to generate a reinforcement learning training configuration; inputting the current state from the reinforcement learning state space into the reinforcement learning agent, calculating the optimal action through a deep Q-network or a policy gradient algorithm, and outputting a tag weight adjustment and a classification boundary offset; updating the tag weights and classification boundaries in the tag rule configuration table based on the output adjustments, re-tagging a portion of the data objects, and evaluating the reward value under the new state to generate experience samples; and storing the experience samples in a replay buffer, periodically training and updating the parameters of the reinforcement learning agent, iterating the optimization until the reward converges, and finally outputting the optimized tag rule set.
6. An adaptive optimization system for tag management of financial data objects, characterized in that the system comprises: a receiving module, configured to receive user-defined configurable tag structure parameters and tag rule parameters, and to generate an initial financial data object tag management framework based on the configurable tag structure parameters and the tag rule parameters; a processing module, configured to perform tagging processing on financial data objects to be processed based on the initial financial data object tag management framework, and to generate a set of financial data objects with initial tags; a collection module, configured to collect business feedback data generated by the financial data objects with initial tags in corresponding business applications, and to construct a reinforcement learning state space for optimizing the tag system; and an optimization module, configured to adaptively adjust and optimize tag weights and classification boundaries based on the reinforcement learning state space using a reinforcement learning algorithm, thereby generating an optimized tag rule set.
7. The system according to claim 6, characterized in that the receiving module is specifically configured to: receive, through a visual configuration interface, user-input tag structure parameters, including tag hierarchy depth, tag category names, and tag attribute fields and their data types, to generate an original tag structure parameter set; perform conflict detection and format standardization on the original tag structure parameter set, checking the logical consistency between tag levels and unifying the naming rules of attribute fields, to generate a standardized tag structure definition; receive user-input tag rule parameters, including initial tag weight values, classification boundary thresholds, and tag application condition expressions, and associate them with the standardized tag structure definition to generate a tag rule configuration table; and integrate and encapsulate the standardized tag structure definition and the tag rule configuration table to generate an initial tag management framework configuration file containing tag metadata and rule logic.
8. The system according to claim 7, characterized in that the processing module is specifically configured to: extract the financial data objects to be processed, including transaction records, customer profiles, and asset holding records, from a financial data warehouse to generate a raw data object set; perform feature extraction on the raw data object set based on the tag structure definition in the initial tag management framework, selecting key fields related to the tags and calculating derived indicators to generate a feature vector set; input the feature vector set into a tag matching engine based on the tag rule configuration table in the initial tag management framework, and assign an initial tag to each data object through rule calculation to generate an initial tag assignment result; and bind and store the initial tag assignment result with the raw data objects to construct the set of financial data objects with initial tags, and output a tagged dataset.
9. A storage medium, characterized in that the storage medium stores a computer program, wherein the computer program is configured to execute the method of any one of claims 1-5 when run.
10. An electronic device comprising a memory and a processor, characterized in that the memory stores a computer program, and the processor is configured to run the computer program to perform the method of any one of claims 1-5.
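As an illustrative, non-limiting sketch of the tagging steps recited in claim 3 (feature extraction, rule-based tag matching, and binding tags to the raw objects), the following Python fragment may help. The `TagRule` fields, the feature names, the sample records, and the tie-breaking choice of "highest weight wins" are all assumptions made for illustration; the claims do not specify them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TagRule:
    """One row of a hypothetical tag rule configuration table."""
    tag: str
    feature: str      # name of the feature the rule inspects
    boundary: float   # classification boundary threshold
    weight: float     # tag weight, used here to rank competing tags


def extract_features(record: dict) -> dict:
    """Select key fields and compute one derived indicator."""
    amount, balance = record["amount"], record["balance"]
    return {
        "amount": amount,
        # Derived indicator: share of the balance moved by this transaction.
        "amount_to_balance": amount / balance if balance else 0.0,
    }


def match_tag(features: dict, rules: List[TagRule]) -> Optional[str]:
    """Return the highest-weight tag whose boundary is met, or None."""
    matched = [r for r in rules if features[r.feature] >= r.boundary]
    return max(matched, key=lambda r: r.weight).tag if matched else None


def tag_objects(records: List[dict], rules: List[TagRule]) -> List[dict]:
    """Bind each raw record to its initial tag (assumed storage format)."""
    return [{**rec, "tag": match_tag(extract_features(rec), rules)}
            for rec in records]


# Hypothetical rule table and raw data object set.
rules = [
    TagRule("large_transaction", "amount", 5000.0, 0.6),
    TagRule("balance_drain", "amount_to_balance", 0.8, 0.9),
]
records = [
    {"id": "t1", "amount": 9000.0, "balance": 10000.0},
    {"id": "t2", "amount": 50.0, "balance": 10000.0},
]
tagged = tag_objects(records, rules)
print([(rec["id"], rec["tag"]) for rec in tagged])
```

Keeping the boundary and weight as plain fields of each rule means a later optimization step can adjust them without touching the matching logic.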
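The state space construction recited in claim 4 can be sketched, under stated assumptions, as follows. The event schema (`tag`, `scenario`, `hit`, `converted`, `false_alarm`), the effectiveness formula (hit rate plus conversion rate minus false alarm rate), min-max normalization, and the flat state layout are hypothetical choices; the claims name the indicators but not how they are combined or encoded.

```python
from collections import defaultdict


def indicators(events):
    """Aggregate hit, conversion, and false alarm rates per (tag, scenario)."""
    counts = defaultdict(lambda: {"n": 0, "hit": 0, "converted": 0, "false_alarm": 0})
    for e in events:
        c = counts[(e["tag"], e["scenario"])]
        c["n"] += 1
        for key in ("hit", "converted", "false_alarm"):
            c[key] += int(e[key])
    return {
        k: {
            "hit_rate": c["hit"] / c["n"],
            "conversion_rate": c["converted"] / c["n"],
            "false_alarm_rate": c["false_alarm"] / c["n"],
        }
        for k, c in counts.items()
    }


def effectiveness(ind):
    """Assumed score: reward hits and conversions, penalise false alarms."""
    return ind["hit_rate"] + ind["conversion_rate"] - ind["false_alarm_rate"]


def min_max(values):
    """Scale values to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def build_state(events, tag_weights):
    """Encode (tag weight, normalised effectiveness, false alarm rate) per key.

    The business scenario (context) is carried by the position of each
    (tag, scenario) key in the sorted key list.
    """
    ind = indicators(events)
    keys = sorted(ind)
    scores = min_max([effectiveness(ind[k]) for k in keys])
    state = []
    for (tag, _scenario), score in zip(keys, scores):
        state.extend([tag_weights[tag], score, ind[(tag, _scenario)]["false_alarm_rate"]])
    return keys, state


# Hypothetical cleaned feedback events and current tag weights.
events = [
    {"tag": "high_risk", "scenario": "risk_monitoring", "hit": True, "converted": False, "false_alarm": False},
    {"tag": "high_risk", "scenario": "risk_monitoring", "hit": True, "converted": False, "false_alarm": True},
    {"tag": "vip", "scenario": "marketing", "hit": True, "converted": True, "false_alarm": False},
    {"tag": "vip", "scenario": "marketing", "hit": False, "converted": False, "false_alarm": False},
]
tag_weights = {"high_risk": 0.7, "vip": 0.4}
keys, state = build_state(events, tag_weights)
print(keys)
print(state)
```

Sorting the keys gives the state vector a stable layout across collection windows, which a learning agent would need in order to compare successive states.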
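The optimization loop recited in claim 5 (select an action, adjust the boundary, evaluate the reward, store the experience, and train from a replay buffer) can be illustrated with a deliberately small example. Claim 5 names a deep Q-network or a policy gradient algorithm; the sketch below substitutes tabular Q-learning so that it stays self-contained. The single boundary on a fixed grid, the `IDEAL` value, the simulated reward, and all hyperparameters are assumptions, and a real reward would come from business feedback on re-tagged objects.

```python
import random
from collections import deque

BOUNDARIES = [i / 10 for i in range(11)]  # candidate classification boundaries
ACTIONS = [-1, 0, 1]                      # boundary offset, in grid steps
IDEAL = 0.7  # hypothetical best boundary, unknown to the agent


def reward(state):
    """Simulated business benefit score: highest when the boundary is at IDEAL."""
    return -abs(BOUNDARIES[state] - IDEAL)


def step(state, action):
    """Apply the boundary offset (clamped to the grid) and score the new state."""
    nxt = min(max(state + ACTIONS[action], 0), len(BOUNDARIES) - 1)
    return nxt, reward(nxt)


def greedy(q, state):
    """Index of the action with the largest Q-value in this state."""
    return max(range(len(ACTIONS)), key=lambda a: q[state][a])


def update(q, s, a, r, nxt, alpha, gamma):
    """One Q-learning update for a single experience sample."""
    q[s][a] += alpha * (r + gamma * max(q[nxt]) - q[s][a])


def train(episodes=400, steps=25, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in BOUNDARIES]
    buffer = deque(maxlen=1000)  # replay buffer of (s, a, r, s') samples
    for _ in range(episodes):
        state = rng.randrange(len(BOUNDARIES))
        for _ in range(steps):
            # Epsilon-greedy choice between exploring and exploiting.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy(q, state)
            nxt, r = step(state, action)
            buffer.append((state, action, r, nxt))
            update(q, state, action, r, nxt, alpha, gamma)
            # Replay a small batch of stored experience samples.
            for s, a, r2, n2 in rng.sample(list(buffer), min(8, len(buffer))):
                update(q, s, a, r2, n2, alpha, gamma)
            state = nxt
    return q


def greedy_boundary(q, start, steps=20):
    """Follow the learned policy from a starting grid index."""
    state = start
    for _ in range(steps):
        state, _ = step(state, greedy(q, state))
    return BOUNDARIES[state]


q = train()
print(greedy_boundary(q, start=0), greedy_boundary(q, start=10))
```

In this toy setting the learned policy should move the boundary toward the value with the best simulated score from either end of the grid. A neural network would replace the table once the state includes the multi-dimensional feedback and context features described in claim 4.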