Threshold-adaptive online transaction risk early warning method and system based on expert feedback
By dynamically adjusting the network transaction risk warning threshold using a reinforcement learning agent based on expert feedback, the problem of insufficient adaptability of existing systems in diverse risk and dynamic environments is solved, achieving accurate multi-level early warning and stable risk identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- SHANDONG UNIV
- Filing Date
- 2026-03-16
- Publication Date
- 2026-06-23
AI Technical Summary
Existing online transaction risk warning systems lack dynamic adjustment capabilities, making it difficult to adapt to diverse risks and dynamic market environments. This results in either excessive or insufficient warnings, and makes it difficult to identify different types of risks and respond to fluctuations in risk conditions.
A reinforcement learning agent based on expert feedback is adopted to dynamically adjust the early warning threshold by adaptively adjusting the early warning strategy in combination with the risk type and situational change characteristics of online transactions. The early warning threshold is optimized by training the agent with expert feedback.
It improves the accuracy and stability of the early warning system, reduces the cost of manual rule configuration, enhances the adaptability to diverse risks, and achieves precise identification and dynamic response of fine-grained early warning levels.
Figure CN122264784A_ABST (abstract drawing)
Description
Technical Field
[0001] This disclosure relates to the field of online transaction monitoring technology, specifically to a threshold-adaptive online transaction risk early warning method and system based on expert feedback.
Background Technology
[0002] The statements in this section are merely background information relating to this disclosure and do not necessarily constitute prior art.
[0003] With the rapid development of the internet economy, online transactions have become an indispensable part of people's daily lives. However, the widespread use of online transactions has also brought many problems, with frequent occurrences of illegal activities such as product adulteration. This not only disrupts market order but also poses a serious threat to the personal and property safety of consumers. Early warning of risks in the online transaction process can identify potential hidden dangers and provide a basis for decision-making by regulatory authorities and platform operators to take targeted measures, thereby effectively reducing the economic losses and social impact that illegal activities may cause.
[0004] Online transaction risk early warning refers to the process of determining whether to issue risk alerts or warnings to regulatory authorities based on the current status and changing trends of online transaction risks. Fine-grained early warning systems, with different levels to reflect the severity and urgency of online transaction risks, can enhance the targeting of warnings while avoiding excessive intervention. For example, a red warning can be triggered when the online transaction risk value is high, indicating a high probability of illegal activities and serious potential harm; a blue warning can be triggered when the risk value is low, indicating a limited impact on market order and consumer safety; yellow and orange warnings represent different stages between these two, with the risk level gradually increasing. Setting multi-level warning systems helps achieve tiered response and refined regulation.
[0005] However, most early warning systems on the market today are static methods with fixed thresholds, which face the following problems and difficulties when dealing with diverse risks and dynamic market environments: (1) Fixed warning thresholds lack dynamic adjustment capabilities. Existing warning systems typically use static thresholds to assess online transaction risks, which are difficult to adapt to the ever-changing transaction environment. On the one hand, setting the threshold too low can easily trigger excessive warnings, thereby interfering with normal transaction activities and adversely affecting economic operation and market order; on the other hand, setting the threshold too high may lead to insufficient warnings, exposing consumers to greater risks, such as threats to personal safety and economic losses. Therefore, there is an urgent need for a technology that can dynamically adjust the warning threshold.
[0006] (2) Online transaction risks are diverse, and the warning thresholds for different types of risks vary. Different types of online transaction risks differ in their occurrence mechanisms, degree of harm, and regulatory focus. A single warning strategy is insufficient to accurately identify and provide reasonable warnings for multiple risk types. Therefore, a warning technology capable of identifying different risk types is needed.
[0007] (3) The risk situation in online transactions is not stable. Online transaction risks are usually affected by a variety of factors such as market environment, user behavior, promotional activities and emergencies, exhibiting obvious volatility and non-stationary characteristics. The same type of risk may show drastically different trends in different time periods. Existing static early warning methods are difficult to fully depict the dynamic changes in online transaction risks, and are prone to false alarms or omissions when the risk situation fluctuates, thus affecting the reliability and practicality of the early warning system.
Summary of the Invention
[0008] To address the aforementioned issues, this disclosure proposes a threshold-adaptive online transaction risk early warning method and system based on expert feedback. Targeting diverse online transaction risks and temporal fluctuations, it introduces a reinforcement learning agent to dynamically adjust the early warning threshold, achieving adaptive threshold adjustment. This method utilizes expert feedback on early warning results to train the agent, enabling it to adaptively adjust fine-grained early warning thresholds based on the type and state of online transaction risks.
[0009] According to some embodiments, the present disclosure adopts the following technical solution: a threshold-adaptive online transaction risk early warning method based on expert feedback, including:
- initializing the early warning strategy, the reward model, and the agent;
- acquiring online transaction data and preprocessing it;
- based on the preprocessed online transaction data, calculating the risk value of all or some specified types of online transaction risks, and constructing the time-series variation characteristics of online transaction risks;
- the agent observing the time-series variation characteristics of online transaction risks and constructing an online transaction risk early warning state, then, based on the early warning state, applying its action selection strategy and outputting the corresponding early warning strategy threshold optimization action;
- adjusting the threshold parameters of the early warning strategy based on the threshold optimization action to generate an updated early warning strategy;
- based on the updated early warning strategy, comparing the current online transaction risk value with the early warning thresholds at each level to calculate the corresponding online transaction risk early warning level, and outputting the early warning result;
- collecting expert feedback on the early warning results and training a reward model to guide the agent to continuously learn the adaptive adjustment strategy for the early warning threshold.
[0010] According to some embodiments, the present disclosure adopts the following technical solution: a threshold-adaptive online transaction risk early warning system based on expert feedback, including:
- an initialization module, used for agent initialization;
- an online transaction big data acquisition module, used to acquire online transaction data and preprocess it;
- an online transaction risk value calculation module, used to calculate the risk value of all or some specified types of online transaction risks based on the preprocessed online transaction data, and to construct the time-series variation characteristics of online transaction risks;
- a reinforcement learning agent module, used to observe the time-series variation characteristics of online transaction risks, construct the online transaction risk early warning state, and, based on the early warning state, apply the action selection strategy and output the corresponding early warning strategy threshold optimization action;
- an early warning module, used to adjust the threshold parameters of the early warning strategy based on the threshold optimization action and generate an updated early warning strategy;
- an expert feedback collection module, used to compare the current online transaction risk value with the early warning thresholds at each level based on the updated early warning strategy, calculate the corresponding online transaction risk early warning level, and output the early warning result;
- a reward prediction module, used to collect expert feedback on the early warning results and train the reward model to guide the agent to continuously learn the adaptive adjustment strategy for the early warning threshold.
[0011] According to some embodiments, the present disclosure adopts the following technical solutions: A computer program product includes a computer program that, when executed by a processor, implements the aforementioned threshold-adaptive online transaction risk warning method based on expert feedback.
[0012] According to some embodiments, the present disclosure adopts the following technical solutions: A non-transitory computer-readable storage medium is provided for storing computer instructions, which, when executed by a processor, implement the aforementioned threshold-adaptive network transaction risk early warning method based on expert feedback.
[0013] According to some embodiments, the present disclosure adopts the following technical solutions: An electronic device includes a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory to enable the electronic device to implement the threshold-adaptive online transaction risk warning method based on expert feedback.
[0014] Compared with the prior art, the beneficial effects of this disclosure are as follows: This disclosure presents a threshold-adaptive online transaction risk early warning method based on expert feedback. It calculates a risk value reflecting the online transaction risk situation, and constructs an online transaction risk early warning state by combining the online transaction risk categories and the temporal variation characteristics of the risk value. Based on this state, a reinforcement learning agent adaptively adjusts the threshold parameters of the early warning strategy. This disclosure achieves dynamic adjustment of the early warning threshold by introducing a reinforcement learning agent, thus realizing adaptive early warning thresholds. For diverse online transaction risks and temporally fluctuating risk situations, this disclosure effectively improves the accuracy of early warnings and makes the early warning results more consistent with human regulatory experience and actual business needs, while reducing the cost of manual rule configuration.
[0015] The expert feedback-based threshold adaptive network transaction risk early warning method disclosed herein transforms the adjustment process of the early warning threshold into the decision-making process of an intelligent agent by introducing reinforcement learning. This overcomes the limitations of traditional static threshold methods, enabling the early warning strategy to be dynamically adjusted to adapt to different needs, and endowing the network transaction risk early warning strategy with dynamism.
[0016] The threshold-adaptive online transaction risk early warning method based on expert feedback disclosed herein enhances the adaptability of the early warning system to diverse online transaction risks. It can perceive different types of online transaction risks and dynamically adjust the threshold parameters of the early warning strategy, solving the problem that existing single early warning strategies cannot take into account multiple risk types. It realizes an early warning technology that can identify different types of online transaction risks.
[0017] This disclosure presents a threshold-adaptive online transaction risk early warning method based on expert feedback, which effectively addresses the problem of early warning failure caused by fluctuations in online transaction risk values. By calculating risk values to characterize the risk situation and incorporating time-series risk values into the early warning strategy adjustment process, this disclosure overcomes the shortcomings of traditional fixed threshold methods in adapting to dynamic environments, and improves the stability and accuracy of the early warning system in complex market environments.
[0018] This disclosure presents a threshold-adaptive online transaction risk early warning method based on expert feedback, which reduces the reliance on human experience in the design and maintenance of early warning rules. By introducing an expert feedback-driven reward learning mechanism, this disclosure avoids the manual design of reinforcement learning reward functions, enabling the early warning strategy to continuously learn and optimize during actual operation, thereby improving the system's sustainable operation capability.
Attached Figure Description
[0019] The accompanying drawings, which form part of this disclosure, are used to provide a further understanding of this disclosure. The illustrative embodiments of this disclosure and their descriptions are used to explain this disclosure and do not constitute an undue limitation of this disclosure.
[0020] Figure 1 is a schematic diagram of the architecture of the threshold-adaptive online transaction risk early warning method based on expert feedback, according to an embodiment of this disclosure.
Figure 2 is a schematic diagram of the threshold-adaptive online transaction risk early warning method based on expert feedback, according to an embodiment of this disclosure.
Figure 3 is a diagram illustrating the architecture of the threshold-adaptive online transaction risk early warning system based on expert feedback, according to an embodiment of this disclosure.
Figure 4 is a schematic diagram of the structure of a computer device according to an embodiment of this disclosure.
Detailed Implementation
[0021] The present disclosure will be further described below with reference to the accompanying drawings and embodiments.
[0022] It should be noted that the following detailed descriptions are illustrative and intended to provide further explanation of this disclosure. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains.
[0023] It should be noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the exemplary embodiments according to this disclosure. As used herein, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. Furthermore, it should be understood that when the terms “comprising” and / or “including” are used in this specification, they indicate the presence of features, steps, operations, devices, components, and / or combinations thereof.
[0024] Example 1: One embodiment of this disclosure provides a threshold-adaptive online transaction risk early warning method based on expert feedback, the steps of which include:
Step 1: Initialize the agent;
Step 2: Obtain online transaction data and preprocess it;
Step 3: Based on the preprocessed online transaction data, calculate the risk value of all or some of the specified types of online transaction risks, and construct the time-series variation characteristics of online transaction risks;
Step 4: The agent observes the time-series variation characteristics of online transaction risks and constructs an online transaction risk early warning state. Based on the early warning state and the action exploration probability, it outputs the corresponding early warning strategy threshold optimization action;
Step 5: Based on the threshold optimization action, adjust the threshold parameters of the early warning strategy to generate an updated early warning strategy;
Step 6: Based on the updated early warning strategy, compare the current online transaction risk value with the early warning thresholds at each level, calculate the corresponding online transaction risk early warning level, and output the early warning result;
Step 7: Collect expert feedback on the early warning results and train the reward model to guide the agent to continuously learn the adaptive adjustment strategy for the early warning threshold.
[0025] As one embodiment, this disclosure provides a threshold-adaptive online transaction risk early warning method based on expert feedback. The method calculates risk values reflecting the online transaction risk situation from collected online transaction big data; constructs an online transaction risk early warning state by combining the online transaction risk category with the temporal variation characteristics of the risk values; uses a reinforcement learning agent to adjust the threshold parameters of the early warning strategy based on this state; determines the early warning level of the online transaction risk from the updated early warning strategy and the real-time risk value, and outputs the early warning result; and collects expert evaluation feedback on the early warning result to train a reward model that guides the agent to keep learning how to adjust the early warning strategy according to risk type and risk situation. This yields early warnings that conform to expert experience and actual regulatory needs under diverse online transaction risks and dynamic market environments. The specific implementation process is as follows:
Step 1: Initialize an online transaction risk early warning strategy, a reward model, and a reinforcement learning agent for optimizing the online transaction risk early warning thresholds. The early warning strategy is composed of a set of threshold parameters.
[0026] Furthermore, the reinforcement learning agent maintains a Q-table together with a state space $S$ and an action space $A$; the Q-table stores the Q-value $Q(s,a)$ of each state-action pair. Furthermore, the reward model is a two-layer neural network.
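The initialization in Step 1 can be pictured with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not stated in the disclosure: the four ascending levels (blue, yellow, orange, red) as the threshold set, the initial threshold values, the dictionary-backed Q-table, and the names `WarningStrategy` and `init_agent`.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


# Hypothetical container for the threshold parameters of the warning strategy.
# The four levels and their initial values are illustrative assumptions.
@dataclass
class WarningStrategy:
    thresholds: Dict[str, float] = field(default_factory=lambda: {
        "blue": 0.2, "yellow": 0.4, "orange": 0.6, "red": 0.8,
    })


State = Tuple[str, float, float]   # (risk type, previous rv, current rv)
Action = Tuple[str, float]         # (threshold name, signed adjustment)
QTable = Dict[Tuple[State, Action], float]


def init_agent() -> Tuple[WarningStrategy, QTable]:
    """Return an initial strategy and an empty Q-table keyed by (state, action)."""
    return WarningStrategy(), {}


strategy, q_table = init_agent()
```

An empty Q-table matches the later description, in which a state that has never been observed triggers a random action.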
[0027] Step 2: Obtain online transaction data and preprocess it, including:
Step 201: Collect big data on various types of online transaction risks, including contract breaches, product adulteration and quality fraud, food safety, after-sales service, false advertising, misleading promotions, failure to fulfill legal obligations, and complaints and reports. The online transaction risk data should at least include the risk type identifier, occurrence time, frequency of occurrence, and business characteristic information related to the risk.
Step 202: Preprocess the online transaction risk data by cleaning, deduplication, and division into time windows. This embodiment then inputs the preprocessed data into a pre-trained Long Short-Term Memory (LSTM) network model to predict the probability $p$ that each type of online transaction risk occurs, or accumulates beyond a threshold, within a preset time window, and stores $p$ in the database for risk situation visualization and subsequent calculation.
Step 3: Based on the preprocessed online transaction data, calculate the risk value of all or some of the specified types of online transaction risks, and construct the time-series variation characteristics of online transaction risks. Specifically, the online transaction risk value $rv$ is calculated from the predicted probability of risk occurrence $p$ combined with the potential harm $c$ of the corresponding online transaction risk. It quantitatively characterizes the current online transaction risk situation and is used in the agent's subsequent state construction. The risk value is calculated according to the following formula:
$$rv = p \times c$$
[0028] where $p$ is the probability of the online transaction risk occurring within a preset time window, as predicted by the LSTM prediction model, and $c$ is the potential harm or impact weight of the corresponding online transaction risk.
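As a small worked illustration of the risk value, the sketch below assumes the product form, probability times harm weight, which is the usual way to combine the two quantities. The function name `risk_value` and the input checks are hypothetical additions.

```python
def risk_value(p: float, c: float) -> float:
    """Risk value as predicted probability times harm weight (assumed product form)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability in [0, 1]")
    if c < 0.0:
        raise ValueError("c must be non-negative")
    return p * c


# Illustrative inputs: a predicted probability of 0.5 and a harm weight of 0.8.
rv = risk_value(0.5, 0.8)
```

With a harm weight in [0, 1], the resulting risk value also stays in [0, 1], which is consistent with the example risk values 0.8 and 0.9 used in the state description.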
[0029] Step 4: The agent observes the temporal variation characteristics of online transaction risks and constructs an online transaction risk early warning state. Based on the early warning state, it applies its action selection strategy and outputs the corresponding early warning strategy threshold optimization action. Specifically, the agent constructs the current early warning state from the temporal changes in the current online transaction risk. The early warning state describes both the category and the changing trend of the online transaction risk: it includes the risk type together with the current and historical risk values. For example, the early warning state can be represented as (contract default, 0.8, 0.9), where "contract default" is the type of online transaction risk, "0.9" is the risk value at the current moment, and "0.8" is the risk value at the previous moment.
[0030] Furthermore, the online transaction risk early warning state is $s = (I, rv', rv)$, where $I$ is the type of online transaction risk, $rv'$ is the historical risk value, and $rv$ is the risk value at the current moment.
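The state construction can be sketched as follows, mirroring the (contract default, 0.8, 0.9) example. The helper name `build_state` is hypothetical, and rounding the risk values is an added assumption so that a tabular Q-table sees a finite set of states.

```python
from typing import List, Tuple

State = Tuple[str, float, float]  # (risk type, previous rv, current rv)


def build_state(risk_type: str, rv_history: List[float], ndigits: int = 1) -> State:
    """Build the warning state from the last two values of a risk-value series.

    Rounding is an assumption made here to keep the state space small.
    """
    if len(rv_history) < 2:
        raise ValueError("need at least two risk values")
    prev_rv, curr_rv = rv_history[-2], rv_history[-1]
    return (risk_type, round(prev_rv, ndigits), round(curr_rv, ndigits))


state = build_state("contract default", [0.5, 0.8, 0.9])
```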
[0031] Furthermore, applying the action selection strategy based on the early warning state and outputting the corresponding threshold optimization action includes: querying whether the current Q-table contains the early warning state $s$ currently observed by the agent. If it does not, the agent randomly selects an action from the action space $A$ of early warning threshold optimization and executes it. If the current Q-table does record the state $s$, the agent's action selection follows the $\epsilon$-greedy strategy, represented as follows:
$$a = \begin{cases} \arg\max_{a' \in A} Q(s, a'), & \text{with probability } 1 - \epsilon \\ \text{a random action from } A, & \text{with probability } \epsilon \end{cases}$$
[0032] where $\epsilon$ is the action exploration probability, which lets the agent randomly explore the action space for optimizing the early warning strategy.
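The selection rule described above (random action for an unseen state, otherwise exploration with a fixed probability) can be sketched in Python. This is a sketch under assumptions: the action encoding as (threshold name, signed adjustment), the sample Q-values, and the function name `select_action` are all hypothetical.

```python
import random
from typing import Dict, Sequence, Tuple

State = Tuple[str, float, float]
Action = Tuple[str, float]


def select_action(q_table: Dict[Tuple[State, Action], float], state: State,
                  actions: Sequence[Action], epsilon: float,
                  rng: random.Random) -> Action:
    """Pick a threshold optimization action for the observed state."""
    known = [a for a in actions if (state, a) in q_table]
    # Unseen state, or an exploration draw: choose uniformly at random.
    if not known or rng.random() < epsilon:
        return rng.choice(actions)
    # Otherwise exploit the best recorded Q-value for this state.
    return max(known, key=lambda a: q_table[(state, a)])


ACTIONS = [("yellow", -0.05), ("yellow", 0.05)]
state = ("contract default", 0.8, 0.9)
q = {(state, ACTIONS[0]): 0.2, (state, ACTIONS[1]): 0.7}
greedy = select_action(q, state, ACTIONS, epsilon=0.0, rng=random.Random(0))
```

Passing the random generator in explicitly keeps the behavior reproducible during testing.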
[0033] Step 5: Based on the threshold optimization action of the early warning strategy, adjust the threshold parameters of the early warning strategy to generate an updated early warning strategy; according to the updated early warning strategy, compare the current network transaction risk value with the early warning thresholds at each level to calculate the corresponding network transaction risk early warning level and output the early warning result; collect expert evaluation feedback on the early warning result and train the reward model to guide the agent to continuously learn and adjust the adaptive strategy of the early warning threshold.
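How a threshold optimization action might update the strategy in Step 5 is sketched below. The disclosure does not spell out the action space, so the encoding of an action as (threshold name, signed adjustment), the four level names, the clamping to [0, 1], and the rule that rejects adjustments breaking the ascending order are all assumptions for illustration.

```python
from typing import Dict, Tuple

LEVELS = ("blue", "yellow", "orange", "red")  # assumed ascending severity


def apply_action(thresholds: Dict[str, float],
                 action: Tuple[str, float]) -> Dict[str, float]:
    """Return updated thresholds, or an unchanged copy if ordering would break."""
    name, delta = action
    updated = dict(thresholds)
    # Clamp the adjusted threshold into [0, 1].
    updated[name] = min(1.0, max(0.0, updated[name] + delta))
    values = [updated[level] for level in LEVELS]
    # Reject adjustments that violate strictly ascending thresholds.
    if any(low >= high for low, high in zip(values, values[1:])):
        return dict(thresholds)
    return updated


base = {"blue": 0.2, "yellow": 0.4, "orange": 0.6, "red": 0.8}
raised_yellow = apply_action(base, ("yellow", 0.05))
rejected = apply_action(base, ("yellow", 0.3))
```

Returning a new dictionary rather than mutating the input makes it easy to keep the previous strategy for comparison or rollback.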
[0034] Specifically, collect the corresponding state-action data $(s, a)$. Based on the scalar reward $r$ obtained from expert feedback, or the scalar reward $\hat{r}$ predicted by the reward model, compute and update the corresponding value $Q(s,a)$ in the Q-table. Store the threshold parameters of the early warning strategy. The system applies the threshold optimization action generated by the agent and adjusts the threshold parameters of the early warning strategy in real time, achieving adaptive threshold adjustment. Based on the input online transaction risk value and risk type, the system compares the risk value with the thresholds of the early warning strategy, outputs the early warning level $l$ of the online transaction risk, and stores it for later reward calculation.
[0035] Visualize the early warning level records of the various types of online transaction risks; query, select, and process the early warning level records; store the early warning level $l^{*}$ given by experts for each early warning record, and calculate the scalar reward value $r$ from it and the recorded early warning level $l$. In the absence of expert feedback, the scalar reward value $\hat{r}$ is predicted from the early warning result together with the early warning state observed by the agent and its threshold optimization action, and the result is returned to the agent. If expert feedback is available, the mean squared error loss between the reward value derived from the expert feedback and the predicted reward value is calculated and used to update the parameters of the reward model with the backpropagation algorithm.
[0036] First, the early warning level $l$ is obtained by comparing the current risk value $rv$ with the thresholds of the early warning strategy at each level.
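The level comparison can be sketched in Python. The mapping used here is an assumption: four ascending thresholds named after the blue, yellow, orange, and red warnings from the background section, with the highest threshold reached deciding the level and no warning below the lowest one. The function name `warning_level` is hypothetical.

```python
from typing import Dict, Optional

LEVELS = ("blue", "yellow", "orange", "red")  # assumed ascending severity


def warning_level(rv: float, thresholds: Dict[str, float]) -> Optional[str]:
    """Return the highest level whose threshold rv reaches, or None for no warning."""
    level = None
    for name in LEVELS:
        if rv >= thresholds[name]:
            level = name
    return level


thresholds = {"blue": 0.2, "yellow": 0.4, "orange": 0.6, "red": 0.8}
level = warning_level(0.9, thresholds)
```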
[0037] Furthermore, the reward $r$ is a scalar that measures the overall performance of the early warning system on a set of early warning records, calculated from the correctness of the early warning results. Specifically, the reward value is defined as the proportion of early warning records, within a given evaluation window, whose results are consistent with the expert evaluations. It is calculated as follows:
$$r = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(l_i = l_i^{*}\right)$$
[0038] where $N$ is the total number of early warning records; $\mathbb{1}(\cdot)$ is an indicator function that takes the value 1 when the condition inside the parentheses is true and 0 otherwise; $l_i$ is the early warning level output by the early warning strategy for the $i$-th record; and $l_i^{*}$ is the early warning level assessed by experts for that record.
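The reward definition, the share of records in an evaluation window whose level matches the expert's, translates directly into a few lines of Python. The function name `agreement_reward` and the sample records are hypothetical.

```python
from typing import Sequence


def agreement_reward(predicted: Sequence[str], expert: Sequence[str]) -> float:
    """Fraction of warning records whose level equals the expert-assessed level."""
    if len(predicted) != len(expert):
        raise ValueError("both sequences must cover the same records")
    if not predicted:
        raise ValueError("at least one warning record is required")
    matches = sum(1 for p, e in zip(predicted, expert) if p == e)
    return matches / len(predicted)


# Four illustrative records; the third disagrees with the expert.
r = agreement_reward(["red", "blue", "yellow", "orange"],
                     ["red", "blue", "orange", "orange"])
```

Because the reward is a proportion, it always lies in [0, 1], which gives the reward model a bounded regression target.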
[0039] As one example, the reward model is trained as follows: calculate the mean squared error loss between the actual reward value $r$ derived from expert feedback and the predicted reward value $\hat{r}$, averaged over the samples that have expert feedback:
$$L = (r - \hat{r})^2$$
[0040] Furthermore, the parameters of the reward model are updated using backpropagation.
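A two-layer reward model trained on squared error with backpropagation can be sketched in plain Python. This is an illustrative sketch, not the disclosed implementation: the tanh hidden layer, the hidden width, the learning rate, the class name `RewardModel`, and the numeric feature encoding (previous risk value, current risk value, threshold adjustment) are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple


class RewardModel:
    """Two-layer network: tanh hidden layer followed by a linear scalar output."""

    def __init__(self, n_in: int, n_hidden: int = 8, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return hidden, out

    def predict(self, x: Sequence[float]) -> float:
        return self._forward(x)[1]

    def train_step(self, x: Sequence[float], r: float, lr: float = 0.05) -> float:
        """One gradient step on (r_hat - r)^2; returns the loss before the update."""
        hidden, r_hat = self._forward(x)
        err = r_hat - r
        grad_out = 2.0 * err  # derivative of the loss w.r.t. the prediction
        for j, h in enumerate(hidden):
            # Backpropagate through tanh using the pre-update output weight.
            grad_h = grad_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * grad_out * h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
            self.b1[j] -= lr * grad_h
        self.b2 -= lr * grad_out
        return err * err


# Hypothetical samples: (features, expert-derived reward).
samples = [([0.8, 0.9, 0.05], 0.75), ([0.2, 0.3, -0.05], 0.25)]
model = RewardModel(n_in=3)


def mean_loss() -> float:
    return sum((model.predict(x) - r) ** 2 for x, r in samples) / len(samples)


loss_before = mean_loss()
for _ in range(500):
    for x, r in samples:
        model.train_step(x, r)
loss_after = mean_loss()
```

In practice a numerical library would replace the hand-written gradients; the explicit loops are used here only to keep the sketch free of dependencies.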
[0041] Furthermore, the learning process of the agent is as follows: if the current Q-table already records the Q-value of the current early warning state and its threshold optimization action, that is, $Q(s,a)$, the Q-table is updated according to the following formula:
$$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$$
[0042] where $\gamma$ is the discount factor, $\alpha$ is the learning rate, $r$ is the reward value, and $s'$ is the next early warning state observed after the action is applied; this disclosure assigns preset values to $\gamma$ and $\alpha$. If the current Q-table does not contain the current early warning state and its corresponding threshold optimization action, the reward value $r$ obtained by the agent is recorded directly as $Q(s,a)$.
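The two cases of the agent's learning step (first visit versus an existing entry) can be sketched in Python. The update shown is the standard tabular Q-learning rule, assumed here because the text names a discount factor and a learning rate. The default values of `alpha` and `gamma`, the zero default for unseen next-state entries, and the function name `update_q` are illustrative assumptions.

```python
from typing import Dict, Sequence, Tuple

State = Tuple[str, float, float]
Action = Tuple[str, float]


def update_q(q_table: Dict[Tuple[State, Action], float], state: State,
             action: Action, reward: float, next_state: State,
             actions: Sequence[Action], alpha: float = 0.1,
             gamma: float = 0.9) -> float:
    """Update Q(state, action) in place and return the new value."""
    key = (state, action)
    if key not in q_table:
        # First visit: record the obtained reward directly as the Q-value.
        q_table[key] = reward
        return reward
    # Unseen next-state entries are treated as 0.0 (an assumption).
    best_next = max((q_table.get((next_state, a), 0.0) for a in actions),
                    default=0.0)
    q_table[key] += alpha * (reward + gamma * best_next - q_table[key])
    return q_table[key]


ACTIONS = [("yellow", -0.05), ("yellow", 0.05)]
s = ("contract default", 0.8, 0.9)
s_next = ("contract default", 0.9, 0.7)
q: Dict[Tuple[State, Action], float] = {}
first = update_q(q, s, ACTIONS[1], 1.0, s_next, ACTIONS)
second = update_q(q, s, ACTIONS[1], 0.0, s_next, ACTIONS)
```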
[0043] In the above, $p$ is the probability of the online transaction risk occurring; $c$ is the coefficient of the degree of impact of the online transaction risk on the market; $s$ is the current online transaction risk early warning state observed by the agent; $a$ is the early warning strategy threshold optimization action selected by the agent for the current early warning state; $Q(s,a)$ is the Q-value of the state-action pair recorded by the agent after learning; $I$ is the type of online transaction risk; $rv'$ is the historical online transaction risk value; $rv$ is the current online transaction risk value; the reward model is a two-layer neural network with trainable parameters; $r$ is the scalar reward value given by experts based on the early warning state, the threshold optimization action, and the early warning effect; $\hat{r}$ is the scalar reward value predicted by the reward model from the early warning state, the threshold optimization action, and the early warning effect; $l$ is the early warning level calculated by the early warning strategy; and $l^{*}$ is the early warning level given by experts.
[0044] Example 2: One embodiment of this disclosure provides a threshold-adaptive online transaction risk early warning system based on expert feedback, comprising:
- an initialization module, used for agent initialization;
- an online transaction big data acquisition module, used to acquire online transaction data and preprocess it;
- an online transaction risk value calculation module, used to calculate the risk value of all or some specified types of online transaction risks based on the preprocessed online transaction data, and to construct the time-series variation characteristics of online transaction risks;
- a reinforcement learning agent module, used to observe the time-series variation characteristics of online transaction risks, construct the online transaction risk early warning state, and, based on the early warning state, apply the action selection strategy and output the corresponding early warning strategy threshold optimization action;
- an early warning module, used to adjust the threshold parameters of the early warning strategy based on the threshold optimization action and generate an updated early warning strategy;
- an expert feedback collection module, used to compare the current online transaction risk value with the early warning thresholds at each level based on the updated early warning strategy, calculate the corresponding online transaction risk early warning level, and output the early warning result;
- a reward prediction module, used to collect expert feedback on the early warning results and train the reward model to guide the agent to continuously learn the adaptive adjustment strategy for the early warning threshold.
[0045] Example 3: One embodiment of this disclosure provides a computer program product, including a computer program that, when executed by a processor, implements the aforementioned threshold-adaptive online transaction risk early warning method based on expert feedback.
[0046] Example 4: One embodiment of this disclosure provides a non-transitory computer-readable storage medium for storing computer instructions. When these computer instructions are executed by a processor, they implement the threshold-adaptive online transaction risk early warning method based on expert feedback.
[0047] Example 5: One embodiment of this disclosure provides an electronic device, including: a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory to enable the electronic device to implement the threshold-adaptive online transaction risk early warning method based on expert feedback.
[0048] As one embodiment, the electronic device can be a computer device. As shown in Figure 4, it includes a display device, an input device, a computer-readable storage medium (volatile memory and non-volatile storage medium), a processor, a communication interface (i.e., a network interface), and a computer program stored on the computer-readable storage medium and executable on the processor. The processor, communication interface, and computer-readable storage medium can be connected via a bus or other means. The communication interface is used to receive and send data, and when the processor executes the program, it implements the steps of the threshold-adaptive online transaction risk early warning method based on expert feedback as described in Example 1 above.
[0049] This disclosure is described with reference to flowchart illustrations and / or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of this disclosure. It will be understood that each block of the flowchart illustrations and / or block diagrams, and combinations of blocks in the flowchart illustrations and / or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a machine for implementing the flowchart illustrations and / or block diagrams. Figure 1 One or more processes and / or boxes Figure 1 A device that provides the functions specified in one or more boxes.
[0050] These computer program instructions may also be loaded onto a computer or other programmable data processing equipment to cause a series of operational steps to be performed on the computer or other programmable equipment to produce a computer-implemented process, thereby providing instructions that execute on the computer or other programmable equipment for implementing the process. Figure 1 One or more processes and / or boxes Figure 1 The steps of the function specified in one or more boxes.
[0051] While the specific embodiments of this disclosure have been described above in conjunction with the accompanying drawings, this is not intended to limit the scope of protection of this disclosure. Those skilled in the art should understand that various modifications or variations that can be made by those skilled in the art without creative effort based on the technical solutions of this disclosure are still within the scope of protection of this disclosure.
Claims
1. A threshold-adaptive online transaction risk early warning method based on expert feedback, characterized in that it includes: initializing an agent; acquiring online transaction data and preprocessing it; based on the preprocessed online transaction data, calculating the risk value of all or some specified types of online transaction risks, and constructing the time-series variation characteristics of online transaction risks; the agent observing the time-series variation characteristics of online transaction risks, constructing an online transaction risk early warning state, executing an action selection strategy based on the early warning state, and outputting the corresponding early warning strategy threshold optimization action; adjusting the threshold parameters of the early warning strategy based on the early warning strategy threshold optimization action to generate an updated early warning strategy; based on the updated early warning strategy, comparing the current online transaction risk value with the early warning thresholds at each level, calculating the corresponding online transaction risk early warning level, and outputting the early warning result; and collecting expert feedback on the early warning results and training a reward model, so as to guide the agent to continuously learn and adjust the adaptive strategy for the early warning threshold.
2. The threshold-adaptive online transaction risk early warning method based on expert feedback as described in claim 1, characterized in that agent initialization includes: constructing an initial online transaction risk warning strategy, a reward model, and a reinforcement learning agent for optimizing the online transaction risk warning threshold; the warning strategy consists of a set of threshold parameters, namely W = (φ, …, ω…); the agent maintains a Q-table to store the Q-values of the various online transaction risk warning states and optimization actions; and the reward model is a two-layer neural network with parameters θ.
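The initialization in claim 2 could look roughly like the sketch below. It assumes the threshold parameters W can be represented as an ascending list of numbers, the Q-table starts empty, and the two-layer reward model has one tanh hidden layer and a linear output; the layer sizes, the weight range, and the names `init_reward_model`, `predict_reward`, and `init_agent` are hypothetical choices made only for illustration.

```python
import math
import random

def init_reward_model(n_inputs, n_hidden, seed=0):
    """Parameters theta of a two-layer network: tanh hidden layer, linear output."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
               for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def predict_reward(theta, features):
    """Forward pass: features -> hidden activations -> scalar reward estimate."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(theta["w1"], theta["b1"])
    ]
    return sum(w * h for w, h in zip(theta["w2"], hidden)) + theta["b2"]

def init_agent(thresholds, n_inputs=4, n_hidden=8):
    """Bundle the initial strategy W, an empty Q-table, and the reward model."""
    return {
        "W": sorted(thresholds),   # assumed representation of the threshold set
        "q_table": {},             # maps state -> {action: Q-value}
        "theta": init_reward_model(n_inputs, n_hidden),
    }

agent = init_agent([0.6, 0.3, 0.8])
print(agent["W"], len(agent["q_table"]))
print(predict_reward(agent["theta"], [0.1, 0.2, 0.3, 0.4]))
```

The untrained model returns a small value near zero; training on expert feedback would adjust θ.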
3. The threshold-adaptive online transaction risk early warning method based on expert feedback as described in claim 1, characterized in that acquiring and preprocessing online transaction data includes: collecting various types of online transaction risk data, including contract breaches, product adulteration and quality fraud, food safety, after-sales service, false advertising, misleading promotions, failure to fulfill legal obligations, and complaints and reports, where the online transaction risk data includes a risk type identifier, occurrence time, occurrence frequency, and business characteristic information related to the risk; and performing preprocessing operations including cleaning, deduplication, and time window segmentation on the online transaction data.
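The cleaning, deduplication, and time window segmentation named in claim 3 can be illustrated with a short sketch. It assumes each record is a dictionary with hypothetical keys `risk_type` and `time` (an ISO 8601 string without a time zone), treats records with identical type and time as duplicates, and uses a 24-hour window; none of these specifics come from this disclosure.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # fixed origin used to align window boundaries

def preprocess(records, window_hours=24):
    """Clean, deduplicate, and count risk records per (window start, risk type)."""
    seen = set()
    counts = Counter()
    window = timedelta(hours=window_hours)
    for rec in records:
        risk_type, stamp = rec.get("risk_type"), rec.get("time")
        if not risk_type or not stamp:        # cleaning: drop incomplete records
            continue
        if (risk_type, stamp) in seen:        # deduplication: skip exact repeats
            continue
        seen.add((risk_type, stamp))
        t = datetime.fromisoformat(stamp)
        # time window segmentation: floor the timestamp to its window start
        window_start = EPOCH + ((t - EPOCH) // window) * window
        counts[(window_start, risk_type)] += 1
    return counts

records = [
    {"risk_type": "false_advertising", "time": "2026-01-01T08:00:00"},
    {"risk_type": "false_advertising", "time": "2026-01-01T08:00:00"},  # duplicate
    {"risk_type": "false_advertising", "time": "2026-01-01T20:00:00"},
    {"risk_type": "after_sales", "time": "2026-01-02T09:00:00"},
    {"risk_type": None, "time": "2026-01-02T10:00:00"},                 # incomplete
]
for key, n in sorted(preprocess(records).items()):
    print(key, n)
```

The per-window counts play the role of the occurrence frequency feature; a real pipeline would carry the business characteristic fields along as well.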
4. The threshold-adaptive online transaction risk early warning method based on expert feedback as described in claim 1, characterized in that calculating the risk value of all or some specified types of online transaction risks based on the preprocessed online transaction data, and constructing the time-series variation characteristics of online transaction risks, includes: inputting the preprocessed online transaction data into a pre-trained long short-term memory (LSTM) network model to predict, for each type of online transaction risk, the probability that the risk occurs or accumulates beyond a threshold within a preset time window; and, based on the predicted probability of risk occurrence combined with the potential harm of the corresponding online transaction risk, calculating the online transaction risk value, which quantitatively represents the current online transaction risk situation.
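As a rough illustration of the second step of claim 4, the sketch below combines a predicted probability with a harm weight per risk type. It assumes the combination is a simple product and that both quantities lie in [0, 1]; the claim does not state the formula, and the probabilities here are hand-written stand-ins for the output of the LSTM model. The function name `risk_values` and the example figures are hypothetical.

```python
def risk_values(probabilities, harm_weights):
    """Combine predicted probability with potential harm for each risk type.

    Assumes risk value = probability * harm weight, both in [0, 1].
    """
    values = {}
    for risk_type, p in probabilities.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range for {risk_type}: {p}")
        values[risk_type] = p * harm_weights[risk_type]
    return values

# Stand-in for LSTM predictions over the next time window (illustrative only).
predicted = {"food_safety": 0.5, "false_advertising": 0.8}
harm = {"food_safety": 1.0, "false_advertising": 0.25}
print(risk_values(predicted, harm))
```

Under this assumed formula a frequent but low-harm risk can score below a rarer high-harm one, which is why the harm term matters when the values are later compared with the warning thresholds.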
5. The threshold-adaptive online transaction risk early warning method based on expert feedback as described in claim 1, characterized in that the agent observing the time-series variation characteristics of online transaction risks, constructing an online transaction risk early warning state, executing an action selection strategy based on the early warning state, and outputting the corresponding early warning strategy threshold optimization action includes: the agent constructs the current online transaction risk warning state based on the time-series variation characteristics of the current online transaction risk, where the warning state comprehensively describes the category characteristics and changing trends of online transaction risk and includes the type and risk value of the online transaction risk at the current time and in the historical time series; the agent checks whether the current online transaction risk warning state exists in the Q-table; if it exists, the agent selects an optimization action from the warning strategy action space according to the ... greedy strategy; if it does not exist, the agent randomly selects a warning threshold optimization action from the action space.
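The state lookup and action selection of claim 5 can be sketched as below. The sketch assumes an ε-greedy rule (the wording around "Greedy" in the claim is incomplete, so this is a guess rather than a confirmed detail), a three-action space, and a state formed by discretizing recent risk values into bins; `ACTIONS`, `build_state`, `select_action`, and all parameter values are hypothetical.

```python
import random

ACTIONS = ["raise", "lower", "keep"]   # hypothetical threshold optimization actions

def build_state(risk_type, history, bins=5):
    """Discretize recent risk values (each in [0, 1]) into a hashable state."""
    levels = tuple(min(bins - 1, int(v * bins)) for v in history)
    return (risk_type, levels)

def select_action(q_table, state, epsilon=0.1, rng=random):
    """Random action for an unseen state, otherwise an epsilon-greedy choice."""
    if state not in q_table:
        # Assumption: register the new state so later updates have a slot.
        q_table[state] = {a: 0.0 for a in ACTIONS}
        return rng.choice(ACTIONS)
    if rng.random() < epsilon:          # explore with probability epsilon
        return rng.choice(ACTIONS)
    q = q_table[state]
    return max(q, key=q.get)            # exploit: action with the highest Q-value

q_table = {}
state = build_state("food_safety", [0.1, 0.5, 0.95])
print(state, select_action(q_table, state))
```

Discretizing the history keeps the Q-table finite; a finer binning distinguishes more situations but leaves more states unseen, so more early decisions fall back to the random branch.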
6. The threshold-adaptive online transaction risk early warning method based on expert feedback as described in claim 1, characterized in that: experts provide feedback on the warning records, and the actual warning level assigned to each record is stored; a scalar reward is calculated based on the expert feedback to reflect the degree of matching between the output warning level and the actual risk situation, and the record is stored; based on the expert feedback, a reward model is trained to learn the reward values that experts assign to different combinations of warning states and warning strategies; and the agent learns using either rewards calculated from expert feedback or rewards predicted by the reward model.
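A minimal sketch of how the reward in claim 6 could feed the agent's learning follows. It assumes the scalar reward falls linearly with the gap between the output level and the expert-assigned level, that the expert-derived reward is preferred when available, and that the agent uses the standard tabular Q-learning update; the claim fixes none of these, and the names `expert_reward`, `pick_reward`, `q_update` and the values of `alpha` and `gamma` are hypothetical.

```python
def expert_reward(predicted_level, actual_level, max_level):
    """Reward in [-1, 1]: 1 for an exact match, decreasing linearly with the gap."""
    gap = abs(predicted_level - actual_level)
    return 1.0 - 2.0 * gap / max_level

def pick_reward(expert_value, model_value):
    """Use the expert-derived reward if present, else the reward model's prediction."""
    return expert_value if expert_value is not None else model_value

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; returns the updated Q-value."""
    next_best = max(q_table.get(next_state, {}).values(), default=0.0)
    old = q_table.setdefault(state, {}).get(action, 0.0)
    q_table[state][action] = old + alpha * (reward + gamma * next_best - old)
    return q_table[state][action]

q_table = {}
r = pick_reward(expert_reward(predicted_level=2, actual_level=3, max_level=3), None)
print(r, q_update(q_table, "s0", "lower", r, "s1"))
```

Falling back to the reward model when no expert label exists is what lets the agent keep learning between rounds of expert review.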
7. A threshold-adaptive online transaction risk early warning system based on expert feedback, characterized in that it includes: an initialization module, used for agent initialization; an online transaction big data acquisition module, used to acquire online transaction data and preprocess it; an online transaction risk value calculation module, used to calculate the risk value of all or some specified types of online transaction risks based on the preprocessed online transaction data, and to construct the time-series variation characteristics of online transaction risks; a reinforcement learning agent module, used to observe the time-series variation characteristics of online transaction risks, construct an online transaction risk warning state, execute an action selection strategy based on the warning state, and output the corresponding warning strategy threshold optimization action; an early warning module, used to compare the current online transaction risk value with the early warning thresholds at each level based on the updated early warning strategy, calculate the corresponding online transaction risk early warning level, and output the early warning result; an expert feedback collection module, used by experts to evaluate the current warning results based on the actual situation; and a reward prediction module, used to collect expert feedback on the early warning results and train the reward model, so as to guide the agent to continuously learn and adjust the adaptive strategy for the early warning threshold.
8. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the threshold-adaptive online transaction risk warning method based on expert feedback as described in any one of claims 1-6.
9. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium is used to store computer instructions which, when executed by a processor, implement the threshold-adaptive online transaction risk warning method based on expert feedback as described in any one of claims 1-6.
10. An electronic device, characterized in that it includes a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory to enable the electronic device to implement the threshold-adaptive online transaction risk warning method based on expert feedback as described in any one of claims 1-6.