Reinforcement learning based agent triggered offer system

By adopting a reinforcement learning-based agent triggering mechanism, the problem of inconsistent start times for quotation processing in the quotation system is solved, enabling computable trigger control and process traceability, thereby improving the decision-making accuracy and management efficiency of the quotation system.

CN122243604A · Pending · Publication Date: 2026-06-19 · QINGDAO DIAN XIAOYI INFORMATION TECHNOLOGY CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Applications(China)
Current Assignee / Owner
QINGDAO DIAN XIAOYI INFORMATION TECHNOLOGY CO LTD
Filing Date
2026-03-06
Publication Date
2026-06-19

Abstract

This invention relates to the field of intelligent decision-making and control technology, and discloses a quotation system based on reinforcement learning and triggered by an intelligent agent. The system includes: an event access module; a state construction module for constructing a decision state vector; a trigger gating module for performing gating decision calculations on the state vector to obtain gating decision results and gating confidence levels; an uncertainty estimation module for estimating the uncertainty of predicted quantities based on the state vector to obtain an uncertainty metric; a quotation agent module for determining quotation actions through a reinforcement learning policy model; an action execution orchestration module for implementing quotation business operations according to the quotation actions; and a result feedback module for receiving operation records and quotation result information. By using the gating decision results and gating confidence levels, combined with the uncertainty metric obtained by the uncertainty estimation module, and based on the gating confidence threshold and uncertainty threshold, the system determines whether to trigger, delay triggering, or not trigger, thereby achieving computable trigger control of the quotation processing timing.

Description

Technical Field

[0001] This invention relates to the field of intelligent decision-making and control technology, specifically to a quotation system triggered by an intelligent agent based on reinforcement learning.

Background Technology

[0002] A quotation system is a business system that generates quotation results based on data such as customer information, product or service information, cost and supply information when there is an inquiry, order or transaction request. It is widely used in enterprise sales, supply chain collaboration and platform-based transaction scenarios. A quotation system based on reinforcement learning agent triggering introduces reinforcement learning strategy model and agent decision-making mechanism on the basis of the quotation system. It initiates quotation processing when conditions are met through trigger gating. This enables the system to form standardized event objects, construct state vectors and make decisions around event information, thereby completing the determination and execution of quotation actions. Existing quotation systems usually adopt fixed rules or preset processes to directly enter the quotation processing after receiving an inquiry event, or rely on manual judgment based on experience to determine whether to initiate the quotation process.

[0003] However, in current technology, the timing of quotation processing usually depends on fixed rule thresholds or judgment based on human experience. There is no computable determination mechanism for the triggering decision, making it difficult to uniformly determine whether to trigger, delay triggering, or not trigger under different inquiry states. As a result, control of the quotation processing flow is inconsistent and difficult to trace.

Summary of the Invention

[0004] To address the shortcomings of existing technologies, this invention provides a quotation system triggered by an intelligent agent based on reinforcement learning. It solves the problem that existing quotation systems rely on fixed rules or human experience for the timing of quotation processing, which makes it difficult to uniformly determine whether to trigger, delay triggering, or not trigger.

[0005] To achieve the above objectives, the present invention provides the following technical solution: a quotation system triggered by an agent based on reinforcement learning, comprising: an event access module, which receives event information related to the quotation business and standardizes the event information to form a standardized event object; a state construction module, which constructs a state vector for decision-making based on the standardized event object and the business data associated with the standardized event object; a trigger gating module, which performs a gating decision calculation on the state vector to obtain a gating decision result and a gating confidence; an uncertainty estimation module, which estimates the uncertainty of the predicted quantity related to the quotation based on the state vector to obtain an uncertainty measure; a quotation agent module, which determines the quotation action through a reinforcement learning policy model based on the state vector and the uncertainty measure when the gating decision result indicates that quotation processing is triggered; an action execution orchestration module, which performs the quotation business operation corresponding to the quotation action and generates the operation record and quotation result information corresponding to that operation; and a result feedback module, which receives the operation record and the quotation result information, and stores the quotation result information in association with the state vector, the gating decision result, the gating confidence, the uncertainty measure, and the quotation action.

[0006] Preferably, the event access module is configured to: receive event information related to the quotation business; standardize the event information to form a standardized event object, wherein the standardized event object includes event type, event timestamp, event payload, and inquiry object identifier; and generate an event identifier based on the standardized event object, the event identifier being used to associate and store the standardized event object with the quotation result information in the result feedback module.

[0007] Preferably, the state construction module is configured to: obtain the business data corresponding to the standardized event object; construct a state vector for decision-making based on the standardized event object and the business data; and, based on a preset field set, check the standardized event object and the business data for field integrity, generate a field missing identifier, and incorporate the field missing identifier into the state vector.

[0008] Preferably, the trigger gating module includes: Gating decision calculation is performed based on the state vector to obtain the gating decision result and gating confidence, wherein the gating decision result includes triggering, delayed triggering, and no triggering; When the gating decision result is delayed triggering, the standardized event object is written into the delay queue; When the preset callback conditions are met, the standardized event object in the delay queue is resubmitted to the state construction module to update the state vector and allow the trigger gating module to perform gating decision calculation again.

[0009] Preferably, when the trigger gating module performs the gating decision calculation, it determines the gating decision result based on threshold conditions, the threshold conditions including: The gate confidence level is not lower than the gate threshold, wherein the gate threshold is determined by the threshold determination rule based on the state vector; The uncertainty metric is not higher than the uncertainty threshold, wherein the uncertainty threshold is determined by the threshold determination rule based on the state vector.

[0010] Preferably, the uncertainty estimation module is configured to: perform uncertainty estimation on the predicted quantity related to the quotation based on the state vector; obtain an uncertainty measure based on that estimate; and provide the uncertainty measure to the quotation agent module, which then uses the reinforcement learning policy model to determine the quotation action.

[0011] Preferably, the uncertainty estimation module performs uncertainty estimation through quantile regression, including: The state vector is input into the quantile regression model to obtain multiple quantile outputs corresponding to the predicted quantity; The interval parameters of the predicted quantity are determined based on the multiple quantile outputs. The uncertainty measure is determined based on the interval parameters, and the uncertainty measure is provided to the quotation agent module.

[0012] Preferably, in the quotation agent module: when the gating decision result indicates that quotation processing is triggered, the state vector and the uncertainty measure are used as inputs to the reinforcement learning policy model; and the quotation action is determined by the reinforcement learning policy model, the quotation action belonging to a preset action set that includes a point price quotation action, a range quotation action, a tiered quotation action, a request for supplementary information action, a transfer-to-manual action, and a delayed quotation action.

[0013] Preferably, in the action execution orchestration module: upon receiving the quotation action, a quotation processing identifier is generated based on the inquiry object identifier contained in the standardized event object, and the quotation business operation is idempotently controlled according to the quotation processing identifier; when performing the quotation business operation, an operation version identifier and a validity period parameter are generated for the operation and written into the operation record; and the quotation business operation is orchestrated and routed based on the action type of the quotation action, so that the corresponding business processing channel is selected from the automatic quotation channel, the supplementary information interaction channel, the manual processing channel, and the delayed processing channel to execute the quotation business operation.

[0014] Preferably, the result feedback module is configured to: receive the operation record and the quotation result information, and extract the event identifier, the inquiry object identifier, and the operation version identifier as association keys; based on the association keys, store the quotation result information in association with the state vector, the gating decision result, the gating confidence, the uncertainty measure, and the quotation action to form a data sample; and record the model version identifier of the reinforcement learning policy model for the data sample, so that the quotation action can be matched to the model version identifier.

[0015] This invention provides a quotation system triggered by an intelligent agent based on reinforcement learning, which has the following beneficial effects: 1. The trigger gating module outputs the gating decision result and gating confidence, which are combined with the uncertainty measure obtained by the uncertainty estimation module. Based on the gating confidence threshold and the uncertainty threshold, the system determines whether to trigger, delay triggering, or not trigger, thereby realizing computable trigger control of the timing of quotation processing.

[0016] 2. This invention uses quantile regression to perform interval estimation of the predicted quantity related to the quotation, forms interval parameters, and determines the uncertainty measure accordingly. The uncertainty measure is then provided to the quotation agent module as a policy input, so that the action decision can take the uncertainty measure into account when selecting the processing method.

[0017] 3. This invention uses the action execution orchestration module to perform idempotent control, version identification, and validity period parameter recording for the quotation business operation. Based on the association keys, the result feedback module stores the quotation result information in association with the state vector, gating decision result, gating confidence, uncertainty measure, and quotation action, and records the model version identifier, thereby realizing traceable management of the processing flow and data samples.

Attached Figure Description

[0018] Figure 1 is an architecture diagram of the agent-triggered quotation system based on reinforcement learning according to the present invention.

Detailed Implementation

[0019] The technical solution of the present invention will now be clearly and completely described with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of the present invention.

[0020] Referring to Figure 1, this invention provides an agent-triggered quotation system based on reinforcement learning, comprising: an event access module, which receives event information related to the quotation business and standardizes the event information to form a standardized event object. Furthermore, the event access module is configured to: receive event information related to the quotation business; standardize the event information to form a standardized event object, which includes event type, event timestamp, event payload, and inquiry object identifier; and generate an event identifier based on the standardized event object, the event identifier being used to associate and store the standardized event object with the quotation result information in the result feedback module.

[0021] Specifically, the event access module is used to receive event information related to the quotation business and standardize the event information into standardized event objects so that subsequent modules can process it according to a unified data structure. The event access module can receive event information through interface callback or message queue and retain the original event payload and arrival time for traceability. Upon receiving event information, the event access module performs field mapping and format normalization on the event information to form a standardized event object containing event type, event timestamp, event payload, and inquiry object identifier. The event type is used to represent the category to which the event belongs, the event timestamp is used to represent the time when the event occurred, the event payload is used to carry business parameters, and the inquiry object identifier is used to represent the associated inquiry object. For example, when a customer submits an inquiry request on the quotation portal, the event access module recognizes the request as a "new inquiry" event, writes the submission time into the event timestamp, writes the quantity, delivery date, and specification fields into the event payload, and writes the inquiry number into the inquiry object identifier, thereby obtaining a standardized event object. The event access module further generates event identifiers based on standardized event objects, and stores or outputs the event identifiers in correspondence with the standardized event objects for subsequent association. When the result feedback module receives the quotation result information, it uses the event identifiers to associate and store the quotation result information with the corresponding standardized event objects to form a traceable data record.
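The standardization step described above can be illustrated with a minimal Python sketch. The raw field names (`type`, `submitted_at`, `inquiry_no`, `body`), the `FIELD_MAP` table, and the hash-based event identifier are assumptions made for illustration; the source only states that field mapping and format normalization are performed and that an event identifier is generated from the standardized event object.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class StandardizedEvent:
    event_type: str
    event_timestamp: str      # time the event occurred (ISO 8601 string here)
    event_payload: dict       # business parameters such as quantity or delivery date
    inquiry_object_id: str    # identifier of the associated inquiry object
    event_id: str             # derived identifier used for later association


# Hypothetical mapping from source-specific field names to normalized names.
FIELD_MAP = {"qty": "quantity", "due": "delivery_date", "spec": "specification"}


def standardize(raw: dict[str, Any]) -> StandardizedEvent:
    """Map raw fields to the unified structure and derive an event identifier."""
    payload = {FIELD_MAP.get(k, k): v for k, v in raw["body"].items()}
    core = {
        "event_type": raw["type"],
        "event_timestamp": raw["submitted_at"],
        "event_payload": payload,
        "inquiry_object_id": raw["inquiry_no"],
    }
    # Assumption: a deterministic identifier from a hash of the canonical JSON.
    canonical = json.dumps(core, sort_keys=True).encode("utf-8")
    return StandardizedEvent(event_id=hashlib.sha256(canonical).hexdigest()[:16], **core)


raw_event = {
    "type": "new_inquiry",
    "submitted_at": "2026-03-06T09:30:00Z",
    "inquiry_no": "INQ-0001",
    "body": {"qty": 500, "due": "2026-04-01", "spec": "M8x30"},
}
event = standardize(raw_event)
```

Deriving the identifier from the event content means the same event received twice maps to the same identifier, which is one possible way to support the association described for the result feedback module.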

[0022] The state construction module constructs state vectors for decision-making based on standardized event objects and the business data associated with those standardized event objects. Furthermore, the state building module includes: Retrieve business data corresponding to standardized event objects; Construct a state vector for decision-making based on standardized event objects and business data; Based on a preset set of fields, the system performs field integrity checks on standardized event objects and business data, generates field missing identifiers, and incorporates these identifiers into the state vector.

[0023] Specifically, the state construction module constructs a state vector for decision-making based on standardized event objects and the business data associated with the standardized event objects, so that the subsequent trigger gating module, uncertainty estimation module and quotation agent module can perform calculations based on the same state caliber. After receiving the standardized event objects provided by the event access module, the state construction module first parses the event type and inquiry object identifier in them, and determines the range of business data and data source to be retrieved accordingly. The state construction module obtains business data corresponding to standardized event objects. In specific implementation, the state construction module can query records associated with the inquiry object in the business database based on the inquiry object identifier, including customer information, historical quotations and transaction records, basic product or material information, cost and supply information, delivery and capacity information, risk control and compliance model output, etc. At the same time, it can also supplement query conditions based on parameters carried in the event payload (such as quantity, delivery date, region, specification fields or file references) to avoid irrelevant data from entering the state. For example, when the standardized event object corresponds to the "new inquiry" event and the payload contains quantity and delivery date, the state construction module can pull business data such as the customer's level and historical transaction discounts, the cost baseline and the most recent cost update time of the product category, current inventory and capacity load, etc., as input for subsequent state construction. Subsequently, the state construction module constructs a state vector for decision-making based on standardized event objects and business data. 
That is, the state construction module uniformly encodes and concatenates the event type, event timestamp, and event payload fields in the standardized event objects, as well as the structured fields in the business data to form a state vector. Among them, enumeration mapping or embedding encoding can be used for category fields, unit unification and range pruning can be performed for numerical fields, time fields can be converted into time intervals or periodic features, and index identifiers can be retained for subsequent modules to process as needed. Continuing the above example, the state vector can include customer level, historical transaction price range, inquiry quantity, delivery days, cost baseline, inventory level, capacity load, and the most recent cost update time interval, so that subsequent modules can directly use the state vector for gating and action decisions. Meanwhile, the state construction module performs field integrity checks on standardized event objects and business data based on a preset field set, generates field missing identifiers, and incorporates these identifiers into the state vector. The module can pre-configure a set of fields related to pricing decisions, such as key fields for materials or specifications, quantity, delivery date, delivery location, customer level, cost baseline, inventory, and production capacity. When constructing the state vector, the state construction module checks each standardized event object and business data against the preset field set to see if it has the corresponding field or a valid parsable value, and writes the missing fields into the field missing identifier using binary or multi-value marking.
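As a rough sketch of the state construction and field integrity check described above, the following Python example builds a flat state vector and appends one binary missing-field flag per preset field. The field names, the `CUSTOMER_LEVELS` enumeration mapping, and the choice of filling missing values with 0.0 are illustrative assumptions, not details given in the source.

```python
# Hypothetical preset field set; the source lists fields of this kind as examples.
PRESET_FIELDS = ["quantity", "delivery_days", "customer_level", "cost_baseline", "inventory"]

# Enumeration mapping for a category field (assumed levels).
CUSTOMER_LEVELS = {"standard": 0.0, "key": 1.0}


def build_state_vector(payload: dict, business: dict) -> list[float]:
    """Encode preset fields, then append a binary missing flag for each field."""
    merged = {**business, **payload}
    values: list[float] = []
    missing: list[float] = []
    for name in PRESET_FIELDS:
        raw = merged.get(name)
        if name == "customer_level":
            # Unknown or absent levels map to None and are treated as missing.
            raw = CUSTOMER_LEVELS.get(raw)
        if raw is None:
            values.append(0.0)   # placeholder value; the flag carries the information
            missing.append(1.0)
        else:
            values.append(float(raw))
            missing.append(0.0)
    return values + missing


state = build_state_vector(
    payload={"quantity": 500, "delivery_days": 26},
    business={"customer_level": "key", "cost_baseline": 12.5},
)
```

Here `inventory` is absent from both inputs, so its value slot is 0.0 and its flag is 1.0; downstream modules can then distinguish "zero inventory" from "inventory unknown".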

[0024] The trigger gating module performs gating decision calculations on the state vector to obtain the gating decision result and gating confidence. Furthermore, in the trigger gating module: the gating decision calculation is performed based on the state vector to obtain the gating decision result and gating confidence, where the gating decision result is one of trigger, delayed trigger, and no trigger; when the gating decision result is delayed trigger, the standardized event object is written to the delay queue; and when the preset callback conditions are met, the standardized event object in the delay queue is resubmitted to the state construction module to update the state vector, so that the trigger gating module can perform the gating decision calculation again.

[0025] Specifically, the trigger gating module performs gating decision calculations on the state vector output by the state construction module to determine whether to enter the quotation processing flow, and outputs the gating decision result and gating confidence for subsequent modules to process accordingly. After receiving the state vector, the trigger gating module can input it into a gating decision model for inference. The gating decision model can be a classification model or a scoring model that gives a confidence score or probability output for each of the three gating states "trigger", "delayed trigger", and "no trigger", thereby realizing gating control of quotation processing. In one embodiment, the gating decision model outputs a confidence value for each of the three gating states; the trigger gating module takes the gating state with the largest confidence value as the gating decision result and sets that largest confidence value as the gating confidence. Accordingly, the gating confidence can be expressed as: $c = \max_{g \in G} p(g \mid s)$, where $s$ is the state vector (constructed by the state construction module), $p(g \mid s)$ is the confidence output by the gating decision model for gating state $g$ at input $s$, and $g$ is chosen from $G = \{\text{trigger}, \text{delayed trigger}, \text{no trigger}\}$. For example, when the state vector corresponding to a "new inquiry" event shows that the inquiry information is relatively complete and the customer is a key customer, the gating decision model can output a high confidence for "trigger". On this basis, the trigger gating module outputs "trigger" as the gating decision result, together with the corresponding gating confidence for subsequent recording and traceability.
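The selection rule in this embodiment (take the gating state with the largest confidence value and report that value as the gating confidence) can be sketched in a few lines of Python. The confidence values below are made up for illustration, and the gating decision model itself is not shown.

```python
GATING_STATES = ("trigger", "delayed_trigger", "no_trigger")


def gate(confidences: dict[str, float]) -> tuple[str, float]:
    """Return (gating decision result, gating confidence) from per-state confidences."""
    decision = max(GATING_STATES, key=lambda g: confidences[g])
    return decision, confidences[decision]


# Illustrative model output for a fairly complete inquiry from a key customer.
decision, gating_confidence = gate(
    {"trigger": 0.82, "delayed_trigger": 0.13, "no_trigger": 0.05}
)
```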

[0026] When the gating decision result is delayed trigger, the trigger gating module writes the standardized event object into the delay queue to temporarily suspend quotation processing for the event while retaining an entry for subsequent recalculation. In one embodiment, the delay queue can be implemented as queue storage or a task table. The trigger gating module writes the standardized event object, its event identifier, and the timestamp of entering the delay queue into the delay queue. For example, if key fields are missing from the state vector of the quotation event, or the external business data is not yet ready, the trigger gating module can output delayed trigger as the gating decision result and write the standardized event object into the delay queue, so that the gating decision calculation can be performed again after the information is supplemented. When the preset callback conditions are met, the trigger gating module resubmits the standardized event object in the delay queue to the state construction module to update the state vector and performs the gating decision calculation again. In one embodiment, the preset callback conditions may consist of event-driven conditions and timeout conditions: when a supplementary information event associated with the standardized event object arrives, a cost or inventory update event arrives, or the delay time reaches the preset duration, the trigger gating module retrieves the corresponding standardized event object from the delay queue and resubmits it to the state construction module.
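A minimal in-memory sketch of such a delay queue is shown below, assuming two callback conditions as in the embodiment: an update event associated with the entry, or a timeout. The class names, the `release` method, and the use of plain dictionaries for events are hypothetical; a production system would more likely use persistent queue storage or a task table as the text suggests.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DelayedEntry:
    event_id: str
    event: dict
    enqueued_at: float   # timestamp of entering the delay queue


@dataclass
class DelayQueue:
    timeout_seconds: float
    entries: dict[str, DelayedEntry] = field(default_factory=dict)

    def put(self, event_id: str, event: dict, now: Optional[float] = None) -> None:
        """Suspend an event whose gating decision result is delayed trigger."""
        stamp = time.time() if now is None else now
        self.entries[event_id] = DelayedEntry(event_id, event, stamp)

    def release(self, updated_event_ids: set[str], now: Optional[float] = None) -> list[dict]:
        """Return events whose callback condition is met and remove them from the queue."""
        current = time.time() if now is None else now
        ready = [
            e for e in self.entries.values()
            if e.event_id in updated_event_ids                  # event-driven condition
            or current - e.enqueued_at >= self.timeout_seconds  # timeout condition
        ]
        for e in ready:
            del self.entries[e.event_id]
        return [e.event for e in ready]


queue = DelayQueue(timeout_seconds=60.0)
queue.put("e1", {"inquiry": "INQ-1"}, now=0.0)
queue.put("e2", {"inquiry": "INQ-2"}, now=0.0)
released_by_update = queue.release({"e1"}, now=10.0)
released_by_timeout = queue.release(set(), now=60.0)
```

The released events would then be resubmitted to the state construction module so that the state vector is rebuilt before gating is evaluated again.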

[0027] Furthermore, when the trigger gating module performs gating decision calculations, it determines the gating decision result based on threshold conditions, which include: the gating confidence is not lower than the gating threshold, wherein the gating threshold is determined by the threshold determination rule based on the state vector; and the uncertainty measure is not higher than the uncertainty threshold, wherein the uncertainty threshold is determined by the threshold determination rule based on the state vector.

[0028] Specifically, when the trigger gating module performs the gating decision calculation, it determines the gating decision result based on threshold conditions. The threshold conditions include "gating confidence not lower than the gating threshold" and "uncertainty measure not higher than the uncertainty threshold". Both thresholds are determined by threshold determination rules based on the state vector; the rules can be implemented by rule tables or regression models that give corresponding thresholds for different state vectors. After obtaining the gating confidence $c$ and the uncertainty measure $u$, the trigger gating module compares them according to the following threshold conditions: $c \ge \tau_c(s)$ and $u \le \tau_u(s)$, where $s$ is the state vector, $c$ is the gating confidence (output by the trigger gating module), $u$ is the uncertainty measure (output by the uncertainty estimation module), $\tau_c(s)$ is the gating threshold determined from $s$ by the threshold determination rule, and $\tau_u(s)$ is the uncertainty threshold determined from $s$ by the threshold determination rule. For example, when the state vector corresponding to a certain inquiry event indicates a high customer level and the field missing identifier shows that all fields are present, the threshold determination rule gives a relatively low gating threshold and a relatively high uncertainty threshold; the trigger gating module performs the comparison and determines the gating decision result as trigger. When the state vector indicates that a missing key field leads to a large uncertainty measure, the trigger gating module can determine the gating decision result as delayed trigger based on the uncertainty threshold comparison, and write the corresponding standardized event object into the delay queue to await the callback before deciding again.
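The threshold comparison can be sketched as follows, assuming a tiny rule table for the threshold determination rule. The numeric thresholds and the state keys are invented for illustration. The source specifies only that both conditions holding yields "trigger" and that an uncertainty measure above its threshold can yield "delayed trigger"; mapping the remaining case to "no trigger" is an assumption of this sketch.

```python
def thresholds(state: dict) -> tuple[float, float]:
    """Hypothetical rule table returning (gating threshold, uncertainty threshold)."""
    if state["customer_level"] == "key" and not state["has_missing_fields"]:
        return 0.60, 20.0   # lower gating threshold, higher uncertainty threshold
    return 0.80, 10.0


def decide(state: dict, gating_confidence: float, uncertainty: float) -> str:
    tau_c, tau_u = thresholds(state)
    if gating_confidence >= tau_c and uncertainty <= tau_u:
        return "trigger"
    if uncertainty > tau_u:
        return "delayed_trigger"   # wait for supplementary information, then re-evaluate
    return "no_trigger"            # assumption: low confidence despite acceptable uncertainty


key_customer = {"customer_level": "key", "has_missing_fields": False}
incomplete = {"customer_level": "standard", "has_missing_fields": True}
result_complete = decide(key_customer, gating_confidence=0.70, uncertainty=15.0)
result_incomplete = decide(incomplete, gating_confidence=0.90, uncertainty=15.0)
```

With these illustrative numbers, the same uncertainty measure of 15.0 passes for the key customer with complete fields but exceeds the stricter threshold for the incomplete inquiry.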

[0029] The uncertainty estimation module estimates the uncertainty of the predicted quantity related to the quotation based on the state vector to obtain an uncertainty measure. Furthermore, the uncertainty estimation module is configured to: perform uncertainty estimation on the predicted quantity related to the quotation based on the state vector; obtain an uncertainty measure based on that estimate; and provide the uncertainty measure to the quotation agent module, which then uses the reinforcement learning policy model to determine the quotation action.

[0030] Specifically, the uncertainty estimation module performs uncertainty estimation on the predicted quantity related to the quotation based on the state vector. That is, the uncertainty estimation module uses the state vector as the model input, calls the estimation model used to perform interval prediction of the predicted quantity, and obtains the interval parameters or quantile parameters of the predicted quantity. The state vector may include inquiry load fields, customer and historical transaction fields, cost and supply fields, and field missing indicators, etc., so that the estimation model can give corresponding interval output under different information integrity conditions. For example, when the field missing indicator in the state vector corresponding to a certain "new inquiry" event indicates that there is a missing key specification field, the uncertainty estimation module outputs a wider interval parameter for the cost prediction quantity. The uncertainty estimation module obtains the uncertainty measure based on the uncertainty estimation. In specific implementation, the uncertainty estimation module converts the interval parameter into the uncertainty measure. For example, the uncertainty measure is determined by the difference between the upper and lower bounds of the interval. The uncertainty measure is stored or output in correspondence with the interval parameter of the predicted amount. Continuing the above example, after obtaining the upper and lower bounds of the cost prediction amount, the uncertainty estimation module determines the difference between the two as the uncertainty measure, which is used to characterize the interval span of the cost prediction amount in the current state. Simultaneously, the uncertainty estimation module provides the uncertainty metric to the pricing agent module. That is, the uncertainty estimation module binds the uncertainty metric to the corresponding state vector and outputs it. 
This allows the pricing agent module to use the state vector and uncertainty metric as input to the policy model when performing policy reasoning through the reinforcement learning policy model to determine the pricing action. For example, when the uncertainty metric is high, the pricing agent module can output the action of requesting supplementary information or the action of range pricing during policy reasoning; when the uncertainty metric is low, the pricing agent module can output the action of point pricing.

[0031] Furthermore, the uncertainty estimation module estimates uncertainty through quantile regression, including: Input the state vector into the quantile regression model to obtain multiple quantile outputs corresponding to the predicted values; The interval parameters for the predicted quantity are determined based on multiple quantile outputs; The uncertainty measure is determined based on the interval parameters and then provided to the pricing agent module.

[0032] Specifically, the uncertainty estimation module uses the state vector as input to the quantile regression model. For a single predicted quantity, the quantile regression model outputs the predicted values corresponding to multiple preset quantiles, thus obtaining multiple quantile outputs. Let the state vector be denoted as $s$, let the predicted quantity related to the quotation be denoted as $y$ (for example, a cost forecast or a transaction probability forecast), and let the quantile set be $\mathcal{T} = \{\tau_1, \tau_2, \ldots, \tau_K\}$. The output of the quantile regression model for each quantile can be expressed as: $\hat{y}_{\tau} = f(s, \tau),\ \tau \in \mathcal{T}$; where $s$ is the state vector data output by the state construction module, $f$ is the quantile regression model in the uncertainty estimation module, and $\hat{y}_{\tau}$ is the predicted value of the predicted quantity $y$ at quantile $\tau$. For example, when the state vector corresponding to a "new inquiry" event includes the quantity, delivery date, customer level, cost baseline and field missing identifier, the quantile regression model can output the predicted cost values at a lower quantile and a higher quantile respectively, to characterize the cost range of the inquiry under the current information conditions. The uncertainty estimation module then determines the interval parameters of the predicted quantity based on the multiple quantile outputs: it selects, from the multiple quantile outputs, the predicted values corresponding to the lower quantile $\tau_L$ and the upper quantile $\tau_H$ as the interval endpoints. The interval parameters can be expressed as: $I = [\hat{y}_{\tau_L}, \hat{y}_{\tau_H}]$; where $I$ is the interval parameter data of the predicted quantity $y$, $\tau_L$ and $\tau_H$ are the preset lower and upper quantiles, and $\hat{y}_{\tau_L}$ and $\hat{y}_{\tau_H}$ are the lower and upper bounds of the interval output by the quantile regression model, respectively. Continuing with the example above, if $\tau_L$ and $\tau_H$ are selected to correspond to a lower and a higher quantile respectively, the interval parameters represent the lower and upper bounds of the cost forecast.
The uncertainty estimation module determines the uncertainty metric based on the interval parameters and provides the uncertainty metric to the quotation agent module. In specific implementation, the uncertainty estimation module can use the interval width as the uncertainty metric, which can be expressed as: $u = \hat{y}_{\tau_H} - \hat{y}_{\tau_L}$; where $u$ is the uncertainty metric data, obtained as the difference between the upper and lower bounds of the interval. The uncertainty estimation module binds the uncertainty metric to the corresponding state vector and outputs them together. This allows the quotation agent module to use the uncertainty metric as one of the inputs for determining the quotation action when performing policy inference through the reinforcement learning policy model. For example, when the cost prediction interval for the inquiry is wide, so that $u$ is large, the quotation agent module can choose to output a range quotation action or a request-for-supplementary-information action during policy inference; when the cost prediction interval is narrow, so that $u$ is small, the quotation agent module can choose to output a point price quotation action.
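The interval-width calculation described above can be sketched in a few lines. This is a minimal illustration only: the function names, the 0.1 and 0.9 quantile levels, and the sample cost values are assumptions made for the example and are not specified in the text. The quantile outputs are taken as given, so no particular quantile regression model is implied.

```python
from typing import Dict, Tuple


def interval_from_quantiles(
    quantile_outputs: Dict[float, float],
    tau_low: float = 0.1,   # assumed lower quantile level
    tau_high: float = 0.9,  # assumed upper quantile level
) -> Tuple[float, float]:
    """Pick the lower and upper quantile predictions as interval endpoints."""
    if not 0.0 < tau_low < tau_high < 1.0:
        raise ValueError("expected 0 < tau_low < tau_high < 1")
    lower = quantile_outputs[tau_low]
    upper = quantile_outputs[tau_high]
    # Separately fitted quantile outputs can cross; ordering the endpoints
    # keeps the resulting width non-negative.
    return (min(lower, upper), max(lower, upper))


def uncertainty_metric(interval: Tuple[float, float]) -> float:
    """Interval width, used here as the uncertainty metric."""
    lower, upper = interval
    return upper - lower


# Hypothetical quantile outputs for the cost forecast of one inquiry.
outputs = {0.1: 92.0, 0.5: 100.0, 0.9: 115.0}
interval = interval_from_quantiles(outputs)
u = uncertainty_metric(interval)
```

A wider interval gives a larger metric, which is the signal the quotation agent module is described as using when choosing between a point price quotation and a more cautious action.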

[0033] When the gating decision result indicates that quotation processing is triggered, the quotation agent module determines the quotation action from the state vector and the uncertainty metric through a reinforcement learning policy model. Furthermore, the quotation agent module is configured to: when the gating decision result indicates that quotation processing is triggered, use the state vector and the uncertainty metric as inputs to the reinforcement learning policy model; and determine the quotation action through the reinforcement learning policy model. The quotation action belongs to a preset action set, which includes a point price quotation action, a range quotation action, a tiered quotation action, a request-for-supplementary-information action, a transfer-to-manual action, and a delayed quotation action.

[0034] Specifically, when the gating decision result indicates that quotation processing is triggered, the quotation agent module obtains the state vector $s$ corresponding to the quotation and the uncertainty metric $u$ output by the uncertainty estimation module, and organizes $s$ and $u$ into the preset input format of the reinforcement learning policy model: $x = (s, u)$; where $s$ is the state vector data output by the state construction module, $u$ is the uncertainty metric data, and $x$ is the input data provided to the reinforcement learning policy model. Furthermore, the quotation agent module uses the reinforcement learning policy model to determine the quotation action over a preset action set $\mathcal{A}$, which includes the point price quotation action, the range quotation action, the tiered quotation action, the request-for-supplementary-information action, the transfer-to-manual action, and the delayed quotation action. For ease of description, let the reinforcement learning policy model output the action selection result over the action set, from which the quotation agent module determines the quotation action: $a = \pi(x),\ a \in \mathcal{A}$; where $\pi(x)$ is the data output by the policy model and $a$ is the quotation action data. For example, when the uncertainty metric $u$ corresponding to an inquiry is high, the policy model can assign higher outputs to the range quotation action or the request-for-supplementary-information action, so that one of these is determined as the quotation action; when $u$ is low, the policy model can assign a higher output to the point price quotation action, so that the point price quotation action is determined.
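The selection step described above can be illustrated as follows. This is a sketch under stated assumptions: the enum member names, the concatenated input layout, and the greedy highest-output selection rule are illustrative choices, and `toy_policy` is a hand-written stand-in with a hypothetical cut-off of 20.0, not a trained reinforcement learning policy.

```python
from enum import Enum
from typing import Dict, List, Sequence


class QuoteAction(Enum):
    """The six actions of the preset action set (member names are assumed)."""
    POINT_QUOTE = "point_quote"
    RANGE_QUOTE = "range_quote"
    TIERED_QUOTE = "tiered_quote"
    REQUEST_INFO = "request_info"
    TRANSFER_TO_MANUAL = "transfer_to_manual"
    DELAY_QUOTE = "delay_quote"


def build_policy_input(state_vector: Sequence[float], uncertainty: float) -> List[float]:
    """Organize the state vector and the uncertainty metric into one input."""
    return [*state_vector, uncertainty]


def toy_policy(policy_input: Sequence[float]) -> Dict[QuoteAction, float]:
    """Stand-in for a trained policy model: returns one output per action."""
    uncertainty = policy_input[-1]
    scores = {action: 0.0 for action in QuoteAction}
    if uncertainty > 20.0:  # hypothetical cut-off, for illustration only
        scores[QuoteAction.RANGE_QUOTE] = 1.0
    else:
        scores[QuoteAction.POINT_QUOTE] = 1.0
    return scores


def select_action(scores: Dict[QuoteAction, float]) -> QuoteAction:
    """Greedy selection: return the action with the highest policy output."""
    if set(scores) != set(QuoteAction):
        raise ValueError("scores must cover the whole preset action set")
    return max(scores, key=scores.__getitem__)


wide = select_action(toy_policy(build_policy_input([0.3, 0.7], 23.0)))
narrow = select_action(toy_policy(build_policy_input([0.3, 0.7], 5.0)))
```

Restricting the output to an enumerated action set keeps every decision routable downstream, since each action type can be mapped to a processing channel.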

[0035] The action execution orchestration module performs the quotation business operation corresponding to the quotation action, and generates an operation record and quotation result information corresponding to the quotation business operation. Furthermore, the action execution orchestration module is configured to: upon receiving a quotation action, generate a quotation processing identifier based on the inquiry object identifier contained in the standardized event object, and apply idempotent control to the quotation business operation according to the quotation processing identifier; when performing the quotation business operation, generate an operation version identifier and a validity period parameter for the operation, and write the operation version identifier and validity period parameter into the operation record; and orchestrate and route the quotation business operation based on the action type of the quotation action, so that the operation is executed through the corresponding business processing channel selected from the automatic quotation channel, the supplementary information interaction channel, the manual processing channel, and the delayed processing channel.

[0036] Specifically, after receiving a quotation action, the action execution orchestration module performs the corresponding quotation business operation and generates an operation record and quotation result information for that operation, so that they can later be associated during result feedback. When a quotation action is received, the action execution orchestration module reads the inquiry object identifier contained in the standardized event object to generate a quotation processing identifier, and applies idempotent control to the quotation business operation based on the quotation processing identifier: if the business operation corresponding to the same quotation processing identifier is detected to have already been executed within the preset window, the existing operation record or quotation result information is reused. During the execution of a quotation business operation, the action execution orchestration module assigns an operation version identifier to the business operation and generates a validity period parameter. The operation version identifier and validity period parameter are written into the operation record to distinguish different business operations for the same inquiry object. The action execution orchestration module also orchestrates and routes the quotation business operation according to the action type of the quotation action, selecting the corresponding business processing channel from the automatic quotation channel, the supplementary information interaction channel, the manual processing channel, and the delayed processing channel to execute the operation.
For example, when the quotation action is a request-for-supplementary-information action, the business operation is routed to the supplementary information interaction channel to generate a supplementary information request, and the corresponding operation version identifier and validity period parameter are written into the operation record.
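The idempotent control, versioning, and routing described above might be sketched as below. Several details are assumptions made for the example and are not stated in the text: the processing identifier is formed from the inquiry object identifier plus the action type, the operation version identifier is a random UUID, the window and validity period are fixed numbers of seconds, and the three quotation-type actions all map to the automatic quotation channel.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Assumed mapping from action type to business processing channel.
CHANNEL_BY_ACTION = {
    "point_quote": "automatic_quotation",
    "range_quote": "automatic_quotation",
    "tiered_quote": "automatic_quotation",
    "request_info": "supplementary_information",
    "transfer_to_manual": "manual_processing",
    "delay_quote": "delayed_processing",
}


@dataclass(frozen=True)
class OperationRecord:
    processing_id: str
    operation_version: str
    valid_until: float  # validity period parameter, as an absolute timestamp
    channel: str


class Orchestrator:
    def __init__(self, window_seconds: float = 300.0, validity_seconds: float = 86400.0):
        self.window_seconds = window_seconds
        self.validity_seconds = validity_seconds
        # processing_id -> (execution time, record)
        self._executed: Dict[str, Tuple[float, OperationRecord]] = {}

    def execute(self, inquiry_id: str, action_type: str,
                now: Optional[float] = None) -> OperationRecord:
        now = time.time() if now is None else now
        processing_id = f"{inquiry_id}:{action_type}"
        previous = self._executed.get(processing_id)
        # Idempotent control: reuse the record if the same operation
        # already ran inside the preset window.
        if previous is not None and now - previous[0] <= self.window_seconds:
            return previous[1]
        record = OperationRecord(
            processing_id=processing_id,
            operation_version=uuid.uuid4().hex,
            valid_until=now + self.validity_seconds,
            channel=CHANNEL_BY_ACTION[action_type],
        )
        self._executed[processing_id] = (now, record)
        return record


orchestrator = Orchestrator(window_seconds=60.0)
first = orchestrator.execute("INQ-1", "request_info", now=1000.0)
repeat = orchestrator.execute("INQ-1", "request_info", now=1030.0)  # inside window
later = orchestrator.execute("INQ-1", "request_info", now=2000.0)   # outside window
```

Here the repeated call inside the window returns the existing record, while the later call produces a new operation version for the same inquiry object.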

[0037] The result feedback module receives the operation record and the quotation result information, and stores the quotation result information in association with the state vector, gating decision result, gating confidence, uncertainty metric and quotation action.

[0038] Furthermore, the result feedback module is configured to: receive the operation record and the quotation result information, and extract the event identifier, inquiry object identifier, and operation version identifier as association keys; based on the association keys, store the quotation result information in association with the state vector, gating decision result, gating confidence, uncertainty metric and quotation action to form a data sample; and record the model version identifier of the reinforcement learning policy model for the data sample, so that the quotation action can be matched to the model version that produced it.

[0039] Specifically, after receiving the operation record and the quotation result information, the result feedback module collects the key data of this round of quotation processing, so that the quotation result information corresponds to the state and action of the preceding decision-making process. In specific implementation, the result feedback module reads the state vector, gating decision result, gating confidence, uncertainty metric and quotation action corresponding to the quotation result information, and uses them as the data content to be fed back. The result feedback module extracts the event identifier, the inquiry object identifier, and the operation version identifier as association keys. The event identifier and the inquiry object identifier are obtained from the standardized event object, and the operation version identifier is obtained from the operation record. The association keys identify the processing instances of the same inquiry object under different business operation versions. For example, the same inquiry object may generate two quotations, one before and one after information is supplemented; the result feedback module forms different association keys from the different operation version identifiers to distinguish the two records. Based on the association keys, the result feedback module stores the quotation result information in association with the state vector, gating decision result, gating confidence, uncertainty metric and quotation action to form a data sample, and writes the data sample into the sample library or log library. Meanwhile, the result feedback module records the model version identifier of the reinforcement learning policy model for the data sample.
The model version identifier can be carried by the quotation agent module along with the quotation action or provided by the operation record, and is used to match the quotation action to the model version that produced it.
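The association-key scheme described above can be illustrated with a small in-memory store. This is a sketch only: the field names, the tuple layout of the key, the string-valued decision and result fields, and the `SampleStore` class are assumptions for the example; the text does not specify a storage format for the sample library or log library.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (event identifier, inquiry object identifier, operation version identifier)
AssociationKey = Tuple[str, str, str]


@dataclass(frozen=True)
class DataSample:
    key: AssociationKey
    state_vector: Tuple[float, ...]
    gating_decision: str
    gating_confidence: float
    uncertainty: float
    quote_action: str
    quote_result: str
    model_version: str  # version of the policy model that chose the action


class SampleStore:
    """Hypothetical in-memory stand-in for the sample library."""

    def __init__(self) -> None:
        self._samples: Dict[AssociationKey, DataSample] = {}

    def write(self, sample: DataSample) -> None:
        self._samples[sample.key] = sample

    def for_inquiry(self, inquiry_id: str) -> List[DataSample]:
        """All processing instances recorded for one inquiry object."""
        return [s for k, s in self._samples.items() if k[1] == inquiry_id]


store = SampleStore()
# Same inquiry object, quoted before and after supplementary information:
# different operation versions give different association keys.
store.write(DataSample(("EVT-1", "INQ-1", "v1"), (0.3, 0.7), "trigger", 0.91,
                       23.0, "request_info", "info_requested", "policy-001"))
store.write(DataSample(("EVT-2", "INQ-1", "v2"), (0.3, 0.9), "trigger", 0.95,
                       5.0, "point_quote", "accepted", "policy-001"))
samples = store.for_inquiry("INQ-1")
```

Keeping the operation version identifier in the key means that the two quotations for the same inquiry object remain separate samples, each traceable to its own state, action, and model version.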

[0040] Although embodiments of the invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. An agent-triggered quotation system based on reinforcement learning, characterized in that it comprises: an event access module, which receives event information related to the quotation business and standardizes the event information to form a standardized event object; a state construction module, which constructs a state vector for decision-making based on the standardized event object and the business data associated with the standardized event object; a trigger gating module, which performs a gating decision calculation on the state vector to obtain a gating decision result and a gating confidence; an uncertainty estimation module, which estimates the uncertainty of a predicted quantity related to the quotation based on the state vector to obtain an uncertainty metric; a quotation agent module, which, when the gating decision result indicates that quotation processing is triggered, determines a quotation action based on the state vector and the uncertainty metric; an action execution orchestration module, which performs the quotation business operation corresponding to the quotation action and generates an operation record and quotation result information corresponding to the quotation business operation; and a result feedback module, which receives the operation record and the quotation result information, and stores the quotation result information in association with the state vector, the gating decision result, the gating confidence, the uncertainty metric, and the quotation action.

2. The agent-triggered quotation system based on reinforcement learning according to claim 1, characterized in that the event access module is configured to: receive event information related to the quotation business; standardize the event information to form a standardized event object, wherein the standardized event object includes an event type, an event timestamp, an event payload and an inquiry object identifier; and generate an event identifier based on the standardized event object, the event identifier being used in the result feedback module to store the standardized event object in association with the quotation result information.

3. The agent-triggered quotation system based on reinforcement learning according to claim 1, characterized in that the state construction module is configured to: obtain the business data corresponding to the standardized event object; construct a state vector for decision-making based on the standardized event object and the business data; and, based on a preset set of fields, check the standardized event object and the business data for field completeness, generate a field missing identifier, and incorporate the field missing identifier into the state vector.

4. The agent-triggered quotation system based on reinforcement learning according to claim 1, characterized in that the trigger gating module is configured to: perform a gating decision calculation based on the state vector to obtain the gating decision result and the gating confidence, wherein the gating decision result is one of triggering, delayed triggering, and no triggering; when the gating decision result is delayed triggering, write the standardized event object into a delay queue; and, when a preset callback condition is met, resubmit the standardized event object in the delay queue to the state construction module to update the state vector, so that the trigger gating module performs the gating decision calculation again.

5. The agent-triggered quotation system based on reinforcement learning according to claim 4, characterized in that, when the trigger gating module performs the gating decision calculation, it determines the gating decision result based on threshold conditions, the threshold conditions including: the gating confidence is not lower than a gating confidence threshold, wherein the gating confidence threshold is determined by a threshold determination rule based on the state vector; and the uncertainty metric is not higher than an uncertainty threshold, wherein the uncertainty threshold is determined by the threshold determination rule based on the state vector.

6. The agent-triggered quotation system based on reinforcement learning according to claim 1, characterized in that the uncertainty estimation module is configured to: perform uncertainty estimation on the predicted quantity related to the quotation based on the state vector; obtain an uncertainty metric based on the uncertainty estimation; and provide the uncertainty metric to the quotation agent module, so that the quotation agent module determines the quotation action through a reinforcement learning policy model.

7. The agent-triggered quotation system based on reinforcement learning according to claim 6, characterized in that the uncertainty estimation module estimates uncertainty through quantile regression, which includes: inputting the state vector into a quantile regression model to obtain multiple quantile outputs corresponding to the predicted quantity; determining the interval parameters of the predicted quantity based on the multiple quantile outputs; and determining the uncertainty metric based on the interval parameters and providing the uncertainty metric to the quotation agent module.

8. The agent-triggered quotation system based on reinforcement learning according to claim 1, characterized in that the quotation agent module is configured to: when the gating decision result indicates that quotation processing is triggered, use the state vector and the uncertainty metric as inputs to a reinforcement learning policy model; and determine the quotation action through the reinforcement learning policy model, wherein the quotation action belongs to a preset action set, which includes a point price quotation action, a range quotation action, a tiered quotation action, a request-for-supplementary-information action, a transfer-to-manual action, and a delayed quotation action.

9. The agent-triggered quotation system based on reinforcement learning according to claim 1, characterized in that the action execution orchestration module is configured to: upon receiving the quotation action, generate a quotation processing identifier based on the inquiry object identifier contained in the standardized event object, and apply idempotent control to the quotation business operation according to the quotation processing identifier; when performing the quotation business operation, generate an operation version identifier and a validity period parameter for the quotation business operation, and write the operation version identifier and the validity period parameter into the operation record; and orchestrate and route the quotation business operation based on the action type of the quotation action, so that the operation is executed through the corresponding business processing channel selected from the automatic quotation channel, the supplementary information interaction channel, the manual processing channel and the delayed processing channel.

10. The agent-triggered quotation system based on reinforcement learning according to claim 9, characterized in that the result feedback module is configured to: receive the operation record and the quotation result information, and extract the event identifier, the inquiry object identifier, and the operation version identifier as association keys; based on the association keys, store the quotation result information in association with the state vector, the gating decision result, the gating confidence, the uncertainty metric, and the quotation action to form a data sample; and record the model version identifier of the reinforcement learning policy model for the data sample, so that the quotation action can be matched to the model version identifier.