A quality evaluation method and device, a storage medium and an electronic device

By evaluating the quality of historical dialogue data and summary information to be evaluated across multiple assessment dimensions, comprehensive evaluation data is generated, which solves the problems of large bias and missing information in dialogue summary generation by large language models, and achieves high-accuracy and reliable quality evaluation.

CN122309730A | Pending | Publication Date: 2026-06-30 | BEIJING ZHONGKE JINDEZHU INTELLIGENT TECH CO LTD

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: BEIJING ZHONGKE JINDEZHU INTELLIGENT TECH CO LTD
Filing Date: 2026-02-13
Publication Date: 2026-06-30

AI Technical Summary

Technical Problem

Existing technologies using large language models to generate dialogue summaries suffer from problems such as large biases and missing information, resulting in inaccurate evaluation results and low reliability. These problems cannot be identified by rule templates or lightweight classification models.

Method used

By acquiring the summary information to be evaluated and its corresponding historical dialogue data, the target business information is determined and multiple evaluation dimensions are set, including factual consistency, information completeness, compliance and risk, to conduct a multi-dimensional quality assessment and generate comprehensive evaluation data.

Benefits of technology

It enables multi-dimensional and comprehensive quality inspection of dialogue summaries, improving the accuracy and reliability of the assessment and ensuring the accuracy and consistency of the assessment results.

✦ Generated by Eureka AI based on patent content.


Abstract

This application discloses a quality assessment method, apparatus, storage medium, and electronic device, relating to the field of data processing technology. The method includes: acquiring summary information to be assessed and historical dialogue data; determining target business information corresponding to the historical dialogue data, and determining assessment importance data corresponding to the summary information to be assessed in multiple assessment dimensions based on the target business information; performing quality assessments on the historical dialogue data and the summary information to be assessed in multiple assessment dimensions to obtain multiple quality assessment data; and generating quality information of the summary information to be assessed based on the assessment importance data and the multiple quality assessment data. This application achieves comprehensive multi-dimensional quality detection of the summary information to be assessed by performing quality assessments on the historical dialogue data and the summary information to be assessed in the multiple assessment dimensions to obtain multiple quality assessment data corresponding to the summary information to be assessed, thereby improving the accuracy and reliability of the quality assessment.

Description

Technical Field

[0001] This application relates to the field of data processing technology, and in particular to a quality assessment method, apparatus, storage medium, and electronic device.

Background Technology

[0002] In highly compliant businesses such as financial customer service, the quality assessment of dialogue summaries is a crucial step in conducting automated quality inspections and risk monitoring.

[0003] Currently, large language models (LLMs) are used to extract dialogue summaries between customers and customer service representatives. The quality of the generated dialogue summaries is then assessed either with rule templates or with a lightweight classification model. Rule-template assessment performs text matching between the summary and the dialogue according to preset rules. Lightweight-classifier assessment extracts text features from the summary and the dialogue and matches those features.

[0004] However, dialogue summaries generated by large language models often suffer from large biases and missing information. Rule templates and lightweight classification models cannot identify these issues, which leads to inaccurate and unreliable evaluation results.

Summary of the Invention

[0005] In view of this, this application provides a quality assessment method, apparatus, storage medium, and electronic device. The main purpose is to address the technical problem in the prior art that dialogue summaries generated by large language models have large biases and missing information which rule templates or lightweight classification models cannot identify, resulting in inaccurate and unreliable assessment results.

[0006] In a first aspect, this application provides a quality assessment method, including:
obtaining the summary information to be evaluated and the historical dialogue data corresponding to the summary information to be evaluated, wherein the historical dialogue data is the dialogue data between the user and the financial business customer service based on the target financial business;
determining the target business information corresponding to the historical dialogue data, and determining the evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions based on the target business information;
performing quality evaluations on the historical dialogue data and the summary information to be evaluated in the multiple evaluation dimensions respectively to obtain multiple quality evaluation data corresponding to the summary information to be evaluated;
generating quality information of the summary information to be evaluated based on the evaluation importance data and the multiple quality evaluation data.

[0007] Optionally, determining the target business information corresponding to the historical dialogue data, and determining the evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions based on the target business information, includes:
extracting target business information from the historical dialogue data, wherein the target business information includes at least one of business content, business type, business compliance information, and business risk level;
determining, based on the target business information, multiple evaluation dimensions for evaluating the summary information to be evaluated;
determining the evaluation importance data corresponding to each of the multiple evaluation dimensions.

[0008] Optionally, the step of determining multiple evaluation dimensions for evaluating the summary information to be evaluated based on the target business information includes:
determining a first evaluation dimension for evaluating the summary information to be evaluated based on the business content in the target business information;
determining a second evaluation dimension for evaluating the summary information to be evaluated based on the business type in the target business information;
determining a third evaluation dimension for evaluating the summary information to be evaluated based on the business compliance information in the target business information;
determining a fourth evaluation dimension for evaluating the summary information to be evaluated based on the business risk level in the target business information.

[0009] Optionally, performing quality assessments on the historical dialogue data and the summary information to be assessed in the multiple assessment dimensions respectively to obtain multiple quality assessment data corresponding to the summary information to be assessed includes:
evaluating the consistency of the historical dialogue data and the summary information to be evaluated in the first evaluation dimension to obtain the first evaluation data of the summary information to be evaluated;
evaluating the historical dialogue data and the summary information to be evaluated for completeness in the second evaluation dimension to obtain the second evaluation data of the summary information to be evaluated;
evaluating the historical dialogue data and the summary information to be evaluated for compliance in the third evaluation dimension to obtain the third evaluation data of the summary information to be evaluated;
assessing the historical dialogue data and the summary information to be evaluated for risk in the fourth evaluation dimension to obtain the fourth evaluation data of the summary information to be evaluated.

[0010] Optionally, generating quality information for the summary information to be evaluated based on the assessment importance data and the plurality of quality assessment data includes:
determining, based on the assessment importance data, the weights of the first assessment data, the second assessment data, the third assessment data, and the fourth assessment data;
weighting and fusing the multiple quality assessment data according to the weights to obtain the comprehensive assessment data of the dialogue summary;
generating quality information of the summary information to be evaluated based on the comprehensive assessment data.

[0011] Optionally, the method further includes:
obtaining multiple summary information to be evaluated and generating a sample set of summary information to be evaluated;
determining the label information for each summary information to be evaluated in the sample set, wherein the label information includes acceptable labels and unacceptable labels;
filtering the summary information to be evaluated in the sample set according to the label information to obtain training summary information;
inputting the training summary information and the corresponding label information into an initial quality inspection model for model training to obtain a multi-dimensional quality inspection model;
assessing the quality of the multiple summary information to be evaluated based on the multi-dimensional quality inspection model.

[0012] In a second aspect, this application provides a quality assessment device, comprising:
an acquisition module configured to acquire the summary information to be evaluated and the historical dialogue data corresponding to the summary information to be evaluated, wherein the historical dialogue data is the dialogue data between the user and the financial business customer service based on the target financial business;
a determination module configured to determine the target business information corresponding to the historical dialogue data, and to determine the evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions based on the target business information;
an evaluation module configured to perform quality evaluations on the historical dialogue data and the summary information to be evaluated in the multiple evaluation dimensions respectively, and obtain multiple quality evaluation data corresponding to the summary information to be evaluated;
a generation module configured to generate quality information of the summary information to be evaluated based on the evaluation importance data and the multiple quality evaluation data.

[0013] In a third aspect, this application provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the quality assessment method described in the first aspect.

[0014] In a fourth aspect, this application provides an electronic device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor executes the computer program to implement the quality assessment method described in the first aspect.

[0015] By employing the above technical solutions, this application provides a quality assessment method, apparatus, storage medium, and electronic device. This application acquires summary information to be assessed and corresponding historical dialogue data, where the historical dialogue data is dialogue data between a user and a financial service customer service representative based on a target financial business. It then determines the target business information corresponding to the historical dialogue data and, based on the target business information, determines the assessment importance data corresponding to the summary information to be assessed across multiple assessment dimensions. It performs quality assessments on the historical dialogue data and the summary information to be assessed across multiple assessment dimensions, obtaining multiple quality assessment data corresponding to the summary information to be assessed. Based on the assessment importance data and the multiple quality assessment data, it generates quality information for the summary information to be assessed. Compared with existing technologies, this application achieves comprehensive multi-dimensional quality detection of the summary information to be assessed by performing quality assessments on the historical dialogue data and the summary information to be assessed across multiple assessment dimensions, and improves the accuracy and reliability of quality assessment by generating quality information for the summary information to be assessed based on the assessment importance data and the multiple quality assessment data.

Attached Figure Description

[0016] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.

[0017] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0018] Figure 1 is a flowchart of a quality assessment method provided in an embodiment of this application;
Figure 2 is a flowchart of a quality assessment method provided in an embodiment of this application;
Figure 3 is a flowchart of an example of optimizing a customer service dialogue summary quality assessment standard based on KTO alignment training, as provided in an embodiment of this application;
Figure 4 is a schematic diagram of the structure of a quality assessment device provided in an embodiment of this application;
Figure 5 is a schematic diagram of the structure of an electronic device provided in an embodiment of this application.

Detailed Implementation

[0019] The embodiments of this application will now be described in more detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the embodiments and features described herein can be combined with each other.

[0020] Dialogue summaries generated by large language models in current technologies suffer from large biases and missing information, and rule-based templates or lightweight classification models cannot recognize these problems, which leads to inaccurate and unreliable evaluation results. To address this, this embodiment provides a quality assessment method. As shown in Figure 1, the method includes:
Step 101: Obtain the summary information to be evaluated and the historical dialogue data corresponding to the summary information to be evaluated.

[0021] Here, the historical dialogue data is the dialogue data between users and financial service customer service based on the target financial business.

[0022] In this embodiment, the summary information to be evaluated can be summary content generated based on the original dialogue between the user and the financial service customer service, which requires quality testing. For example, the summary information to be evaluated in this embodiment may specifically include dialogue summaries generated after the financial customer service communicates with the user about overdue repayment, dialogue summaries generated after the user inquires about the application process for financial products, dialogue summaries generated after the user applies for account loss reporting, and other customer service dialogue summaries in various financial business scenarios.

[0023] In this embodiment, historical dialogue data can be the original dialogue records generated between the user and financial service customer service during the communication process of conducting target financial business. Historical dialogue data can be used to provide a raw data benchmark for the quality assessment of the summary information to be evaluated. For example, the historical dialogue data in this embodiment may specifically include voice-to-text dialogue records between customer service and user, chat records of online text communication, interaction records between intelligent customer service and user, etc. The recorded content may include dialogue-related information such as the user's business requests, the customer service's responses and processing actions, the communication results between the two parties, and key node information of business processing.

[0024] In the embodiments of this application, the target financial business can be a financial-related service business in which the user communicates with the financial business customer service, and the target financial business can be used to determine the scope of business scenarios for historical dialogue data and summary information to be evaluated.

[0025] Step 102: Determine the target business information corresponding to the historical dialogue data, and determine the assessment importance data corresponding to the summary information to be evaluated in multiple assessment dimensions based on the target business information.

[0026] In the embodiments of this application, the target business information can be information extracted from historical dialogue data that can characterize the core attributes of the target financial business. For example, the target business information in the embodiments of this application may specifically include business content, business type, business compliance information, and business risk level. The business content may be the specific business matters communicated between the user and customer service; the business type may be specific business categories such as overdue collection or product consultation; the business compliance information may be the financial industry compliance requirements and customer service script specifications corresponding to the business; and the business risk level may be a high-risk, medium-risk, or low-risk outcome classified according to business characteristics.

[0027] In this embodiment, the evaluation dimension can be a quality evaluation dimension set for the summary information to be evaluated. The evaluation dimensions can be used to conduct targeted detection of the quality of the summary information from different aspects. For example, in view of the high compliance requirements of financial customer service business, this embodiment can set multiple core evaluation dimensions such as factual consistency, information completeness, compliance, and risk, and the evaluation dimensions are independent of one another.

[0028] In the embodiments of this application, the assessment importance data can be data characterizing the importance of each of the multiple assessment dimensions in assessing the quality of the summary information to be assessed. The assessment importance data can be used to reflect the influence weight of different assessment dimensions on the overall quality of the summary. For example, the assessment importance data in the embodiments of this application may specifically include data such as the priority ranking, weight coefficient, and score ratio of each assessment dimension.

[0029] In this embodiment of the application, natural language processing technology can be used to perform semantic analysis and information extraction on historical dialogue data to mine and obtain corresponding target business information from the original dialogue records. Combining the compliance management standards and risk control requirements of the target financial business, and based on the business characteristics of the extracted target business information, multiple evaluation dimensions suitable for the business scenario are determined to ensure that the evaluation dimensions can accurately cover the quality inspection needs of the business. Based on the business characteristics, risk level, and compliance requirements of the target business information, corresponding evaluation importance data is configured for each evaluation dimension to ensure that the evaluation importance data matches the core needs of the business scenario.

[0030] Step 103: Perform quality assessments on the historical dialogue data and the summary information to be assessed across multiple evaluation dimensions to obtain multiple quality assessment data corresponding to the summary information to be assessed.

[0031] In this embodiment, the quality assessment data can be a quantitative or qualitative assessment result obtained by comparing and evaluating historical dialogue data and summary information to be assessed under multiple assessment dimensions. For example, the quality assessment data in this embodiment may specifically include a 0-10 point quantitative score for each assessment dimension, conformity / non-conformity quality judgment results, specific problem location information, etc. For example, the quality assessment data for the factual consistency dimension can be an 8-point quantitative score, or a qualitative judgment result indicating factual deviations with specific deviation details marked.
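One possible way to hold the per-dimension result described above is sketched below. The class name `DimensionAssessment`, its field names, and the example values are hypothetical and not taken from the application; only the 0-10 score range, the conformity judgment, and the problem location notes come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DimensionAssessment:
    """Quality assessment data for one dimension (hypothetical layout)."""
    dimension: str                # e.g. "factual_consistency"
    score: float                  # quantitative score on the 0-10 scale
    conforms: bool                # conformity / non-conformity judgment
    issues: List[str] = field(default_factory=list)  # problem location notes

    def __post_init__(self) -> None:
        # Reject scores outside the 0-10 range mentioned in the text.
        if not 0.0 <= self.score <= 10.0:
            raise ValueError("score must be within 0-10")


# Illustrative values only: an 8-point score with one marked deviation.
example = DimensionAssessment(
    dimension="factual_consistency",
    score=8.0,
    conforms=False,
    issues=["summary states a repayment date that is absent from the dialogue"],
)
```

Keeping the quantitative score, the qualitative judgment, and the issue notes in one record lets later steps choose whichever representation the business scenario needs.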

[0032] In this embodiment, corresponding quality assessment rules and judgment standards can be established by combining the financial business requirements and industry standards corresponding to the target business information; the summary information to be assessed and historical dialogue data are compared and analyzed and quality tested under each assessment dimension, and the testing work of each dimension is carried out independently to avoid cross-dimensional judgment interference; the test results under each assessment dimension are standardized and quantitatively or qualitatively processed to obtain the quality assessment data of the summary information to be assessed corresponding to each assessment dimension.
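The independent, per-dimension testing described above could be organized roughly as follows. This is a minimal sketch: the two evaluators are toy keyword checks standing in for real assessment rules, and the key terms, the banned phrase, and all function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

Evaluator = Callable[[str, str], float]  # (dialogue, summary) -> score in 0-10


def completeness(dialogue: str, summary: str) -> float:
    """Toy check: share of hypothetical key terms in the dialogue kept by the summary."""
    key_terms = [t for t in ("overdue", "amount", "date") if t in dialogue.lower()]
    if not key_terms:
        return 10.0
    kept = sum(t in summary.lower() for t in key_terms)
    return 10.0 * kept / len(key_terms)


def compliance(dialogue: str, summary: str) -> float:
    """Toy check: a hypothetical banned phrase in the summary fails the dimension."""
    banned = ("guaranteed",)
    return 0.0 if any(b in summary.lower() for b in banned) else 10.0


def evaluate_all(dialogue: str, summary: str,
                 evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Run each dimension's evaluator separately so that no result feeds another."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, dialogue, summary)
                   for name, fn in evaluators.items()}
        return {name: fut.result() for name, fut in futures.items()}


results = evaluate_all(
    "Your payment is overdue; the amount is 500 and the due date was May 1.",
    "Customer has an overdue amount of 500.",
    {"information_completeness": completeness, "compliance": compliance},
)
```

Each evaluator receives only the dialogue and the summary, never another dimension's result, which is one simple way to avoid cross-dimensional interference.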

[0033] Step 104: Based on the assessment importance data and multiple quality assessment data, generate quality information for the summary information to be assessed.

[0034] In this embodiment, the quality information can be information that comprehensively characterizes the overall quality level of the summary information to be evaluated, obtained by integrating multi-dimensional quality assessment data and assessment importance data. For example, the quality information in this embodiment may specifically include the comprehensive assessment score, overall quality level, summary of specific quality issues, and targeted rectification suggestions for the summary information to be evaluated.

[0035] In this embodiment, the weight allocation rule for each quality assessment data in the comprehensive quality assessment can be determined based on the assessment importance data corresponding to each assessment dimension. The weight allocation is positively correlated with the importance of the assessment dimension. Multiple quality assessment data are fused according to this rule, and the fusion method can be flexibly selected according to business needs. The fused results are systematically organized and processed, and combined with quality level classification standards, problem labeling specifications, etc., to generate quality information that reflects the overall quality of the summary information to be assessed.
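As one illustration of the fusion step, the sketch below uses a normalized weighted average. The text leaves the fusion method open, so this is only one option; the function names, the level cut-offs, and the example numbers are assumptions.

```python
from typing import Dict


def fuse_scores(scores: Dict[str, float], importance: Dict[str, float]) -> float:
    """Weighted average of per-dimension scores, weighted by importance."""
    missing = set(scores) - set(importance)
    if missing:
        raise KeyError(f"no importance value for: {sorted(missing)}")
    total = sum(importance[d] for d in scores)
    if total <= 0:
        raise ValueError("importance values must sum to a positive number")
    return sum(scores[d] * importance[d] for d in scores) / total


def quality_level(score: float) -> str:
    """Map a fused score to a level; the cut-offs are hypothetical."""
    if score >= 8.0:
        return "good"
    if score >= 6.0:
        return "acceptable"
    return "needs rectification"


# Illustrative numbers: compliance and risk count twice as much as the other two.
scores = {"factual_consistency": 8.0, "information_completeness": 6.0,
          "compliance": 10.0, "risk": 4.0}
importance = {"factual_consistency": 1.0, "information_completeness": 1.0,
              "compliance": 2.0, "risk": 2.0}
overall = fuse_scores(scores, importance)  # (8 + 6 + 20 + 8) / 6 = 7.0
level = quality_level(overall)
```

Because the weights are normalized inside `fuse_scores`, the importance data can be supplied as raw coefficients, percentages, or rank-derived values without changing the 0-10 scale of the result.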

[0036] Compared with existing technologies, the embodiments of this application perform quality assessments on historical dialogue data and summary information to be assessed in multiple evaluation dimensions to obtain multiple quality assessment data corresponding to the summary information to be assessed, thereby achieving comprehensive quality detection of the summary information to be assessed in multiple dimensions; and by generating quality information of the summary information to be assessed based on the assessment importance data and multiple quality assessment data, the accuracy and reliability of quality assessment are improved.

[0037] As an optional approach, the step of "determining the target business information corresponding to historical dialogue data, and determining the assessment importance data corresponding to the summary information to be evaluated in multiple assessment dimensions based on the target business information" can be performed in the following way, without being limited to it. As shown in Figure 2, it includes:
Step 201: Extract the target business information from the historical dialogue data. The target business information includes business content, business type, business compliance information, and business risk level.

[0038] For the embodiments of this application, the extraction of target business information from historical dialogue data can employ information extraction techniques in the field of natural language processing, including but not limited to algorithms such as named entity recognition, keyword extraction, semantic clustering, and text classification, to perform in-depth semantic analysis and information mining on historical dialogue data. During the analysis process, a domain lexicon, compliance rule library, and risk level classification library for financial customer service business can be constructed. Based on the domain knowledge base, the business content, business type, business compliance information, and business risk level in historical dialogue data can be identified and extracted to ensure that the extracted target business information is complete, accurate, and consistent with the actual characteristics of the target financial business.

[0039] For example, if the historical dialogue data is dialogue data generated by overdue collection business, the business content can be extracted from the historical dialogue data as communication with users about overdue repayment, the business type as financial loan overdue collection business, the business compliance information as compliance scripts and operational requirements for financial collection that need to be followed, and the business risk level as high risk.
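A very small keyword-based version of this extraction might look like the sketch below. The lexicon entries, the `BusinessInfo` record, and the low risk level assigned to product consultation are assumptions for illustration. The application itself names richer techniques such as named entity recognition and text classification, which this sketch does not implement.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BusinessInfo:
    content: str        # the dialogue turn that best signals the business matter
    business_type: str
    compliance: str
    risk_level: str


# Hypothetical domain lexicon: type -> (keywords, compliance note, risk level).
LEXICON: Dict[str, Tuple[List[str], str, str]] = {
    "overdue_collection": (["overdue", "repayment"],
                           "collection scripts and operational requirements", "high"),
    "product_consultation": (["apply", "product"],
                             "product disclosure requirements", "low"),
}


def extract_business_info(turns: List[str]) -> BusinessInfo:
    """Pick the business type whose keywords occur most often in the dialogue."""
    text = " ".join(turns).lower()
    best_type, best_hits = "", 0
    for btype, (keywords, _, _) in LEXICON.items():
        hits = sum(text.count(k) for k in keywords)
        if hits > best_hits:
            best_type, best_hits = btype, hits
    if not best_type:
        raise ValueError("no known business type matched the dialogue")
    keywords, compliance, risk = LEXICON[best_type]
    # Use the first turn that mentions a matched keyword as the business content.
    content = next(t for t in turns if any(k in t.lower() for k in keywords))
    return BusinessInfo(content, best_type, compliance, risk)


info = extract_business_info([
    "Customer service: your loan repayment is overdue by 10 days.",
    "User: I will pay next week.",
])
```

With the overdue collection dialogue above, the sketch returns the high risk level that the example in the text associates with this business type.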

[0040] Step 202: Based on the target business information, determine multiple evaluation dimensions for evaluating the summary information to be evaluated.

[0041] In the embodiments of this application, the multiple evaluation dimensions for evaluating the summary information to be evaluated can be determined from the target business information in light of the high compliance and high risk requirements of financial customer service business. Based on the extracted target business information, a structured multi-dimensional quality inspection and evaluation system can be constructed. Each evaluation dimension in this system corresponds one-to-one with an element of the target business information. The evaluation dimensions are independent of one another, which decouples the dimensions, allows them to be evaluated in parallel, and avoids cross-dimensional judgment interference.

[0042] Step 203: Determine the assessment importance data corresponding to each of the multiple assessment dimensions.

[0043] In this application embodiment, the assessment importance data corresponding to multiple assessment dimensions can be determined by combining the industry compliance management standards of the target financial business and the risk control requirements of the enterprise. Based on the business risk level and the strictness of compliance requirements of the target business information, the multiple assessment dimensions are classified according to their importance.

[0044] For example, for assessment dimensions corresponding to business characteristics with high risk levels and strict compliance requirements, higher assessment importance data can be configured. For instance, in high-risk financial businesses, the assessment importance data for compliance and risk dimensions is significantly higher than that for factual consistency and information completeness dimensions. For assessment dimensions corresponding to business characteristics with low risk levels and moderate compliance requirements, assessment importance data adapted to the business scenario can be configured. The representation of assessment importance data can include, but is not limited to, priority ranking, weight coefficients, and scoring percentages for each dimension. After configuration, the assessment importance data can be associated and stored with the corresponding assessment dimensions.
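The configuration rule in this example can be sketched as a function from risk level to weight coefficients. The boost factors (3, 2, 1) and the dimension names are hypothetical. The text only requires that compliance and risk weigh significantly more than the other two dimensions for high-risk business.

```python
from typing import Dict

DIMENSIONS = ("factual_consistency", "information_completeness",
              "compliance", "risk")

# Hypothetical multipliers applied to the compliance and risk dimensions.
RISK_BOOST = {"high": 3.0, "medium": 2.0, "low": 1.0}


def importance_weights(risk_level: str) -> Dict[str, float]:
    """Return weight coefficients that sum to 1 for the given business risk level."""
    if risk_level not in RISK_BOOST:
        raise ValueError(f"unknown risk level: {risk_level!r}")
    raw = {d: 1.0 for d in DIMENSIONS}
    raw["compliance"] *= RISK_BOOST[risk_level]
    raw["risk"] *= RISK_BOOST[risk_level]
    total = sum(raw.values())
    return {d: w / total for d, w in raw.items()}


high = importance_weights("high")  # compliance and risk: 0.375 each, others 0.125
low = importance_weights("low")    # all four dimensions: 0.25 each
```

The resulting dictionary can be stored alongside the evaluation dimensions, matching the association and storage step mentioned at the end of the paragraph.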

[0045] Optionally, when performing the step of "determining multiple evaluation dimensions for evaluating the summary information to be evaluated based on the target business information", the following methods may be used, but are not limited to: determining the first evaluation dimension for evaluating the summary information to be evaluated based on the business content in the target business information; determining the second evaluation dimension for evaluating the summary information to be evaluated based on the business type in the target business information; determining the third evaluation dimension for evaluating the summary information to be evaluated based on the business compliance information in the target business information; and determining the fourth evaluation dimension for evaluating the summary information to be evaluated based on the business risk level in the target business information.

[0046] In the embodiments of this application, the first evaluation dimension determined based on the business content in the target business information can be the factual consistency dimension. The factual consistency dimension can be used to evaluate whether the summary information to be evaluated faithfully reproduces the core business content in the historical dialogue data, ensuring that the summary is highly consistent with the business content of the original dialogue.

[0047] In the embodiments of this application, the second evaluation dimension determined based on the business type in the target business information can be the information completeness dimension. The information completeness dimension can be used to check whether the summary information to be evaluated fully covers the key business elements in the historical dialogue data that match the business type, so that missing key elements do not affect subsequent business processing decisions.

[0048] In this embodiment of the application, the third assessment dimension determined based on the business compliance information in the target business information can specifically be a compliance dimension. The compliance dimension can be used to review whether the summary information to be assessed meets the financial industry compliance requirements corresponding to the business compliance information, so as to ensure the security and compliance of the summary.

[0049] In this embodiment of the application, the fourth assessment dimension determined based on the business risk level in the target business information can be a risk dimension. This dimension can be used to assess whether the summary information to be assessed accurately reflects the business risk points in the historical dialogue data, whether there are business risks caused by improper summary expression or information deviation, and can also be combined with the level of business risk to detect whether the summary meets the corresponding risk prevention and control requirements.
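The mapping from the four kinds of target business information to the four evaluation dimensions can be sketched as follows. This is a minimal illustration only: the class name, field names, and dictionary keys are assumptions introduced here and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TargetBusinessInfo:
    # Hypothetical field names; the application only names the four kinds of information.
    business_content: str
    business_type: str
    compliance_info: str
    risk_level: str


def determine_dimensions(info: TargetBusinessInfo) -> dict[str, str]:
    """Pair each evaluation dimension with the business information that drives it."""
    return {
        "factual_consistency": info.business_content,   # first dimension
        "information_integrity": info.business_type,    # second dimension
        "compliance": info.compliance_info,             # third dimension
        "risk": info.risk_level,                        # fourth dimension
    }
```

Keeping the dimension-to-source pairing in one place makes it easy to see which part of the target business information each later assessment step depends on.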

[0050] Optionally, when performing the task of "conducting quality assessments on historical dialogue data and summary information to be assessed across multiple assessment dimensions to obtain multiple quality assessment data corresponding to the summary information to be assessed," the following method may be used, but is not limited to: performing a consistency assessment on historical dialogue data and summary information to be assessed across a first assessment dimension to obtain first assessment data for the summary information to be assessed; performing an integrity assessment on historical dialogue data and summary information to be assessed across a second assessment dimension to obtain second assessment data for the summary information to be assessed; performing a compliance assessment on historical dialogue data and summary information to be assessed across a third assessment dimension to obtain third assessment data for the summary information to be assessed; and performing a risk assessment on historical dialogue data and summary information to be assessed across a fourth assessment dimension to obtain fourth assessment data for the summary information to be assessed.

[0051] In the embodiments of this application, the consistency assessment of the historical dialogue data and the summary information to be assessed is performed in the first assessment dimension. The first assessment data of the summary information to be assessed can be obtained by using semantic vector representation and element-level semantic vector matching: the summary information to be assessed and the historical dialogue data are respectively input into a semantic encoding model and converted into fine-grained semantic vectors, and the semantic vectors are then compared at the fact-element level. In addition, chain-of-thought reasoning can be combined to guide the assessment model to explicitly output intermediate reasoning steps when producing the first assessment data. The first assessment data may include a quantitative consistency score and location information for specific problems.
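Element-level vector matching of this kind might look like the sketch below. It is not the semantic encoding model of the application: a bag-of-words count vector stands in for the encoder so that the example runs without external libraries, and the function names and the 0.5 threshold are assumptions.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def consistency_assessment(dialogue_turns, summary_facts, threshold=0.5):
    """Match each summary fact to its best dialogue turn and flag weak matches."""
    turn_vecs = [embed(t) for t in dialogue_turns]
    best_scores, unsupported = [], []
    for fact in summary_facts:
        fact_vec = embed(fact)
        best = max((cosine(fact_vec, tv) for tv in turn_vecs), default=0.0)
        best_scores.append(best)
        if best < threshold:  # hypothetical cut-off for "not supported by the dialogue"
            unsupported.append(fact)
    score = 100.0 * sum(best_scores) / len(best_scores) if best_scores else 0.0
    return {"score": score, "unsupported_facts": unsupported}
```

The returned dictionary mirrors the two parts of the first assessment data: a quantitative score and the location of the problem, here expressed as the list of summary facts that no dialogue turn supports.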

[0052] In this application embodiment, the completeness of historical dialogue data and summary information to be evaluated is evaluated in the second evaluation dimension. The second evaluation data of the summary information to be evaluated can be obtained by performing business element coverage detection on the summary information to be evaluated based on the list of key business elements in the historical dialogue data that match the business type, and statistically analyzing the mention of key elements. The second evaluation data can include a quantitative score for business element coverage and specific annotations of missing elements.
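A minimal sketch of business element coverage detection follows, assuming the key element list is already available as plain strings and that a case-insensitive substring match counts as a mention. A production system would presumably use semantic matching instead; the function name is hypothetical.

```python
def integrity_assessment(summary: str, key_elements: list[str]) -> dict:
    """Score how many key business elements the summary mentions."""
    text = summary.lower()
    # An element counts as covered if it appears verbatim (case-insensitive).
    missing = [e for e in key_elements if e.lower() not in text]
    covered = len(key_elements) - len(missing)
    score = 100.0 * covered / len(key_elements) if key_elements else 100.0
    return {"score": score, "missing_elements": missing}
```

For example, a summary that mentions "card lost" and "card frozen" but not "replacement fee" covers two of three elements and is annotated with the one missing element.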

[0053] In this embodiment of the application, compliance assessment is performed on historical dialogue data and summary information to be assessed in the third assessment dimension. The third assessment data of the summary information to be assessed can be obtained by conducting compliance review on the summary information to be assessed based on the financial compliance requirements corresponding to the business compliance information and the preset compliance detection rules. The third assessment data may include compliance quantitative scores and compliance issue annotations.
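Preset compliance detection rules could, for instance, be expressed as named patterns. The two rules below and the fixed per-issue penalty are purely illustrative assumptions; the application does not disclose its rule set or scoring scheme.

```python
import re

# Hypothetical rules: a promise of guaranteed returns and an unmasked 16-digit card number.
COMPLIANCE_RULES = {
    "guaranteed_return": re.compile(r"\bguaranteed?\s+(returns?|profit)", re.IGNORECASE),
    "full_card_number": re.compile(r"\b\d{16}\b"),
}


def compliance_assessment(summary: str, rules=COMPLIANCE_RULES, penalty: float = 50.0) -> dict:
    """Return a compliance score and the names of the rules the summary violates."""
    issues = [name for name, pattern in rules.items() if pattern.search(summary)]
    score = max(0.0, 100.0 - penalty * len(issues))
    return {"score": score, "issues": issues}
```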

[0054] In this embodiment of the application, a risk assessment is performed on the historical dialogue data and the summary information to be assessed in the fourth assessment dimension. The fourth assessment data of the summary information to be assessed can be obtained by identifying the business risk points that may be caused in the summary information to be assessed according to the risk prevention and control requirements corresponding to the business risk level. The fourth assessment data may include a quantitative score of risk assessment and specific annotation of risk points.

[0055] Optionally, when performing the step of "generating quality information of the summary information to be evaluated based on the assessment importance data and multiple quality assessment data", the following method can be used, but is not limited to: determining the weights of the first assessment data, the second assessment data, the third assessment data, and the fourth assessment data based on the assessment importance data; performing weighted fusion processing on the multiple quality assessment data according to the weights to obtain the comprehensive assessment data of the dialogue summary; and generating the quality information of the summary information to be evaluated based on the comprehensive assessment data.

[0056] For the embodiments of this application, the priority and reference weight coefficient of each dimension included in the assessment importance data can be used directly as the initial weight of each assessment data, or the initial weights can be fine-tuned according to the real-time business characteristics of the target business information.

[0057] In this embodiment of the application, multiple quality assessment data are weighted and fused according to their weights. The fusion method may include, but is not limited to, algorithms such as weighted summation and weighted average, to calculate the comprehensive assessment data of the summary information to be assessed. The comprehensive assessment data may be a comprehensive quantitative score of 0-100 points.
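The weighted-average variant of the fusion step can be sketched as below. The dimension keys and the example weights are assumptions; dividing by the total weight keeps the result on the same 0-100 scale as the per-dimension scores even when the weights do not sum to 1.

```python
def fuse_scores(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, normalised by the total weight."""
    total_weight = sum(weights[d] for d in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[d] * weights[d] for d in scores) / total_weight
```

With scores of 90, 80, 100, and 70 for the four dimensions and weights of 0.4, 0.2, 0.3, and 0.1, the comprehensive score is 36 + 16 + 30 + 7 = 89.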

[0058] Optionally, the method may further include, but is not limited to: acquiring multiple pieces of summary information to be evaluated and generating a sample set of summary information to be evaluated; determining the label information of each piece of summary information to be evaluated in the sample set, the label information including acceptable labels and unacceptable labels; filtering the summary information to be evaluated in the sample set according to the label information to obtain training summary information; inputting the training summary information and the corresponding label information into the initial quality inspection model for model training to obtain a multi-dimensional quality inspection model; and performing quality evaluation on the multiple pieces of summary information to be evaluated based on the multi-dimensional quality inspection model.

[0059] In this embodiment, the sample set of summary information to be evaluated may include difficult samples such as fuzzy summary information and implicit objections in the financial customer service business scenario. The label information of each summary information to be evaluated in the sample set can be determined by professionals in the financial business field based on the multi-dimensional quality inspection and evaluation system and business quality standards constructed in this application. The quality of each summary information to be evaluated in the sample set is evaluated to determine whether the summary information to be evaluated meets the quality requirements of various dimensions such as factual consistency, information integrity, compliance, and risk. The summary information to be evaluated that meets all quality requirements is labeled as acceptable, and the summary information to be evaluated that does not meet the quality requirements is labeled as unacceptable, forming single-sample binary labeled data suitable for Kahneman-Tversky Optimization (KTO) alignment training.

[0060] In this embodiment of the application, the training summary information is obtained by filtering the summary information to be evaluated in the sample set. The filtering can be performed according to the ratio of the number of samples with acceptable labels to the number of samples with unacceptable labels. The relationship between these two numbers can be as shown in Formula 1, whose symbols can respectively represent the total number of samples, the number of acceptable samples, the number of unacceptable samples, and the adjustment coefficient of the quantity ratio. For example, the adjustment coefficient can take the value 0.5, which means that the ratio of acceptable to unacceptable samples is 1:2.
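Since Formula 1 is not reproduced in this text, the sketch below assumes the simplest relationship consistent with the worked example: the number of acceptable samples equals the adjustment coefficient times the number of unacceptable samples (0.5 gives 1:2). The function name, the `alpha` parameter, and the random downsampling strategy are all assumptions.

```python
import random


def filter_by_ratio(samples, alpha=0.5, seed=0):
    """Downsample (summary, label) pairs so that acceptable ~= alpha * unacceptable.

    label 1 = acceptable, label 0 = unacceptable (the k convention of the text).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    acceptable = [s for s in samples if s[1] == 1]
    unacceptable = [s for s in samples if s[1] == 0]
    # Keep as many unacceptable samples as the available acceptable samples can balance.
    n_unacceptable = min(len(unacceptable), int(len(acceptable) / alpha))
    n_acceptable = int(alpha * n_unacceptable)
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(acceptable, n_acceptable) + rng.sample(unacceptable, n_unacceptable)
```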

[0061] (Formula 1) For the embodiments of this application, the process of inputting the training summary information and the corresponding label information into the initial quality inspection model for training can employ the KTO optimization algorithm. Specifically, the training process may include defining the log probability gain relative to the reference model as the reward. The log probability gain can be calculated as shown in Formula 2, whose symbols can respectively represent the currently trained model, the initial quality inspection model (which serves as the reference model), and the log probability gain. The larger the gain, the higher the accuracy of the current model relative to the initial quality inspection model.

[0062] (Formula 2) For the embodiments of this application, a KTO loss function can be constructed, and the loss can be calculated as shown in Formula 3, whose symbols can respectively represent the weight of acceptable samples and the weight of unacceptable samples; k can represent the label information (k=0 for an unacceptable label, k=1 for an acceptable label); the temperature coefficient can represent the strength of the optimization control; r can represent the log probability gain; and the sigmoid function can be used to map the reward value of the log probability gain to a probability value usable for the loss calculation.
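Formulas 2 and 3 are not reproduced in this text, so the following is only a sketch of a KTO-style reward and loss built from the verbal definitions above. It assumes the reward is the difference of log probabilities between the current and reference models, and it uses a simplified per-sample loss that omits the reference-point term of the published KTO formulation. All function and parameter names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def log_prob_gain(logp_current: float, logp_reference: float) -> float:
    """Reward r: log probability under the current model minus that under the reference."""
    return logp_current - logp_reference


def kto_style_loss(r: float, k: int, beta: float = 0.1,
                   w_acceptable: float = 1.0, w_unacceptable: float = 1.0) -> float:
    """Simplified per-sample loss; k=1 acceptable label, k=0 unacceptable label."""
    if k == 1:
        # Acceptable sample: loss shrinks as the gain r grows.
        return w_acceptable * (1.0 - sigmoid(beta * r))
    # Unacceptable sample: loss shrinks as the gain r becomes more negative.
    return w_unacceptable * (1.0 - sigmoid(-beta * r))
```

Under this sketch, raising the probability of an acceptable summary relative to the reference model lowers the loss, while raising the probability of an unacceptable one increases it, which matches the behaviour described for the two label types.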

[0063] (Formula 3) In this embodiment, training drives the quality inspection model to output high confidence for samples with acceptable labels and low confidence for samples with unacceptable labels, and penalizes erroneous outputs. The model parameters are iteratively optimized through backpropagation until the model meets the preset convergence conditions. The multi-dimensional quality inspection model obtained after training is an expert quality inspection model with strong reasoning and judgment capabilities.

[0064] Optionally, convergence conditions may include the loss value falling below a preset threshold, the number of iterations reaching a preset maximum, and the evaluation metrics on the validation set being stably met; for highly compliant financial businesses, the recall rate of high-risk samples can also be used as an auxiliary convergence condition.
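One way to combine these convergence conditions is sketched below. The application lists the conditions but not how they are combined, so the logic here is an assumption: training stops when the iteration cap is reached, or when the loss is below its threshold while the validation metric has met its target for several consecutive evaluations and, if supplied, the high-risk recall also meets its target. All thresholds are placeholder values.

```python
def has_converged(loss, iteration, val_metrics, *, loss_threshold=0.05,
                  max_iterations=1000, metric_target=0.9, patience=3,
                  high_risk_recall=None, recall_target=0.95):
    """Return True when training should stop under the assumed combination rule."""
    if iteration >= max_iterations:
        return True  # hard cap on the number of iterations
    recent = val_metrics[-patience:]
    # "Stably met": the last `patience` validation results all reach the target.
    metric_stable = len(recent) == patience and all(m >= metric_target for m in recent)
    # Auxiliary condition, only checked when a recall value is provided.
    recall_ok = high_risk_recall is None or high_risk_recall >= recall_target
    return loss < loss_threshold and metric_stable and recall_ok
```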

[0065] In this embodiment of the application, quality assessment based on the multi-dimensional quality inspection model can be achieved by inputting the summary information to be evaluated and the corresponding historical dialogue data into the multi-dimensional quality inspection model. The model can automatically score or judge the summary in multiple dimensions in parallel and output, for each dimension, the chain-of-thought reasoning process and a fine-grained score. The model then aggregates the scores of the dimensions according to preset weights into the final comprehensive quality inspection score, thereby achieving efficient and accurate assessment of summary quality.

[0066] As an optional approach, this application also provides an example of optimizing the customer service dialogue summary quality assessment standard based on KTO alignment training. The flow of this example is shown in Figure 3. As shown in Figure 3, the specific steps may include: Step 1: input the dialogue summary and the historical dialogue into the expert quality inspection model, which is trained using KTO alignment. Step 2: the expert quality inspection model conducts a quality inspection and evaluation of the dialogue summary from four dimensions: implementation consistency, information completeness, output standardization, and security compliance. Step 3: summarize and output the quality inspection scores of the expert quality inspection model.

[0067] Compared with existing technologies, the embodiments of this application obtain quality assessment data for each dimension by performing the corresponding quality assessment on the historical dialogue data and the summary information to be assessed in each assessment dimension, thereby achieving multi-dimensional quality assessment of the summary information to be assessed and improving the reliability of quality assessment; by determining the weight of each quality assessment data based on the assessment importance data, performing weighted fusion, and then generating the quality information, the accuracy of quality assessment is improved; and by training a multi-dimensional assessment model and using it for quality assessment, the consistency of quality assessment is improved.

[0068] Furthermore, as a specific implementation of the method shown in Figure 1 and Figure 2, this embodiment provides a quality assessment device. As shown in Figure 4, the device includes: an acquisition module 31, a determination module 32, an evaluation module 33, and a generation module 34.

[0069] The acquisition module 31 is configured to acquire the summary information to be evaluated and the historical dialogue data corresponding to the summary information to be evaluated, wherein the historical dialogue data is the dialogue data between the user and the financial business customer service based on the target financial business. The determination module 32 is configured to determine the target business information corresponding to the historical dialogue data, and to determine the evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions based on the target business information. The evaluation module 33 is configured to perform quality evaluation on the historical dialogue data and the summary information to be evaluated in the multiple evaluation dimensions, and to obtain multiple quality evaluation data corresponding to the summary information to be evaluated. The generation module 34 is configured to generate quality information of the summary information to be evaluated based on the evaluation importance data and the multiple quality evaluation data.
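The data flow between the four modules can be sketched as a small composition of callables. The class name, the method name `run`, and the type signatures are assumptions for illustration; in particular, the quality information is reduced to a single float here, whereas the application leaves its form open.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class QualityAssessmentDevice:
    # Each field stands in for one module of the device (hypothetical signatures).
    acquire: Callable[[str], tuple[str, list[str]]]                  # acquisition module 31
    determine: Callable[[list[str]], dict[str, float]]               # determination module 32
    evaluate: Callable[[list[str], str], dict[str, float]]           # evaluation module 33
    generate: Callable[[dict[str, float], dict[str, float]], float]  # generation module 34

    def run(self, summary_id: str) -> float:
        """Chain the modules: acquire, determine importance, evaluate, generate."""
        summary, dialogue = self.acquire(summary_id)
        importance = self.determine(dialogue)
        scores = self.evaluate(dialogue, summary)
        return self.generate(importance, scores)
```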

[0070] In some examples of this embodiment, the determination module 32 is specifically configured to extract target business information from the historical dialogue data, including business content, business type, business compliance information, and business risk level; determine, based on the target business information, multiple evaluation dimensions for evaluating the summary information to be evaluated; and determine the evaluation importance data corresponding to each of the multiple evaluation dimensions.

[0071] In some examples of this embodiment, the determination module 32 is further configured to determine a first evaluation dimension for evaluating the summary information to be evaluated based on the business content in the target business information; determine a second evaluation dimension for evaluating the summary information to be evaluated based on the business type in the target business information; determine a third evaluation dimension for evaluating the summary information to be evaluated based on the business compliance information in the target business information; and determine a fourth evaluation dimension for evaluating the summary information to be evaluated based on the business risk level in the target business information.

[0072] In some examples of this embodiment, the evaluation module 33 is specifically configured to perform a consistency evaluation on the historical dialogue data and the summary information to be evaluated in a first evaluation dimension to obtain first evaluation data of the summary information to be evaluated; perform an integrity evaluation on the historical dialogue data and the summary information to be evaluated in a second evaluation dimension to obtain second evaluation data of the summary information to be evaluated; perform a compliance evaluation on the historical dialogue data and the summary information to be evaluated in a third evaluation dimension to obtain third evaluation data of the summary information to be evaluated; and perform a risk evaluation on the historical dialogue data and the summary information to be evaluated in a fourth evaluation dimension to obtain fourth evaluation data of the summary information to be evaluated.

[0073] In some examples of this embodiment, the generation module 34 is specifically configured to determine the weights of the first evaluation data, the second evaluation data, the third evaluation data, and the fourth evaluation data based on the evaluation importance data; perform weighted fusion processing on the multiple quality evaluation data according to the weights to obtain the comprehensive evaluation data of the dialogue summary; and generate the quality information of the summary information to be evaluated based on the comprehensive evaluation data.

[0074] In some examples of this embodiment, the evaluation module 33 is further configured to: acquire multiple summary information to be evaluated and generate a sample set of summary information to be evaluated; determine the label information of each summary information to be evaluated in the sample set of summary information to be evaluated, the label information including acceptable labels and unacceptable labels; filter the summary information to be evaluated in the sample set of summary information to be evaluated according to the label information to obtain training summary information; input the training summary information and the corresponding label information into the initial quality inspection model for model training to obtain a multi-dimensional quality inspection model; and perform quality evaluation on multiple summary information to be evaluated based on the multi-dimensional quality inspection model.

[0075] It should be noted that other corresponding descriptions of the functional units involved in the quality assessment device provided in this embodiment can be found in the corresponding descriptions of Figure 1 and Figure 2, and will not be repeated here.

[0076] Based on the methods shown in Figure 1 and Figure 2 above, this embodiment correspondingly also provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the methods shown in Figure 1 and Figure 2.

[0077] Based on this understanding, the technical solution of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as CD-ROM, USB flash drive, mobile hard drive, etc.) and includes several instructions to cause a computer device (such as personal computer, server, or network device, etc.) to execute the methods of various implementation scenarios of this application.

[0078] Figure 5 is a schematic diagram of the hardware structure of an electronic device according to the present invention, comprising: at least one processor 401; and a memory 402 communicatively connected to the at least one processor 401; wherein the memory 402 stores instructions that can be executed by the at least one processor 401 to enable the at least one processor 401 to perform the quality assessment method described above.

[0079] Figure 5 takes one processor 401 as an example.

[0080] The electronic device may also include an input device 403 and an output device 404.

[0081] The processor 401, memory 402, input device 403, and output device 404 can be connected via a bus or by other means. Figure 5 takes connection via a bus as an example.

[0082] Memory 402, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the quality assessment method in the embodiments of this application, for example, the method flows shown in Figure 1 and Figure 2. The processor 401 executes various functional applications and communications by running the non-volatile software programs, instructions, and modules stored in the memory 402, thereby implementing the quality assessment method in the above embodiments.

[0083] Memory 402 may include a program storage area and a data storage area, wherein the program storage area may store the operating system and applications required for at least one function; the data storage area may store data created according to the use of the quality assessment method, etc. Furthermore, memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 402 may optionally include memory remotely located relative to processor 401, and these remote memories may be connected via a network to the apparatus performing the quality assessment method. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.

[0084] The input device 403 can receive user click input and generate signal inputs related to user settings and function control of the quality assessment method. The output device 404 may include a display device such as a screen.

[0085] One or more modules are stored in memory 402, and when run by one or more processors 401, the quality assessment method in any of the above method embodiments is executed.

[0086] Optionally, the aforementioned physical devices may also include a user interface, a network interface, a camera, radio frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, etc. The user interface may include a display screen, input units such as a keyboard, etc., and optional user interfaces may also include USB interfaces, card reader interfaces, etc. The network interface may optionally include standard wired interfaces, wireless interfaces (such as Wi-Fi interfaces), etc.

[0087] Those skilled in the art will understand that the physical device structure provided in this embodiment does not constitute a limitation on the physical device, and may include more or fewer components, or combine certain components, or have different component arrangements.

[0088] The storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the aforementioned physical device, supporting the operation of information processing programs and other software and / or programs. The network communication module is used to enable communication between the various components within the storage medium, as well as communication with other hardware and software in the information processing physical device.

[0089] Through the above description of the embodiments, those skilled in the art can clearly understand that this application can be implemented using software plus necessary general-purpose hardware platforms, or it can be implemented using hardware. By applying the scheme of this embodiment, compared with the prior art, this application embodiment achieves comprehensive multi-dimensional quality detection of the summary information to be evaluated by performing quality assessments on historical dialogue data and summary information to be evaluated in multiple evaluation dimensions, obtaining multiple quality assessment data corresponding to the summary information to be evaluated; by generating quality information of the summary information to be evaluated based on evaluation importance data and multiple quality assessment data, the accuracy and reliability of quality assessment are improved; by performing corresponding types of quality assessments on historical dialogue data and summary information to be evaluated in each evaluation dimension, quality assessment data for each dimension are obtained, achieving multi-dimensional quality assessment of the summary information to be evaluated, improving the reliability of quality assessment; by determining the weight of each quality assessment data based on evaluation importance data and weighted fusion, and then generating quality information, the accuracy of quality assessment is improved; and by training a multi-dimensional evaluation model and using it for quality assessment, the consistency of quality assessment is improved.

[0090] It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprising," "including," or any other variations thereof are intended to cover non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Unless otherwise specified, an element defined by the phrase "comprising one..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that includes the element.

[0091] The above are merely specific embodiments of this application, enabling those skilled in the art to understand or implement this application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of this application. Therefore, this application is not to be limited to these embodiments, but is to be accorded the widest scope consistent with the principles and novel features claimed herein.

Claims

1. A quality assessment method, characterized in that it comprises: obtaining summary information to be evaluated and historical dialogue data corresponding to the summary information to be evaluated, wherein the historical dialogue data is dialogue data between a user and financial business customer service based on a target financial business; determining target business information corresponding to the historical dialogue data, and determining, based on the target business information, evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions; performing quality evaluation on the historical dialogue data and the summary information to be evaluated in the multiple evaluation dimensions respectively to obtain multiple quality evaluation data corresponding to the summary information to be evaluated; and generating quality information of the summary information to be evaluated based on the evaluation importance data and the multiple quality evaluation data.

2. The method according to claim 1, characterized in that the step of determining the target business information corresponding to the historical dialogue data, and determining, based on the target business information, the evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions, comprises: extracting the target business information from the historical dialogue data, the target business information including business content, business type, business compliance information, and business risk level; determining, based on the target business information, multiple evaluation dimensions for evaluating the summary information to be evaluated; and determining the evaluation importance data corresponding to each of the multiple evaluation dimensions.

3. The method according to claim 2, characterized in that the step of determining, based on the target business information, multiple evaluation dimensions for evaluating the summary information to be evaluated comprises: determining a first evaluation dimension for evaluating the summary information to be evaluated based on the business content in the target business information; determining a second evaluation dimension for evaluating the summary information to be evaluated based on the business type in the target business information; determining a third evaluation dimension for evaluating the summary information to be evaluated based on the business compliance information in the target business information; and determining a fourth evaluation dimension for evaluating the summary information to be evaluated based on the business risk level in the target business information.

4. The method according to claim 3, characterized in that the step of performing quality evaluation on the historical dialogue data and the summary information to be evaluated in the multiple evaluation dimensions respectively to obtain multiple quality evaluation data corresponding to the summary information to be evaluated comprises: performing a consistency evaluation on the historical dialogue data and the summary information to be evaluated in the first evaluation dimension to obtain first evaluation data of the summary information to be evaluated; performing an integrity evaluation on the historical dialogue data and the summary information to be evaluated in the second evaluation dimension to obtain second evaluation data of the summary information to be evaluated; performing a compliance evaluation on the historical dialogue data and the summary information to be evaluated in the third evaluation dimension to obtain third evaluation data of the summary information to be evaluated; and performing a risk evaluation on the historical dialogue data and the summary information to be evaluated in the fourth evaluation dimension to obtain fourth evaluation data of the summary information to be evaluated.

5. The method according to claim 4, characterized in that the step of generating quality information of the summary information to be evaluated based on the evaluation importance data and the multiple quality evaluation data comprises: determining the weights of the first evaluation data, the second evaluation data, the third evaluation data, and the fourth evaluation data based on the evaluation importance data; performing weighted fusion on the multiple quality evaluation data according to the weights to obtain comprehensive evaluation data of the dialogue summary; and generating the quality information of the summary information to be evaluated based on the comprehensive evaluation data.

6. The method according to any one of claims 1-5, characterized in that the method further comprises: obtaining multiple pieces of summary information to be evaluated and generating a sample set of summary information to be evaluated; determining label information for each piece of summary information to be evaluated in the sample set, wherein the label information includes acceptable labels and unacceptable labels; filtering the summary information to be evaluated in the sample set according to the label information to obtain training summary information; inputting the training summary information and the corresponding label information into an initial quality inspection model for model training to obtain a multi-dimensional quality inspection model; and performing quality evaluation on the multiple pieces of summary information to be evaluated based on the multi-dimensional quality inspection model.

7. A quality assessment device, characterized in that it comprises: an acquisition module configured to acquire summary information to be evaluated and historical dialogue data corresponding to the summary information to be evaluated, wherein the historical dialogue data is dialogue data between a user and financial business customer service based on a target financial business; a determination module configured to determine target business information corresponding to the historical dialogue data, and to determine, based on the target business information, evaluation importance data corresponding to the summary information to be evaluated in multiple evaluation dimensions; an evaluation module configured to perform quality evaluation on the historical dialogue data and the summary information to be evaluated in the multiple evaluation dimensions respectively, and to obtain multiple quality evaluation data corresponding to the summary information to be evaluated; and a generation module configured to generate quality information of the summary information to be evaluated based on the evaluation importance data and the multiple quality evaluation data.

8. A computer-readable storage medium having a computer program stored thereon, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 6.

9. An electronic device, comprising a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, characterized in that, when the processor executes the computer program, it implements the method of any one of claims 1 to 6.

10. A computer program product, the computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the method of any one of claims 1 to 6.