A government affair intelligent agent evaluation method and device
By collecting multi-dimensional data and applying a dynamic quantification model, the invention constructs an evaluation method for government intelligent agents. It addresses the single-dimensional scope of existing evaluation methods and their inability to handle nonlinear relationships, and provides comprehensive, accurate evaluation of and optimization guidance for the capabilities of government intelligent agents.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Applications(China)
- Current Assignee / Owner
- CHINA ACADEMY OF INFORMATION & COMM
- Filing Date
- 2026-02-09
- Publication Date
- 2026-06-19
AI Technical Summary
Existing methods for evaluating government intelligent agents lack a systematic, quantifiable evaluation framework: their evaluation dimensions are limited, they have no quantitative model, they cannot handle nonlinear relationships and dynamic changes between indicators, and no dedicated automated evaluation device exists.
This invention presents a method for evaluating government intelligent agents. Multi-source data is collected and quantified into indicators, the indicators are normalized and weighted, and a dynamic quantification model based on differential equations is applied. The result is a multi-dimensional evaluation system covering government knowledge base management, technical capabilities, creation capabilities, and scenario application capabilities, from which a comprehensive evaluation report is generated.
It enables a comprehensive and accurate assessment of the capabilities of government intelligent agents, provides dynamic optimization guidance, improves the scientific nature and efficiency of the assessment, and supports the continuous optimization of the platform.
Description
Technical Field
[0001] This invention relates to the field of intelligent agent technology, specifically to a method and apparatus for evaluating government intelligent agents.
Background Technology
[0002] With the rapid development of artificial intelligence technology, especially breakthroughs in key technologies such as large language models, government intelligent agents, as intelligent interactive and task execution entities directly facing core scenarios such as public services, policy consultation, administrative approval, and urban governance, have seen significant improvements in both the scope and depth of their applications. Leveraging their core capabilities such as natural language understanding, multi-turn dialogue management, accurate knowledge-based question answering, and complex process automation, government intelligent agents have become a key technological support for improving government service efficiency, optimizing government office processes, and innovating urban governance models.
[0003] Currently, the development level of government intelligent agents directly affects the effectiveness of government digital transformation. However, how to scientifically, objectively, and comprehensively evaluate the capabilities and maturity of government intelligent agents has become a key issue restricting their further development. Existing evaluation methods for government intelligent agents mostly remain at the level of functional listing or simple qualitative description, lacking a systematic and quantifiable evaluation system. Specifically, existing technologies have the following shortcomings: First, the evaluation dimensions are too narrow. Most evaluation methods focus on one aspect of the platform's capabilities, such as knowledge base management or dialogue interaction capabilities, failing to construct a comprehensive evaluation framework from multiple dimensions such as knowledge management, technical capabilities, creation efficiency, and scenario adaptability. This results in one-sided evaluation results that cannot fully reflect the actual capabilities of the government intelligence agent.
[0004] Secondly, there is a lack of quantitative models. Existing evaluations mostly rely on subjective expert scoring or simple weighted averages, lacking rigorous mathematical model support. In particular, they cannot effectively handle the nonlinear relationships and dynamic changes between indicators, resulting in insufficient scientific rigor and accuracy of the evaluation results.
[0005] Furthermore, it neglects dynamism and interrelationships. Traditional methods treat each evaluation indicator as static and isolated, failing to consider the evolution trend of platform capabilities over time, as well as the mutual influence and synergistic effects between different capability dimensions. Consequently, they cannot provide dynamic and interconnected guidance for the continuous optimization of the platform.
[0006] Finally, there is no dedicated evaluation device. Currently, no dedicated device or system on the market integrates data acquisition, model calculation, and visualization analysis to assess the maturity of government intelligent agents automatically and efficiently.
Summary of the Invention
[0007] The purpose of this invention is to provide a method and apparatus for evaluating government intelligent agents, so as to solve the problems described in the background above.
[0008] To achieve the above objectives, the present invention provides the following technical solution. On the one hand, a method for evaluating government intelligent agents is provided, including the following steps:
S1. Multi-source data collection and indicator quantification: acquire multi-source raw data of the government intelligent agent to be evaluated and perform preliminary quantification;
S2. Indicator data normalization and weight allocation: normalize the collected raw indicator scores, which have different units and scales, into standardized scores that can be used for mathematical operations, and assign an appropriate weight to each indicator;
S3. Dimensional score calculation based on integral aggregation: calculate the comprehensive score of each dimension so as to penalize imbalance in capabilities and encourage balance;
S4. Overall maturity evolution calculation based on differential equations: combining the comprehensive scores of each dimension with historical evaluation data, calculate the overall maturity of the government intelligent agent and predict its future development trend;
S5. Evaluation result generation and output: generate a comprehensive evaluation report including the total score, a dimensional radar chart, weakness analysis, and optimization suggestions.
[0009] Furthermore, in step S1, raw data corresponding to the four evaluation dimensions is systematically collected from the management backend, API interfaces, runtime logs, knowledge base content, and simulation test environment of the government intelligent agent development and application platform to be evaluated, through methods such as interface calls, database queries, log analysis, simulation testing, and content review. The collected raw data is then cleaned, deduplicated, and formatted (by writing scripts or using ETL tools) to form a structured raw evaluation dataset. The collection methods include:
API calls: call the management API and runtime API provided by the platform to obtain structured configuration information and performance data;
Database query: with authorization, directly query the platform's backend database to obtain detailed records such as knowledge base, user dialogues, and application releases;
Log analysis: collect and parse system logs, application logs, and error logs to analyze stability, call chains, and abnormal situations;
Simulation testing: write and execute automated test cases that simulate the interaction between real users and government intelligent agents, in order to obtain quantitative data on functional correctness, performance indicators, and user experience;
Content review: review samples of the platform's built-in knowledge base documents, prompt templates, sample libraries, and other content to assess their quality and compliance.
[0010] Furthermore, the four evaluation dimensions are: government knowledge base management, government intelligent agent technical capabilities, government intelligent agent creation capabilities, and government intelligent agent scenario application capabilities. The detailed indicators for each dimension are as follows:
The sub-indicators of government knowledge base management are: support for government knowledge documents in multiple formats, completeness of government knowledge metadata definition, unified construction and management capability of the government knowledge base, intelligent parsing capability for government documents, flexibility of the government document slicing strategy, support for structured government data, government data management and verification capability, diversity of government knowledge retrieval modes, configurability of government knowledge retrieval parameters, and question-and-answer capability over the government database.
The sub-indicators of government intelligent agent technical capabilities are: language understanding accuracy, multi-turn dialogue consistency, dialect and terminology adaptability, multimodal interaction capability, task decomposition rationality, chain-of-thought interpretability, self-reflection and optimization capability, tool call success rate, multi-agent collaboration efficiency, and long-context memory capability.
The sub-indicators of government intelligent agent creation capabilities are: ease of creating government intelligent agents, richness and quality of the agent template library, flexibility of agent workflow orchestration, support for prompt engineering in government scenarios, agent RAG configuration and management capability, agent memory and dialogue configuration, integration of agent tools and MCP, observability of agent applications, automated evaluation capability for agents, and support for multi-channel release of agents.
The sub-indicators of government intelligent agent scenario application capabilities are: accuracy of government Q&A, ability to select and guide scenarios, intelligent pre-review and form filling, process transparency and notification, intelligent document writing assistance, intelligent meeting support, event reporting and recognition, intelligent video analysis and reporting, intelligent event allocation, and data analysis and reporting.
[0011] Furthermore, step S2 includes the following sub-steps:
S21. Indicator data normalization: according to the type of each original indicator (continuous numerical or ordinal rank), a different normalization function maps it to the interval [0,1] to obtain the standardized score $S_{m,n}$, where $S_{m,n}=0$ indicates that the indicator performs worst and $S_{m,n}=1$ indicates that it has reached its optimal state;
S22. Weight allocation: subjective and objective methods are combined. Subjective weights are determined by the Analytic Hierarchy Process (AHP) and objective weights by the entropy weight method, giving the comprehensive weight $w_{m,n}$. This weight indicates the relative importance of sub-indicator n within dimension m, and the weights of all sub-indicators satisfy $\sum_{n} w_{m,n} = 1$ for each dimension m;
I. Determining subjective weights (based on AHP): several experts in government informatization and artificial intelligence are invited to make pairwise comparisons of the sub-indicators under each dimension to construct a judgment matrix. The eigenvector of the judgment matrix yields a set of subjective weights $w^{(s)}_{m,n}$ reflecting the judgment of the expert group. Each $w^{(s)}_{m,n}$ is a decimal between 0 and 1, and the subjective weights of all sub-indicators within the same dimension sum to 1;
II. Determining objective weights (based on the entropy weight method): collect the standardized scores of multiple evaluated platforms on a sub-indicator to form a data sequence. The weight is calculated from the dispersion (entropy) of the data distribution in the sequence: the greater the differences among the indicator values (the better the discrimination), the higher the entropy weight and the more information the indicator carries. The calculation steps are as follows:
a. Calculate the coefficient of variation of the scores within the dimension. For dimension m of the evaluated item, which has k sub-indicators, the set of standardized scores is $\{S_{m,1}, S_{m,2}, \dots, S_{m,k}\}$. Calculate the standard deviation $\sigma_m$ of this set (the arithmetic square root of the variance) and its mean $\mu_m = \frac{1}{k}\sum_{n=1}^{k} S_{m,n}$. The coefficient of variation of the dimension is $CV_m = \sigma_m / \mu_m$; the larger $CV_m$, the more unbalanced the development of the capabilities within that dimension.
b. Calculate the indicator deviation. For the n-th sub-indicator within dimension m, the indicator deviation $\delta_{m,n}$ [formula not reproduced] measures the difference between the sub-indicator score and the dimension average.
c. Calculate the objective weight. The objective weight $w^{(o)}_{m,n}$ of the n-th sub-indicator within dimension m is calculated from $\delta_{m,n}$ and $CV_m$ [formula not reproduced], where $\delta_{m,n}$ measures the difference between the score of the sub-indicator and the average level of the dimension, and $CV_m$ measures the overall balance of the dimension. The more unbalanced a dimension is, the more its weights are amplified, which highlights the impact of the problematic dimension in the overall evaluation. The denominator is a normalization factor ensuring that the objective weights within the dimension sum to 1;
III. Comprehensive weighting: the subjective and objective weights are linearly combined to obtain the final comprehensive weight $w_{m,n}$ [formula not reproduced], where the adjustment factor balances expert experience against data objectivity.
[0012] Furthermore, in step S21, the original indicators are divided into two types, continuous numerical indicators and ordinal rank indicators, and are normalized as follows:
For continuous numerical indicators (raw data that are continuous values, such as accuracy percentages or elapsed time): let $x$ be the raw score of a sub-indicator, $x_{\min}$ the theoretical minimum score or minimum requirement for the indicator, and $x_{\max}$ the theoretical full score or optimal value. The normalized standard score is $S = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$. For a negative indicator such as time consumption (the shorter the better), let $x$ be the raw score, $x_{\min}$ the ideal shortest time, and $x_{\max}$ the longest acceptable time; the normalization function is $S = \frac{x_{\max} - x}{x_{\max} - x_{\min}}$;
For ordinal rank indicators (raw data that are discrete ranks, such as a 1-5 scale or a 0 / 1 / 2 scale, with higher ranks representing stronger capabilities): let $x$ be the raw score of a sub-indicator, $L_{\min}$ the lowest rank score, and $L_{\max}$ the highest rank score; the normalization function is $S = \frac{x - L_{\min}}{L_{\max} - L_{\min}}$.
[0013] Furthermore, step S3 includes the following operations:
Input: receive the standardized scores $S_{m,n}$ of all sub-indicators under an evaluation dimension m and their corresponding comprehensive weights $w_{m,n}$;
Kernel density estimation: a Gaussian kernel function smoothly interpolates the discrete score points to generate a continuous capability probability density function $f_m(x)$, which reflects the distribution of the dimension's capabilities over the interval [0,1];
Weight interpolation: the discrete weights are transformed into a continuous weight function $w_m(x)$ by linear or spline interpolation;
Output: treat the sub-indicator scores under each dimension as a continuous distribution and aggregate them by integration into the comprehensive score of the dimension [formulas not reproduced], where:
$w_m(x)$ is the function constructed by interpolation from the discrete data points $(S_{m,n}, w_{m,n})$; it expresses how important capability level x is;
$f_m(x)$ is the function constructed by kernel density estimation; it expresses how much capability the platform has at level x. Taking the scores of all sub-indicators within the dimension as seeds, kernel density estimation grows a continuous, smooth probability density curve that reveals the overall distribution of the platform's capabilities in the dimension and serves as an "X-ray" for diagnosing how balanced its development is.
The integral multiplies the stock of capability at every capability level by the value of that level and sums the products, which gives the platform's expected value output. The regularization term depends only on whether the capability map is flat (balanced) or rugged (unbalanced); it further increases the weight of the low-lying areas, so more points are deducted in scoring. This amplifies the penalty and reflects the harm of weaknesses more accurately.
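The three operations named in step S3 (Gaussian kernel density estimation, weight interpolation, and integration) can be illustrated with a minimal Python sketch. The exact integrand and the regularization term of the original are not available in this text, so the sketch makes its own assumptions: the aggregate is taken as the weight-adjusted expected capability level, $\int x\,w(x) f(x)\,dx / \int w(x) f(x)\,dx$, no regularization penalty is applied, the bandwidth of 0.1 is arbitrary, and the name `dimension_score` is hypothetical.

```python
import numpy as np


def _trapezoid(y, x):
    """Trapezoidal rule on a grid (avoids version-specific NumPy helpers)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)


def dimension_score(scores, weights, bandwidth=0.1, grid_size=501):
    """Aggregate sub-indicator scores of one dimension by integration.

    ASSUMPTION: the integrand below is illustrative only; the source's
    formula and its regularization term are not reproduced here.
    Scores are expected to be distinct values in [0, 1].
    """
    s = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    x = np.linspace(0.0, 1.0, grid_size)

    # Gaussian kernel density estimate, renormalized on [0, 1].
    z = (x[:, None] - s[None, :]) / bandwidth
    density = np.exp(-0.5 * z ** 2).sum(axis=1)
    density /= _trapezoid(density, x)

    # Linear interpolation of the discrete (score, weight) points.
    order = np.argsort(s)
    weight_fn = np.interp(x, s[order], w[order])

    # Weight-adjusted expected capability level (assumed aggregate).
    mass = weight_fn * density
    return _trapezoid(x * mass, x) / _trapezoid(mass, x)
```

With equal weights and scores placed symmetrically around 0.5, the sketch returns approximately 0.5; a penalty for imbalance would have to be added on top of it once the regularization term is known.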
[0014] Furthermore, step S4 includes the following sub-steps: S41. Treat the total maturity as a dynamic variable that evolves over time and construct a non-homogeneous linear differential equation; S42. Read the dimension score sequences of the current and historical evaluation periods, and solve the differential equations by combining the preset global dimension weights and adjustment coefficients.
[0015] Furthermore, the non-homogeneous linear differential equation is [equation not reproduced], where: $M(t)$ is the platform's total maturity score at time t; $\frac{dM(t)}{dt}$ is the derivative of M(t), representing the instantaneous rate of change of platform maturity; the driving term is the platform's capability potential (target maturity); the global weight of the m-th dimension satisfies [constraint not reproduced] and is obtained by the method of step S2; and M(t) on the right side of the equation represents the current maturity level of the platform, which constitutes the inertia that hinders change.
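To make the idea of a maturity score that evolves over evaluation periods concrete, here is a minimal Python sketch of one possible first-order non-homogeneous linear model. The equation itself is not available in this text, so the form $\frac{dM}{dt} = \alpha P - \beta M(t)$, with $P = \sum_m W_m D_m$ the weighted sum of dimension scores, is an assumption, as are the coefficient names `alpha` and `beta` and the function name `evolve_maturity`. The dimension scores are held constant within each period, which allows the exact closed-form update to be used instead of a numerical solver.

```python
import math


def evolve_maturity(m0, dimension_scores, global_weights,
                    alpha=1.0, beta=1.0, dt=1.0):
    """Return the maturity after each evaluation period.

    ASSUMED model: dM/dt = alpha * P - beta * M, where P is the weighted
    sum of the dimension scores for the period. For constant P the exact
    solution is M(t) = E + (M0 - E) * exp(-beta * t), with E = alpha*P/beta.
    """
    trajectory = []
    m = m0
    for scores in dimension_scores:
        potential = sum(w * d for w, d in zip(global_weights, scores))
        equilibrium = alpha * potential / beta
        m = equilibrium + (m - equilibrium) * math.exp(-beta * dt)
        trajectory.append(m)
    return trajectory
```

Under this assumed form, the maturity moves toward the weighted potential at a rate set by `beta`, so a platform whose dimension scores stay constant converges to `alpha * P / beta`; extending the last known scores forward gives a simple trend forecast.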
[0016] Furthermore, in step S5, the final evaluation results are output in the form of visual charts and structured documents through a graphical interface or file interface.
[0017] On the other hand, a government intelligence agent evaluation device is provided, which is applied to the government intelligence agent evaluation method described above, including: a government intelligence agent platform, network infrastructure, evaluation system hardware cluster and external system; the government intelligence agent platform is the evaluated object, specifically in the form of an application system such as intelligent question answering and process handling deployed in a government cloud or private environment, including a government intelligence agent server, platform database, API gateway, log server and knowledge base storage; The network infrastructure includes core switches, firewalls / gatekeepers, load balancers, and VPN gateways, which are responsible for establishing a secure and reliable data transmission channel between the government intelligent body platform and the evaluation system hardware cluster; The evaluation system hardware cluster is the core hardware for executing the method of the present invention, and its internal structure is further divided into a data acquisition layer, a computing and storage layer and a presentation layer. The external systems include cloud storage / backup and monitoring systems, which provide operational support for the evaluation process; The evaluation system hardware cluster can be centrally deployed in the same data center or distributed across different security domains according to government security requirements.
[0018] This invention provides a method and apparatus for evaluating government intelligent agents, which has the following beneficial effects: 1. A comprehensive multi-dimensional evaluation system has been constructed: By systematically integrating four core dimensions—government knowledge base management, government intelligent agent technology capabilities, government intelligent agent creation capabilities, and government intelligent agent scenario application capabilities—the system solves the problems of single dimensions and limited perspective in existing evaluation methods.
[0019] 2. A dynamic quantification model based on calculus was introduced: By constructing a complex mathematical model that includes integral and differential equations, subjective and discrete evaluation indicators are transformed into objective and continuous maturity scores. This solves the problems of traditional methods lacking scientific quantitative basis and being unable to handle nonlinear dynamic relationships, and significantly improves the accuracy and depth of the evaluation.
[0020] 3. Achieved process-oriented and automated evaluation: Provided a complete process from data collection and indicator calculation to result output, and designed corresponding special equipment, which solved the problems of reliance on manual labor, low efficiency and poor consistency in the evaluation process, and realized the repeatability, traceability and efficient execution of the evaluation process.
[0021] 4. Provides precise optimization guidance: The model calculation can not only obtain the total score, but also reveal the strengths and weaknesses of each dimension and sub-dimension, providing clear and quantitative optimization directions and decision support for the platform's iterative upgrades.
Attached Figure Description
[0022] Figure 1 is a schematic diagram of the first evaluation dimension of the government intelligent agent evaluation method of the present invention;
Figure 2 is a schematic diagram of the second evaluation dimension of the government intelligent agent evaluation method of the present invention;
Figure 3 is a schematic diagram of the third evaluation dimension of the government intelligent agent evaluation method of the present invention;
Figure 4 is a schematic diagram of the fourth evaluation dimension of the government intelligent agent evaluation method of the present invention;
Figure 5 is a schematic diagram illustrating the scaling method of the government intelligent agent evaluation method of the present invention;
Figure 6 is a schematic diagram of the hardware structure of a government intelligent agent evaluation device according to the present invention;
Figure 7 is a flowchart of a government intelligent agent evaluation method according to the present invention;
Figure 8 is a flowchart illustrating the steps of a government intelligent agent evaluation method according to the present invention.
Detailed Implementation
[0023] The embodiments of the present invention will be described in further detail below with reference to the accompanying drawings and examples. The following examples are for illustrative purposes only and should not be construed as limiting the scope of the invention.
[0024] As shown in Figures 1-8, a method for evaluating government intelligent agents includes the following steps: S1. Multi-source data collection and indicator quantification: obtain multi-source raw data of the government intelligent agent to be evaluated and perform preliminary quantification.
[0025] In this step, raw data corresponding to the four evaluation dimensions is systematically collected from the management backend, API interface, runtime log, knowledge base content, and simulation test environment of the government intelligent body development and application platform to be evaluated through methods such as interface calls, database queries, log analysis, simulation testing, and content review. The collected raw data is then cleaned, deduplicated, and formatted (data cleaning and formatting are achieved by writing scripts or using ETL tools) to form a structured raw evaluation dataset.
[0026] In this embodiment, the data collection method includes: API calls: Call the management API and runtime API provided by the platform to obtain structured configuration information and performance data.
[0027] Database Query: With authorization, directly query the platform's backend database to obtain detailed records such as knowledge base, user conversations, and application releases.
[0028] Log analysis: Collects and parses system logs, application logs, and error logs to analyze stability, call chains, and abnormal situations.
[0029] Simulation Testing: Write and execute automated test cases to simulate real user interactions with government intelligence agents in order to obtain quantitative data on functional correctness, performance metrics, and user experience.
[0030] Content review: Sample reviews of the platform's built-in knowledge base documents, Prompt templates, sample libraries, and other content to assess their quality and compliance.
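The cleaning, deduplication, and formatting of collected data mentioned in step S1 could be scripted along the following lines. This is a minimal sketch, not the platform's actual pipeline: the record fields `source`, `indicator`, and `value`, and the function name `clean_records`, are hypothetical, and the rule of keeping the first record per (source, indicator) pair is an assumption.

```python
from typing import Iterable


def clean_records(raw: Iterable[dict]) -> list[dict]:
    """Clean, deduplicate, and format raw records into a structured dataset.

    ASSUMPTION: each raw record is a dict with hypothetical keys
    'source', 'indicator', and 'value'.
    """
    seen = set()
    cleaned = []
    for rec in raw:
        indicator = str(rec.get("indicator", "")).strip()
        value = rec.get("value")
        if not indicator or value is None:
            continue  # drop incomplete records
        try:
            value = float(value)  # format: coerce numeric strings
        except (TypeError, ValueError):
            continue  # drop records whose value is not numeric
        key = (rec.get("source"), indicator)
        if key in seen:
            continue  # deduplicate: keep the first record per key
        seen.add(key)
        cleaned.append(
            {"source": rec.get("source"), "indicator": indicator, "value": value}
        )
    return cleaned
```

The output is a flat list of uniform records, which is a convenient input for the normalization in step S2.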
[0031] The four evaluation dimensions are: government knowledge base management, government intelligent agent technical capabilities, government intelligent agent creation capabilities, and government intelligent agent scenario application capabilities (see Figure 1). The detailed indicators for each dimension are as follows: Sub-indicators for government knowledge base management include: support for multi-format government knowledge documents, completeness of government knowledge metadata definition, unified construction and management capabilities of the government knowledge base, intelligent document parsing capabilities, flexibility of government document slicing strategies, support for structured government data, government data management and verification capabilities, diversity of government knowledge retrieval modes, configurability of government knowledge retrieval parameters, and government database question-and-answer capabilities.
[0032] Sub-indicators of the technical capabilities of government intelligent agents include: language understanding accuracy, consistency in multi-turn dialogue, adaptability to dialects and terminology, multimodal interaction capability, reasonableness of task decomposition, interpretability of the chain of thought, self-reflection and optimization capability, success rate of tool invocation, efficiency of multi-agent collaboration, and long-context memory capability.
[0033] Sub-indicators of the creation capabilities of government intelligent agents include: ease of creating government intelligent agents, richness and quality of the agent template library, flexibility of agent workflow orchestration, support for prompt engineering in government scenarios, agent RAG configuration and management capabilities, agent memory and dialogue configuration, integration of agent tools and MCP, observability of agent applications, automated evaluation capabilities for agents, and support for multi-channel release of agents.
[0034] Sub-indicators of the application capabilities of government intelligent agents in various scenarios include: accuracy of government Q&A, ability to select and guide scenarios, intelligent pre-review and form filling, process transparency and notification, intelligent document writing assistance, intelligent meeting support, event reporting and recognition, intelligent video analysis and reporting, intelligent event allocation, and data analysis and reporting.
[0035] S2. Indicator Data Normalization and Weight Allocation: normalize the collected raw indicator scores, which have different units and scales, into standardized scores that can be used for mathematical operations, and assign an appropriate weight to each indicator.
[0036] S21. Indicator Data Normalization: according to the type of each original indicator (continuous numerical or ordinal rank), a different normalization function maps it to the interval [0,1] to obtain the standardized score $S$, where $S=0$ indicates that the indicator performs worst and $S=1$ indicates that it has reached its optimal state. For continuous numerical indicators (raw data that are continuous values, such as accuracy percentages or elapsed time): let $x$ be the raw score of a sub-indicator, $x_{\min}$ the theoretical minimum score or minimum requirement for the indicator, and $x_{\max}$ the theoretical full score or optimal value. The normalized standard score is $S = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$. For a negative indicator such as time consumption (the shorter the better), let $x$ be the raw score, $x_{\min}$ the ideal shortest time, and $x_{\max}$ the longest acceptable time; the normalization function is $S = \frac{x_{\max} - x}{x_{\max} - x_{\min}}$.
[0037] For ordinal rank indicators (raw data that are discrete ranks, such as a 1-5 scale or a 0 / 1 / 2 scale, with higher ranks representing stronger capabilities): let $x$ be the raw score of a sub-indicator, $L_{\min}$ the lowest rank score, and $L_{\max}$ the highest rank score; the normalization function is $S = \frac{x - L_{\min}}{L_{\max} - L_{\min}}$.
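The three normalization cases of S21 (positive continuous, negative continuous, and ordinal rank) amount to min-max scaling and can be sketched in Python as follows. The function names are hypothetical, and clamping out-of-range values to [0,1] is an added assumption that the text does not state.

```python
def normalize_positive(x, lowest, highest):
    """Min-max scaling for 'higher is better' indicators such as accuracy."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    score = (x - lowest) / (highest - lowest)
    return min(1.0, max(0.0, score))  # ASSUMPTION: clamp to [0, 1]


def normalize_negative(x, shortest, longest):
    """Reversed min-max scaling for 'lower is better' indicators such as time."""
    if longest <= shortest:
        raise ValueError("longest must be greater than shortest")
    score = (longest - x) / (longest - shortest)
    return min(1.0, max(0.0, score))  # ASSUMPTION: clamp to [0, 1]


def normalize_ordinal(rank, lowest_rank, highest_rank):
    """Ordinal ranks (e.g. a 1-5 scale) use the same linear mapping."""
    return normalize_positive(rank, lowest_rank, highest_rank)
```

For example, a response time of 2 seconds with an ideal of 1 second and an acceptable maximum of 5 seconds maps to 0.75, and rank 4 on a 1-5 scale also maps to 0.75.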
[0038] S22. Weight Allocation: A combined subjective and objective approach is adopted. Subjective weights are determined using the Analytic Hierarchy Process (AHP), and objective weights are determined using the entropy weight method, thus obtaining the comprehensive weight $w_{m,n}$, which indicates the relative importance of sub-indicator n within dimension m. The weights of all sub-indicators satisfy $\sum_{n} w_{m,n} = 1$ for each dimension m.
[0039] I. Determining Subjective Weights (Based on AHP): Several experts in e-government and artificial intelligence are invited to make pairwise comparisons of the sub-indicators under each dimension to construct a judgment matrix. For example, in e-government knowledge base management (D1), experts might consider intelligent document parsing capability (…) to be clearly more important than metadata completeness.
[0040] By calculating the eigenvector of the judgment matrix, a set of subjective weights reflecting the judgment of the expert group is obtained.
[0041] The AHP method does not directly assign a weighted score to each indicator. Instead, it calculates the weights indirectly by constructing a judgment matrix. Experts are required to perform pairwise comparisons of sub-indicators within the same dimension, using a 1-9 scale. The meaning of this scale is shown in Figure 2.
[0042] The final weights are obtained by calculating the eigenvector of the judgment matrix. Each weight is a decimal between 0 and 1, and the weights of all sub-indicators within the same dimension sum to 1.
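The eigenvector calculation described for AHP can be sketched with NumPy as follows. The weights are the principal eigenvector of the pairwise judgment matrix, normalized to sum to 1. The consistency index returned alongside is the standard AHP check, $(\lambda_{\max} - n)/(n - 1)$; it is not mentioned in the text above and is included here only as a common companion to the method. The function name `ahp_weights` is hypothetical.

```python
import numpy as np


def ahp_weights(judgment):
    """Derive AHP weights from a reciprocal pairwise judgment matrix.

    Returns (weights, consistency_index). The consistency index is a
    standard AHP addition, not something stated in the source text.
    """
    a = np.asarray(judgment, dtype=float)
    n = a.shape[0]
    eigvals, eigvecs = np.linalg.eig(a)
    k = int(np.argmax(eigvals.real))      # principal eigenvalue
    w = np.abs(eigvecs[:, k].real)        # principal eigenvector
    w = w / w.sum()                       # normalize so weights sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 2 else 0.0
    return w, ci
```

For a perfectly consistent matrix such as `[[1, 2, 4], [1/2, 1, 2], [1/4, 1/2, 1]]`, the weights are 4/7, 2/7, and 1/7 and the consistency index is zero; a noticeably positive index suggests the experts' pairwise judgments contradict each other.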
[0043] II. Determination of Objective Weights (Based on Entropy Weight Method): Collect the standardized scores of multiple evaluated platforms on a sub-indicator to form a data sequence. The weight is calculated from the dispersion (entropy) of the data distribution in the sequence: the greater the differences among the indicator values (the better the discrimination), the higher the entropy weight and the more information the indicator carries.
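For reference, the textbook entropy weight method over several platforms can be sketched as follows. This is the standard formulation (column proportions, Shannon entropy scaled by $\ln$ of the number of platforms, weights proportional to $1 - \text{entropy}$), not necessarily the exact variant used in this document, whose own calculation steps rely on a coefficient of variation within a single platform. The function name `entropy_weights` and the uniform fallback for indistinguishable indicators are assumptions.

```python
import numpy as np


def entropy_weights(scores):
    """Textbook entropy weights.

    scores: 2-D array, rows = evaluated platforms, columns = sub-indicators,
    entries = standardized scores in [0, 1].
    """
    x = np.asarray(scores, dtype=float)
    if x.ndim != 2 or x.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two platforms")
    col_sums = x.sum(axis=0)
    if np.any(x < 0) or np.any(col_sums <= 0):
        raise ValueError("scores must be non-negative with positive column sums")
    p = x / col_sums                                   # share per platform
    log_p = np.log(p, out=np.zeros_like(p), where=p > 0)  # treat 0*ln(0) as 0
    entropy = -(p * log_p).sum(axis=0) / np.log(x.shape[0])
    divergence = 1.0 - entropy                         # more spread, more weight
    total = divergence.sum()
    if total <= 1e-12:
        # ASSUMPTION: fall back to equal weights when no indicator discriminates
        return np.full(x.shape[1], 1.0 / x.shape[1])
    return divergence / total
```

An indicator on which every platform scores the same has maximal entropy and receives a weight of zero, which matches the statement that better discrimination yields a higher entropy weight.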
[0044] The core idea here is that within a platform, if the scores of various sub-indicators under a certain dimension are highly uneven, it indicates a serious weakness in the platform's capabilities in that dimension; conversely, it indicates balanced development. To guide the platform's overall development, weaker indicators within unbalanced dimensions should be given higher objective weighting to highlight their importance in the overall score and guide the platform to address its shortcomings.
[0045] The calculation steps are as follows:
a. Calculate the coefficient of variation of the scores within the dimension. Dimension m of the evaluation item has k sub-indicators, whose standardized scores form the set $\{s_{m,1}, s_{m,2}, \dots, s_{m,k}\}$. Calculate the standard deviation $\sigma_m$ of this set (the standard deviation is the arithmetic square root of the variance); calculate the mean $\mu_m$ of the set; and calculate the coefficient of variation of the dimension, $CV_m = \sigma_m / \mu_m$. The larger $CV_m$ is, the more unbalanced the development of the capabilities within that dimension.
b. Calculate the index deviation. For the n-th sub-indicator within dimension m of the evaluation item, the index deviation $d_{m,n}$ measures the difference between the sub-indicator score $s_{m,n}$ and the dimension mean $\mu_m$.
c. Calculate the objective weight. The objective weight $w^{o}_{m,n}$ of the n-th sub-indicator within dimension m is calculated from two quantities: the index deviation $d_{m,n}$, which measures the difference between the score of this sub-indicator and the average level of the dimension, and the coefficient of variation $CV_m$, which measures the overall balance of the dimension. The more unbalanced a dimension is, the more the weight is amplified, which highlights the impact of the problematic dimension in the overall evaluation. The denominator of the expression is a normalization factor that ensures $\sum_{n} w^{o}_{m,n} = 1$ within the dimension.
III. Overall Weighting: The subjective weights and objective weights are linearly combined to obtain the final comprehensive weight. The comprehensive weight balances strategic importance against the platform's own shortcomings, which suits the context of a single platform's self-assessment. Formula: $w_{m,n} = \alpha\, w^{s}_{m,n} + (1-\alpha)\, w^{o}_{m,n}$, where $w^{s}_{m,n}$ is the subjective weight and $\alpha$ is an adjustment coefficient used to balance expert experience and data objectivity. In the field of government affairs, it is appropriate to favor expert experience (e.g., by setting $\alpha = 0.6$).
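The two parts of this procedure that are stated explicitly, the coefficient of variation of a dimension and the linear combination of subjective and objective weights, can be sketched in Python as follows. This is a minimal illustration: the use of the population standard deviation, the function names, and all score and weight values are assumptions, and the objective weights are supplied as hypothetical inputs rather than computed. The default alpha of 0.6 follows the value mentioned for the device embodiment.

```python
import statistics


def coefficient_of_variation(scores):
    """Standard deviation divided by the mean of a dimension's standardized scores.

    Assumes the population standard deviation and a non-zero mean.
    """
    return statistics.pstdev(scores) / statistics.mean(scores)


def combine_weights(subjective, objective, alpha=0.6):
    """Linear combination: alpha * subjective + (1 - alpha) * objective."""
    return [alpha * ws + (1.0 - alpha) * wo for ws, wo in zip(subjective, objective)]


# Hypothetical standardized scores of two dimensions.
balanced = [0.8, 0.8, 0.8, 0.8]
unbalanced = [0.9, 0.9, 0.8, 0.2]
cv_balanced = coefficient_of_variation(balanced)
cv_unbalanced = coefficient_of_variation(unbalanced)

# Hypothetical weights for the unbalanced dimension; each list sums to 1.
subjective = [0.4, 0.3, 0.2, 0.1]
objective = [0.1, 0.1, 0.2, 0.6]
combined = combine_weights(subjective, objective)
print(round(cv_balanced, 3), round(cv_unbalanced, 3), [round(w, 2) for w in combined])
```

Because both input weight vectors sum to 1, the combined weights also sum to 1 for any alpha between 0 and 1.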
[0046] Through steps S1 and S2, the following are obtained: the standardized score matrix of all sub-indicators, each element of which satisfies $s_{m,n} \in [0,1]$; and the weight matrix of all sub-indicators, which for each dimension m satisfies $\sum_{n} w_{m,n} = 1$.
[0047] S3. Dimensional score calculation based on integral aggregation: Calculate the comprehensive score of each dimension in a way that penalizes capability imbalance and rewards balance.
[0048] Input: Receive the standardized scores $s_{m,n}$ of all sub-indicators under a given evaluation dimension m and their corresponding comprehensive weights $w_{m,n}$.
[0049] Kernel density estimation: A Gaussian kernel function is used to smooth the discrete score points into a continuous capability probability density function $f_m(x)$, which reflects the distribution of the capability of this dimension over the [0,1] interval.
[0050] Weight interpolation: The discrete weights are transformed into a continuous weight function $w_m(x)$ through linear or spline interpolation.
[0051] Output: Treat the sub-indicator scores under each dimension as a continuous distribution and aggregate them through integration into the comprehensive score $D_m$ of that dimension.
[0052] In the aggregation formula: $w_m(x)$ is the function constructed by interpolation from the discrete (score, weight) data points $(s_{m,n}, w_{m,n})$; it expresses how important capability level x is.
[0053] $f_m(x)$ is the function constructed from the scores $s_{m,n}$ by kernel density estimation; it expresses how much capability the platform has at level x. With the scores of all sub-indicators within the dimension as seeds, kernel density estimation grows a continuous, smooth probability density curve. The curve reveals the overall distribution of the platform's capabilities in this dimension and serves as an "X-ray" for diagnosing how balanced its development is.
[0054] The expected value output of the platform is calculated by multiplying the stock of capability at every capability level by the value of that level and summing the results, that is, by the integral $\int_0^1 w_m(x)\, f_m(x)\, dx$.
[0055] The regularization term depends only on whether the capability map $f_m(x)$ is flat (balanced) or rugged (unbalanced). It further increases the weight of the low-lying areas, so that more points are deducted during scoring; this amplifies the penalty and reflects the harm of weaknesses more accurately.
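The kernel density estimation, weight interpolation, and integration steps of S3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the bandwidth of 0.1, the score and weight values, and all function names are hypothetical; a fixed-step composite Simpson rule stands in for the numerical integration; and the regularization term is omitted because its exact form is not given in the text above.

```python
import math


def gaussian_kde(scores, bandwidth):
    """Return f(x): the average of Gaussian kernels centered on each score."""
    norm = 1.0 / (len(scores) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in scores)

    return density


def linear_weight_function(scores, weights):
    """Return w(x): piecewise-linear interpolation through the (score, weight) points.

    Outside the range of the scores, the nearest end value is used.
    Assumes the scores are distinct.
    """
    points = sorted(zip(scores, weights))

    def weight(x):
        if x <= points[0][0]:
            return points[0][1]
        if x >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    return weight


def simpson(func, a, b, intervals=200):
    """Composite Simpson rule on [a, b]; intervals must be even."""
    h = (b - a) / intervals
    total = func(a) + func(b)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0


# Hypothetical standardized scores and comprehensive weights of one dimension.
scores = [0.9, 0.8, 0.3, 0.6]
weights = [0.4, 0.3, 0.2, 0.1]

f = gaussian_kde(scores, bandwidth=0.1)
w = linear_weight_function(scores, weights)

# Integral of w(x) * f(x) over [0, 1]: the expected value output before any penalty.
value_term = simpson(lambda x: w(x) * f(x), 0.0, 1.0)
# Probability mass of f that falls inside [0, 1] (some leaks past the ends).
mass = simpson(f, 0.0, 1.0)
print(round(value_term, 4), round(mass, 4))
```

Because Gaussian kernels extend beyond the [0,1] interval, part of the density mass falls outside the integration range; a practical implementation would need to decide how to treat this boundary effect.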
[0056] S4. Overall Maturity Evolution Calculation Based on Differential Equations: Combining the comprehensive scores of each dimension and historical evaluation data, calculate the overall maturity of the government intelligent agent and predict its future development trend. S41. Treating total maturity as a dynamic variable that evolves over time, construct a non-homogeneous linear differential equation: $\frac{dM(t)}{dt} = \gamma\left[\sum_{m} W_m D_m(t) - M(t)\right]$; where: $M(t)$ is the platform's total maturity score at time t; $\frac{dM(t)}{dt}$ is the derivative of M(t), representing the instantaneous rate of change of platform maturity; $\sum_{m} W_m D_m(t)$, the weighted sum of the dimension scores $D_m(t)$, is the platform's capability potential / goal maturity; $W_m$ is the global weight of the m-th dimension, satisfying $\sum_{m} W_m = 1$ and obtained by the method of step S2; M(t) on the right side of the equation represents the current maturity level of the platform, which constitutes the inertia that hinders change.
[0057] In this embodiment, the maturity gap $\sum_{m} W_m D_m(t) - M(t)$ measures the difference between a platform's potential and its current state.
[0058] If the gap is large (potential >> current status), the platform's capabilities are strong but its overall maturity has not kept pace. The rate of change $dM(t)/dt$ is therefore large, and the platform matures rapidly.
[0059] If the gap is small (potential ≈ current status), the platform's development has stabilized and the rate of change $dM(t)/dt$ is small.
[0060] If the gap is negative (potential < current status), the platform's capabilities have declined, but its maturity has not yet fully decreased due to inertia. The rate of change $dM(t)/dt$ is negative, and maturity begins to decline.
[0061] γ is the maturity adjustment coefficient (>0), which reflects the platform's maturity response speed to changes in external capabilities.
[0062] A large γ value indicates that the platform responds quickly to changes in capabilities, and its maturity can keep up with these changes rapidly.
[0063] A small γ value indicates that the platform system is rigid and has great inertia. Even if the capabilities are improved, the overall maturity will take a long time to grow slowly.
[0064] γ is a system parameter that needs to be calibrated. It is not calculated directly by a formula, but can be obtained through fitting historical data, estimating based on expert experience, sensitivity analysis, and trial calculations.
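One of the calibration routes mentioned above, fitting γ to historical data, can be sketched in Python as follows. This is a minimal illustration, assuming the maturity model takes the first-order form dM/dt = γ(P − M) with a constant capability potential P, for which M(t) = P + (M_0 − P)·exp(−γt). The function names, the candidate grid, and the history values are all hypothetical; the history is synthetic and generated from the same model.

```python
import math


def maturity_constant_potential(m0, potential, gamma, t):
    """Closed-form M(t) for a constant potential P: P + (M0 - P) * exp(-gamma * t)."""
    return potential + (m0 - potential) * math.exp(-gamma * t)


def fit_gamma(history, m0, potential, candidates):
    """Return the candidate gamma with the smallest squared error against (t, M) pairs."""

    def squared_error(gamma):
        return sum(
            (maturity_constant_potential(m0, potential, gamma, t) - observed) ** 2
            for t, observed in history
        )

    return min(candidates, key=squared_error)


# Synthetic history: maturity observed at four past evaluation times, produced
# by the model itself with gamma = 0.5 (hypothetical values throughout).
history = [(t, maturity_constant_potential(0.4, 0.8, 0.5, t)) for t in (0.5, 1.0, 1.5, 2.0)]
candidates = [i / 10 for i in range(1, 21)]  # grid of trial values 0.1 .. 2.0
best_gamma = fit_gamma(history, m0=0.4, potential=0.8, candidates=candidates)
print(best_gamma)
```

A grid search is the simplest form of trial calculation; with real evaluation records, the same squared-error criterion could be minimized by any one-dimensional optimizer.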
[0065] S42. Read the dimension score sequences of the current and historical evaluation periods, and solve the differential equations by combining the preset global dimension weights and adjustment coefficients.
[0066] Given the initial condition $M(0) = M_0$ (which can be obtained through a weighted average of the initial assessment), solving this differential equation yields the total maturity score at a future assessment time T: $M(T) = M_0 e^{-\gamma T} + \gamma e^{-\gamma T} \int_0^T e^{\gamma t} \sum_{m} W_m D_m(t)\, dt$; where: $M_0$ is the platform's initial maturity level (such as the score at the first evaluation six months ago). The term $M_0 e^{-\gamma T}$ represents the residual effect of the initial maturity level, which decays exponentially as T increases; the longer the time, the less of the initial level remains. The second term represents the cumulative contribution of the platform's capabilities over the period [0,T] to overall maturity: the inner integral accumulates the contribution of the potential at each instant from time 0 to time T, multiplied by the amplification factor $e^{\gamma t}$; the outer factor $\gamma e^{-\gamma T}$ scales and attenuates this cumulative contribution. The ultimate maturity of a platform cannot be completely divorced from its history, nor can it ignore the capability enhancements brought about by recent efforts. Solving the differential equation amounts to calculating the balance between these two factors.
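[0067] The solution process is as follows, writing $P(t) = \sum_{m} W_m D_m(t)$ for the capability potential: $\frac{dM(t)}{dt} = \gamma\left[P(t) - M(t)\right]$; $\frac{dM(t)}{dt} + \gamma M(t) = \gamma P(t)$; Find the integrating factor: $e^{\gamma t}$; Multiply both sides by the integrating factor: $e^{\gamma t}\frac{dM(t)}{dt} + \gamma e^{\gamma t} M(t) = \gamma e^{\gamma t} P(t)$; According to the inverse operation of the derivative multiplication rule, (f*g)'=f'*g+g'*f: $\frac{d}{dt}\left[e^{\gamma t} M(t)\right] = \gamma e^{\gamma t} P(t)$; Integrating both sides with respect to time t (from the initial time 0 to the current time T): $\int_0^T \frac{d}{dt}\left[e^{\gamma t} M(t)\right] dt = \gamma \int_0^T e^{\gamma t} P(t)\, dt$; The integral on the left can be calculated directly: $e^{\gamma T} M(T) - e^{0} M(0)$; That is: $e^{\gamma T} M(T) - M_0 = \gamma \int_0^T e^{\gamma t} P(t)\, dt$, where M_0 = M(0) is the initial maturity; Rearrange the equation to isolate the term containing M(T): $e^{\gamma T} M(T) = M_0 + \gamma \int_0^T e^{\gamma t} P(t)\, dt$; Divide both sides by $e^{\gamma T}$, that is, multiply by $e^{-\gamma T}$: $M(T) = M_0 e^{-\gamma T} + \gamma e^{-\gamma T} \int_0^T e^{\gamma t} P(t)\, dt$.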
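The integrating-factor solution can be checked numerically. The following minimal Python sketch assumes the model has the form dM/dt = γ(P(t) − M(t)), where P(t) is the weighted sum of dimension scores, and evaluates M(T) = M_0·exp(−γT) + γ·exp(−γT)·∫ exp(γt)·P(t) dt with the trapezoidal rule. The function name, the global weights, the dimension scores, and the parameter values are hypothetical.

```python
import math


def maturity_at(T, m0, gamma, potential, steps=1000):
    """Evaluate M(T) from the integrating-factor solution.

    The inner integral of exp(gamma * t) * potential(t) over [0, T] is
    approximated with the trapezoidal rule.
    """
    h = T / steps
    integral = 0.5 * (potential(0.0) + math.exp(gamma * T) * potential(T))
    for i in range(1, steps):
        t = i * h
        integral += math.exp(gamma * t) * potential(t)
    integral *= h
    return m0 * math.exp(-gamma * T) + gamma * math.exp(-gamma * T) * integral


# Hypothetical global weights and constant dimension scores for four dimensions.
global_weights = [0.25, 0.25, 0.25, 0.25]
dimension_scores = [0.8, 0.7, 0.9, 0.8]


def potential(t):
    """Capability potential: weighted sum of dimension scores (constant in this example)."""
    return sum(w * d for w, d in zip(global_weights, dimension_scores))


m0, gamma, T = 0.5, 0.7, 2.0
numeric = maturity_at(T, m0, gamma, potential)
# For a constant potential P the solution reduces to P + (M0 - P) * exp(-gamma * T).
closed_form = potential(0.0) + (m0 - potential(0.0)) * math.exp(-gamma * T)
print(round(numeric, 6), round(closed_form, 6))
```

With a constant potential, the numerical value and the closed form should agree closely, and M(T) approaches the potential as T grows; when historical dimension scores vary over time, the potential function would interpolate between evaluation periods instead.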
[0068] S5. Evaluation Result Generation and Output: Generate a comprehensive evaluation report including total score, dimension radar chart, weakness analysis and optimization suggestions. Output the final evaluation results in the form of visual charts and structured documents through a graphical interface or file interface.
[0069] A government intelligence agent evaluation device, applied to the government intelligence agent evaluation method described above, includes: a government intelligence agent platform 100, network infrastructure 200, an evaluation system hardware cluster 300, and an external system 400. The government intelligence agent platform 100, as the evaluated object, specifically takes the form of an application system such as intelligent question answering and process handling deployed in a government cloud or private environment, and includes a government intelligence agent server 101, a platform database 102, an API gateway 103, a log server 104, and a knowledge base storage 105. The network infrastructure 200 includes a core switch 201, a firewall / gateway 202, a load balancer 203, and a VPN gateway 204, responsible for establishing a secure and reliable data transmission channel between the government intelligence agent platform 100 and the evaluation system hardware cluster 300. The evaluation system hardware cluster 300 is the core hardware for executing the method of this invention, and its internal structure is further divided into a data acquisition layer 310, a computing and storage layer 320, and a presentation layer 330. External system 400 includes cloud storage / backup 401 and monitoring system 402, providing operational support for the evaluation process. The evaluation system hardware cluster 300 can be centrally deployed in the same data center or distributed across different security domains according to government security requirements; this embodiment of the invention does not impose specific limitations on this.
[0070] The data acquisition servers (311, 312) in the data acquisition layer 310 connect to the government intelligence agent platform 100 through the secure channel of the network infrastructure 200 (via the firewall 202 and the load balancer 203). They call the management interface and runtime interface through the API gateway 103, query structured records from the platform database 102, obtain operation logs from the log server 104, sample document content from the knowledge base storage 105, and execute automated test scripts that simulate user interaction. In this way they systematically collect the raw data corresponding to the 40 sub-indicators under the four dimensions of "government knowledge base management", "technical capability", "creation capability" and "scenario application capability".
[0071] The acquisition servers (311, 312) clean, deduplicate, and format the acquired raw data to form a structured raw evaluation dataset, which is then transmitted via the internal network to the main database 323 of the computing and storage layer 320 for persistent storage.
[0072] The scoring calculation module in the computing and storage layer 320 (running on the computing server 321) reads the raw dataset from the main database 323 and converts each piece of raw data into an initial score for the corresponding sub-indicator according to preset quantification rules (such as accuracy calculation, expert scoring, and mixed scoring). Next, based on the type of each sub-indicator (continuous percentage, ordered rank), the corresponding normalization function is applied to map the initial score to the [0,1] interval, yielding the standardized score.
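One plausible reading of the type-specific normalization functions is min-max scaling, sketched below in Python. This is an assumption for illustration only: the function names, the clipping of out-of-range values to [0,1], and the example values are hypothetical and are not specified by this embodiment.

```python
def _clip(value):
    """Keep a normalized value inside the [0, 1] interval (an assumption of this sketch)."""
    return max(0.0, min(1.0, value))


def normalize_forward(x, x_min, x_max):
    """Continuous indicator where larger is better: (x - x_min) / (x_max - x_min)."""
    return _clip((x - x_min) / (x_max - x_min))


def normalize_reverse(t, t_min, t_max):
    """Reverse indicator such as elapsed time, where smaller is better."""
    return _clip((t_max - t) / (t_max - t_min))


def normalize_ordinal(grade, grade_min, grade_max):
    """Ordered rank indicator: position of the grade between the lowest and highest grade."""
    return _clip((grade - grade_min) / (grade_max - grade_min))


# Hypothetical raw values.
accuracy_score = normalize_forward(85.0, 0.0, 100.0)  # accuracy of 85 percent
latency_score = normalize_reverse(2.0, 1.0, 5.0)      # 2 s, ideal 1 s, worst acceptable 5 s
grade_score = normalize_ordinal(3, 1, 5)              # grade 3 on a 1-5 scale
print(accuracy_score, latency_score, grade_score)
```

All three functions return 0 for the worst acceptable value and 1 for the optimal value, so the resulting scores can be compared and weighted across indicator types.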
[0073] The scoring calculation module (running on the computing server 321) uses a weighting method that combines subjective and objective factors to determine the comprehensive weight of each sub-indicator. To determine the subjective weights, experts in e-government and artificial intelligence are invited through the expert review terminal 333; using the Analytic Hierarchy Process (AHP), they compare the sub-indicators within the same dimension pairwise, a judgment matrix is constructed, and its eigenvector is calculated. The objective weights are calculated using the entropy weight method, based on the dispersion of the standardized scores of multiple evaluated platforms on the indicator. Finally, the comprehensive weight is determined as $w_{m,n} = \alpha\, w^{s}_{m,n} + (1-\alpha)\, w^{o}_{m,n}$, where $w^{s}_{m,n}$ and $w^{o}_{m,n}$ are the subjective and objective weights and α is the balance coefficient, which can be taken as 0.6 based on experience in the field of government affairs.
[0074] For each dimension m, the integral / differential engine of the computation and storage layer 320 (running on compute server 322) takes the standardized scores of all its sub-indicators and the corresponding weights as inputs. First, a Gaussian kernel function is used for kernel density estimation, generating the probability density distribution curve of the capability in this dimension. Simultaneously, the discrete weights are transformed into a continuous weight function through linear interpolation. Then, numerical integration is performed to calculate the comprehensive score of each dimension after penalizing "uneven performance". Numerical integration is implemented with the adaptive Simpson method.
[0075] The integral / differential engine (running on compute server 322) further solves the differential equation. It reads the dimension score sequences for the current and historical evaluation periods (if any) and, combined with the preset global dimension weights (which sum to 1) and the adjustment coefficient γ, solves the differential equation. Given an initial maturity level M_0 (which can be obtained through a weighted average of the initial assessment), the total maturity score M(T) at the current time T is calculated and its evolutionary trajectory curve is obtained.
[0076] The web application server 331 in the presentation layer 330 obtains the calculated dimension scores, the total maturity score, and detailed intermediate data from the main database 323 and the cache database 324 via the internal network, and automatically generates a comprehensive evaluation report containing: 1) the total maturity score and a historical trend comparison chart; 2) radar charts of capabilities across the four dimensions; 3) a list of specific weaknesses (sorted by deviation from the average level); and 4) targeted optimization suggestions (based on the weakness analysis). The report is available in PDF, Word, and web page formats.
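The weakness list in item 3) can be illustrated with a short Python sketch that ranks the sub-indicators of one dimension by how far they fall below the dimension average. The function name and the score values are hypothetical; the indicator names are taken from the technical capability dimension.

```python
def list_weaknesses(scores):
    """Return (name, shortfall) pairs for sub-indicators below the average, worst first."""
    average = sum(scores.values()) / len(scores)
    below = [(name, average - score) for name, score in scores.items() if score < average]
    return sorted(below, key=lambda item: item[1], reverse=True)


# Hypothetical standardized scores for four sub-indicators.
technical_scores = {
    "language understanding accuracy": 0.9,
    "tool call success rate": 0.4,
    "multi-turn dialogue consistency": 0.6,
    "long context memory capability": 0.7,
}
weaknesses = list_weaknesses(technical_scores)
for name, shortfall in weaknesses:
    print(f"{name}: {shortfall:.2f} below the dimension average")
```

Sorting by shortfall puts the most serious weakness first, which matches the ordering described for the report.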
[0077] The management client 332 and the expert review terminal 333 access the web application server 331 through a browser or a dedicated client to view and download evaluation reports. Experts can also score indicators that require subjective human evaluation (such as metadata completeness I_12) through the terminal 333. This scoring data will be transmitted back to the scoring calculation module 321 through a secure network for updating weight calculations.
[0078] This embodiment evaluates government intelligent agents based on multi-source data collection and multi-dimensional indicator quantification. It covers four core dimensions (government knowledge base management, technical capabilities, creation capabilities, and scenario applications) and ensures the comprehensiveness and objectivity of the data through automated collection and standardized quantification, so that the evaluation results are better aligned with actual government application needs. The combined subjective and objective weight allocation method takes expert strategic judgment into account while highlighting the platform's actual shortcomings, making weight allocation more scientific. The integral aggregation algorithm penalizes capability imbalances, in line with the core requirement of government systems that reliability depends on having no weak points, and thereby improves the accuracy of dimension scoring. The differential equation evolution model calculates overall maturity dynamically, reflecting the current capability level, historical accumulation, and future evolution trends, and provides forward-looking guidance for iterative optimization. The visualized comprehensive report and structured optimization suggestions further lower the threshold for applying the evaluation results in practice. Overall, the evaluation improves scientific rigor, accuracy, and practicality.
[0079] The embodiments of the present invention are given for illustrative and descriptive purposes only, and are not intended to be exhaustive or to limit the invention to the forms disclosed. Many modifications and variations will be apparent to those skilled in the art. The embodiments were chosen and described in order to better illustrate the principles and practical application of the invention, and to enable those skilled in the art to understand the invention and to design various embodiments with various modifications suitable for a particular purpose.
Claims
1. A method for evaluating government intelligent agents, characterized in that it includes the following steps: S1. Multi-source data collection and indicator quantification: acquire multi-source raw data of the government intelligent agent to be evaluated and perform preliminary quantification; S2. Indicator data normalization and weight allocation: normalize the collected raw indicator scores, which have different measurement scales, into standardized scores that can be used for mathematical operations, and assign an appropriate weight to each indicator; S3. Dimensional score calculation based on integral aggregation: calculate the comprehensive score of each dimension in a way that penalizes capability imbalance and rewards balance; S4. Overall maturity evolution calculation based on differential equations: combining the comprehensive scores of each dimension and historical evaluation data, calculate the overall maturity of the government intelligent agent and predict its future development trend; S5. Evaluation result generation and output: generate a comprehensive evaluation report including total score, dimension radar chart, weakness analysis, and optimization suggestions.
2. The method and apparatus for evaluating a government intelligent agent according to claim 1, characterized in that, In step S1, raw data corresponding to the four evaluation dimensions is collected from the management backend, API interfaces, runtime logs, knowledge base content, and simulation test environment of the government intelligent agent development and application platform to be evaluated through interface calls, database queries, log analysis, simulation testing, and content review. The collected raw data is then cleaned, deduplicated, and formatted to form a structured raw evaluation dataset. The collection methods include: API calls: Call the management API and runtime API provided by the platform to obtain structured configuration information and performance data; Database query: With authorization, directly query the platform's backend database to obtain knowledge base, user conversations, and application release records; Log analysis: Collect and parse system logs, application logs, and error logs to analyze stability, call chains, and abnormal situations; Simulation testing: Write and execute automated test cases to simulate the interaction between real users and government intelligent agents in order to obtain quantitative data on functional correctness, performance indicators and user experience; Content review: Sample reviews of the platform's built-in knowledge base documents, Prompt templates, sample libraries, and other content to assess their quality and compliance.
3. The method and apparatus for evaluating a government intelligent agent according to claim 2, characterized in that the four evaluation dimensions are: government knowledge base management, government intelligent agent technology capabilities, government intelligent agent creation capabilities, and government intelligent agent scenario application capabilities. The detailed indicators for each dimension are as follows: The sub-indicators for the management of the government knowledge base are: support for government knowledge documents in multiple formats, completeness of government knowledge metadata definition, unified construction and management capability of the government knowledge base, intelligent document parsing capability of government documents, flexibility of government document slicing strategy, support for structured government data, government data management and verification capability, diversity of government knowledge retrieval modes, configurability of government knowledge retrieval parameters, and question-and-answer capability of government database. The sub-indicators of the government intelligent agent's technical capabilities are: language understanding accuracy, multi-turn dialogue consistency, dialect and terminology adaptability, multimodal interaction capability, task decomposition rationality, chain-of-thought interpretability, self-reflection and optimization capability, tool call success rate, multi-agent collaboration efficiency, and long context memory capability.
The sub-indicators of the government intelligent agent creation capability are: ease of government intelligent agent creation, richness and quality of government intelligent agent template library, flexibility of government intelligent agent workflow orchestration, support for government scenario Prompt engineering, government intelligent agent RAG configuration and management capability, government intelligent agent memory and dialogue configuration, integration of government intelligent agent tools and MCP, observability of government intelligent agent application, automated evaluation capability of government intelligent agent, and support for multi-channel release of government intelligent agent. The sub-indicators of the government intelligent agent scenario application capabilities are: accuracy of government Q&A, ability to select and guide scenarios, intelligent pre-review and form filling, process transparency and notification, intelligent document writing assistance, intelligent meeting support, event reporting and recognition, intelligent video analysis and reporting, intelligent event allocation, and data analysis and reporting.
4. The method and apparatus for evaluating a government intelligent agent according to claim 1, characterized in that step S2 includes the following sub-steps: S21. Indicator data normalization: according to the type of the original indicator, different normalization functions are used to map the data to the [0,1] interval to obtain the standardized score $s_{m,n} \in [0,1]$, where 0 indicates that the indicator performs worst and 1 indicates that the indicator has reached its optimal state; S22. Weight allocation: a combination of subjective and objective methods is adopted. Subjective weights are determined by the Analytic Hierarchy Process (AHP), and objective weights are determined by the entropy weight method, thus obtaining the comprehensive weight $w_{m,n}$. The weight $w_{m,n}$ indicates the relative importance of the sub-indicator within its respective dimension, and the weights of all sub-indicators within a dimension satisfy $\sum_{n} w_{m,n} = 1$.
5. The method and apparatus for evaluating a government intelligent agent according to claim 4, characterized in that, in step S21, the original indicators are divided into two types: continuous numerical indicators and ordinal hierarchical indicators. The normalization operation is as follows: For continuous numerical indicators: let $x$ be the raw score of a specific sub-indicator, $x_{\min}$ the theoretical minimum score or minimum requirement of the indicator, and $x_{\max}$ the theoretical full score or optimal value of the indicator; the normalized standard score is calculated as $s = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$; for reverse indicators such as time consumption, let $t$ be the raw score of a specific sub-indicator, $t_{\min}$ the ideal shortest time, and $t_{\max}$ the longest acceptable time; the normalization function is $s = \frac{t_{\max} - t}{t_{\max} - t_{\min}}$; For ordinal hierarchical indicators: let $g$ be the raw score of a specific sub-indicator, $g_{\min}$ the lowest grade score, and $g_{\max}$ the highest grade score; the normalization function is $s = \frac{g - g_{\min}}{g_{\max} - g_{\min}}$.
6. The method and apparatus for evaluating a government intelligent agent according to claim 1, characterized in that step S3 includes the following operations: Input: receive the standardized scores $s_{m,n}$ of all sub-indicators under a given evaluation dimension m and their corresponding comprehensive weights $w_{m,n}$; Kernel density estimation: a Gaussian kernel function is used to smooth the discrete score points into a continuous capability probability density function $f_m(x)$, which reflects the distribution of this dimension's capabilities over the [0,1] interval; Weight interpolation: the discrete weights are transformed into a continuous weight function $w_m(x)$ through linear or spline interpolation; Output: treat the sub-indicator scores under each dimension as a continuous distribution and aggregate them through integration into the comprehensive score $D_m$ of that dimension; where: $w_m(x)$ is the function constructed by interpolation from the discrete (score, weight) data points $(s_{m,n}, w_{m,n})$ and expresses how important capability level x is; $f_m(x)$ is the function constructed from the scores $s_{m,n}$ by kernel density estimation and expresses how much capability the platform has at level x. Multiplying the stock of capability at every capability level by the value of that level and summing the results, that is, computing $\int_0^1 w_m(x)\, f_m(x)\, dx$, gives the expected value output of the platform. The regularization term depends only on whether $f_m(x)$ is balanced.
7. The method and apparatus for evaluating a government intelligent agent according to claim 1, characterized in that step S4 includes the following sub-steps: S41. Treat the total maturity as a dynamic variable that evolves over time and construct a non-homogeneous linear differential equation; S42. Read the dimension score sequences of the current and historical evaluation periods, and solve the differential equation by combining the preset global dimension weights and adjustment coefficients.
8. The method and apparatus for evaluating a government intelligent agent according to claim 7, characterized in that the non-homogeneous linear differential equation is: $\frac{dM(t)}{dt} = \gamma\left[\sum_{m} W_m D_m(t) - M(t)\right]$; where: $M(t)$ is the platform's total maturity score at time t; $\frac{dM(t)}{dt}$ is the derivative of M(t), representing the instantaneous rate of change of platform maturity; $\sum_{m} W_m D_m(t)$ is the platform's capability potential / goal maturity; $W_m$ is the global weight of the m-th dimension, satisfying $\sum_{m} W_m = 1$; M(t) on the right side of the equation represents the current maturity level of the platform.
9. The method and apparatus for evaluating a government intelligent agent according to claim 1, characterized in that, In step S5, the final evaluation results are output in the form of visual charts and structured documents through a graphical interface or file interface.
10. A government intelligence agent evaluation device, applied to the government intelligence agent evaluation method as described in any one of claims 1-9, characterized in that it includes: a government intelligent agent platform (100), network infrastructure (200), an evaluation system hardware cluster (300), and an external system (400). The government intelligent agent platform (100) is the object to be evaluated and is specifically an application system deployed in a government cloud or private environment, including a government intelligent agent server (101), platform database (102), API gateway (103), log server (104), and knowledge base storage (105). The network infrastructure (200) includes a core switch (201), a firewall / gateway (202), a load balancer (203), and a VPN gateway (204), and is responsible for establishing a secure and reliable data transmission channel between the government intelligent agent platform (100) and the evaluation system hardware cluster (300); The evaluation system hardware cluster (300) is further divided into a data acquisition layer (310), a computing and storage layer (320), and a presentation layer (330). The external system (400) includes cloud storage / backup (401) and a monitoring system (402) to provide operational support for the evaluation process.