An agent evaluation method and device, electronic equipment and storage medium

General and domain-specific large models conduct multi-dimensional reviews of reports generated by intelligent agents. This addresses the unreliability of existing evaluation methods in professional fields and enables an accurate assessment of an agent's report generation capability.

CN121859944B · Active · Publication Date: 2026-06-16 · ZHEJIANG LAB

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Patents (China)
Current Assignee / Owner: ZHEJIANG LAB
Filing Date: 2026-03-18
Publication Date: 2026-06-16

AI Technical Summary

Technical Problem

Existing intelligent agent evaluation methods cannot accurately reflect an agent's ability to generate research reports in a specific field, leading to unreliable evaluation results.

Method used

A general large model and a domain large model cross-review the literature. The general model reviews it along general review dimensions and the domain model reviews it along professional review dimensions, producing a first review report and a second review report. The score of the agent's research report is then determined from the literature together with the review reports, and that score is used to evaluate the agent's report generation capability.
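The two review passes described above can be sketched as a pair of prompts that differ only in reviewer role and dimension list. This is a minimal illustration, not the patented prompt design: the dimension names, the `ReviewPrompt` type, and `build_review_prompt` are all hypothetical, since the text does not enumerate the dimensions or the prompt wording.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical dimension names; the patent text does not list them.
GENERAL_DIMENSIONS = ["clarity", "rigor", "novelty"]
PROFESSIONAL_DIMENSIONS = ["domain terminology", "field-specific methodology"]


@dataclass
class ReviewPrompt:
    reviewer: str          # "general" or "domain"
    dimensions: List[str]  # review dimensions the model is asked to cover
    text: str              # the prompt sent to the model


def build_review_prompt(reviewer: str, dimensions: List[str], literature: str) -> ReviewPrompt:
    """Assemble a review prompt that lists each dimension explicitly."""
    if not dimensions:
        raise ValueError("at least one review dimension is required")
    bullets = "\n".join(f"- {d}" for d in dimensions)
    text = (
        f"You are a {reviewer} reviewer. Review the literature below "
        f"along each of these dimensions:\n{bullets}\n\n"
        f"Literature:\n{literature}\n"
    )
    return ReviewPrompt(reviewer=reviewer, dimensions=list(dimensions), text=text)


# One prompt per reviewer: the general model and the domain model.
first_prompt = build_review_prompt("general", GENERAL_DIMENSIONS, "<literature text>")
second_prompt = build_review_prompt("domain", PROFESSIONAL_DIMENSIONS, "<literature text>")
```

Keeping the dimension lists as data rather than hard-coding them into the prompt text makes it easy to swap in the dimensions appropriate to a given research field.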

Benefits of technology

The method provides an effective and reliable evaluation that reflects an agent's actual report generation capability in professional fields, improving the accuracy and applicability of the evaluation results.

Generated by Eureka AI based on patent content.

Smart Images

  • Figure CN121859944B_ABST

Abstract

The application provides an agent evaluation method and device, electronic equipment, and a storage medium, and relates to the field of artificial intelligence. Literature is obtained, and the research question of the literature is input into the agent to be evaluated to obtain a research report to be evaluated. Using a first set of engineered prompts, a general large model is guided to review the literature along general review dimensions, yielding a first review report. A domain large model is trained in advance on a corpus from the research field of the literature and, using a second set of engineered prompts, is guided to review the literature along professional review dimensions related to that field, yielding a second review report. A report score for the research report is determined from the literature and the review reports, and an evaluation result is determined from the report score. Because the general large model and the domain large model cross-review the research report from different dimensions, and the research report is scored along multiple dimensions based on the literature and the review results, the evaluation result reflects the agent's report generation capability in various fields.
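The flow in the abstract can be sketched end to end as follows. This is an illustration under stated assumptions, not the claimed implementation: the agent, both models, and the scorer are passed in as plain callables, and the names `evaluate_agent`, `EvaluationResult`, and `pass_threshold` are hypothetical. The abstract says only that the evaluation result is determined from the report score, so the pass/fail threshold here is an assumed stand-in for that step.

```python
from dataclasses import dataclass
from typing import Callable, List

TextModel = Callable[[str], str]                 # prompt in, text out
Scorer = Callable[[str, str, List[str]], float]  # (report, literature, reviews) -> score


@dataclass
class EvaluationResult:
    report: str
    reviews: List[str]
    score: float
    result: str  # "pass" or "fail" (assumed mapping from the score)


def evaluate_agent(
    literature: str,
    research_question: str,
    agent: TextModel,
    general_model: TextModel,
    domain_model: TextModel,
    scorer: Scorer,
    pass_threshold: float = 0.6,  # hypothetical cutoff, not from the patent
) -> EvaluationResult:
    # 1. The agent under evaluation writes a research report for the question.
    report = agent(research_question)

    # 2. The general model reviews the literature along general dimensions.
    first_review = general_model(
        "Review this literature along general review dimensions:\n" + literature
    )

    # 3. The domain model reviews it along field-specific dimensions.
    second_review = domain_model(
        "Review this literature along professional review dimensions:\n" + literature
    )

    # 4. The report is scored against the literature and both review reports.
    reviews = [first_review, second_review]
    score = scorer(report, literature, reviews)

    # 5. The evaluation result is derived from the score (assumed threshold rule).
    result = "pass" if score >= pass_threshold else "fail"
    return EvaluationResult(report=report, reviews=reviews, score=score, result=result)
```

Passing the models and scorer as callables keeps the sketch independent of any particular model API; in practice each callable would wrap whatever inference client is in use.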