Document review generation method, apparatus, and electronic device

By identifying the initial prompt information and utilizing the review generation model and document knowledge base, structured reviews are automatically generated, solving the problem of information extraction from massive documents, achieving efficient and accurate review generation, and reducing manual processing costs.

CN122197830A · Pending · Publication Date: 2026-06-12 · CHINA TOBACCO GUANGXI IND


Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications(China)
Current Assignee / Owner: CHINA TOBACCO GUANGXI IND
Filing Date: 2026-03-09
Publication Date: 2026-06-12

Application Information

Patent Timeline

09 Mar 2026: Application
12 Jun 2026: Publication, CN122197830A

IPC: G06F40/166; G06N5/022; G06Q10/10

AI Tagging

Application Domain

Natural language data processing; Office automation




AI Technical Summary

⚠Technical Problem

Existing technologies struggle to accurately extract relevant information from massive amounts of documents and generate structured reviews. Keyword matching methods also fail to capture semantic relationships, leading to one-sided reviews that omit key information.

⚗Method used

By responding to the document review generation request, the first prompt information is determined. Using the review generation model and a pre-built document knowledge base, the target document review is automatically generated, including the parts of multiple documents that correspond to the prompt information. Dual structured parsing and post-processing are performed to improve the accuracy and relevance of the review.

🎯Benefits of technology

It enables precise extraction and integration of information, improves the efficiency and relevance of review generation, ensures accurate and comprehensive content, and saves on manual processing costs.

✦ Generated by Eureka AI based on patent content.


Abstract

The application discloses a document review generation method and device and electronic equipment. The method comprises the following steps: in response to a document review generation request, determining first prompt information corresponding to the generation request, wherein the first prompt information is used to describe the content related to the review to be generated; determining a target document review according to a review generation model, the first prompt information and a pre-constructed document knowledge base, wherein the document knowledge base comprises multiple documents, and the target document review at least comprises part of the document content corresponding to the first prompt information in the multiple documents, solving the problem that it is difficult to accurately extract relevant information from massive documents and generate a structured review in the related art, improving the review generation efficiency and relevance, ensuring the accuracy and comprehensiveness of the content and saving the manual sorting cost.


Description

Technical Field

[0001] This invention relates to the field of artificial intelligence technology, and in particular to a method, apparatus, and electronic device for generating document reviews.

Background Technology

[0002] With the rapid development of information technology, the number of documents in various fields has surged. How to efficiently extract key information from massive amounts of documents and generate structured reviews has become an urgent problem to be solved.

[0003] In related technologies, most retrieval-based summary generation methods are based on keyword matching. This involves filtering relevant documents from a document library using keywords input by the user, and then extracting high-frequency sentences to splice them into a summary. This method relies on precise keyword matching, makes it difficult to capture semantic connections, and can easily lead to a one-sided summary that omits key information.

Summary of the Invention

[0004] This invention provides a method, apparatus, and electronic device for generating document reviews, in order to solve the problem in related technologies that it is difficult to accurately extract relevant information from massive documents and generate structured reviews.

[0005] According to one aspect of the present invention, a method for generating a document overview is provided, comprising: In response to a request to generate a document review, a first prompt message corresponding to the generation request is determined, wherein the first prompt message is used to describe content related to the review to be generated; The target document review is determined based on the review generation model, the first prompt information, and the pre-built document knowledge base, wherein the document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content corresponding to the first prompt information from the multiple documents.

[0006] According to another aspect of the present invention, an apparatus for generating a document summary is provided, comprising: The prompt information generation module is used to respond to the document review generation request and determine a first prompt information corresponding to the generation request, wherein the first prompt information is used to describe the content related to the review to be generated; The document summary generation module is used to determine the target document summary based on the summary generation model, the first prompt information, and the pre-built document knowledge base, wherein the document knowledge base includes multiple documents, and the target document summary includes at least a portion of the document content corresponding to the first prompt information from the multiple documents.

[0007] According to another aspect of the present invention, an electronic device is provided, the electronic device comprising: At least one processor; and A memory communicatively connected to the at least one processor; wherein, The memory stores a computer program that can be executed by the at least one processor, the computer program being executed by the at least one processor to enable the at least one processor to perform the document summary generation method according to any embodiment of the present invention.

[0008] According to another aspect of the present invention, a computer-readable storage medium is provided, the computer-readable storage medium storing computer instructions for causing a processor to execute and implement the document summary generation method according to any embodiment of the present invention.

[0009] According to another aspect of the present disclosure, a computer program product is also provided, including a computer program that, when executed by a processor, implements the document summary generation method as described in any of the embodiments of the present disclosure.

[0010] In the technical solution of this invention, in response to a document review generation request, first prompt information corresponding to the generation request is determined. Because the first prompt information describes content related to the review to be generated, the solution supports flexible document queries, adapts to differentiated generation requests, and achieves accurate information extraction and integration. A target document review is then determined based on the review generation model, the first prompt information, and a pre-built document knowledge base. The document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content corresponding to the first prompt information from these documents. The review is thus generated automatically by the review generation model, which solves the problem in related technologies that it is difficult to accurately extract relevant information from massive amounts of documents and generate structured reviews. This improves review generation efficiency and relevance, ensures accurate and comprehensive content, and saves manual processing costs.

[0011] It should be understood that the description in this section is not intended to identify key or essential features of the embodiments of the present invention, nor is it intended to limit the scope of the invention. Other features of the invention will become readily apparent from the following description.

Attached Figure Description

[0012] To more clearly illustrate the technical solutions in the embodiments of the present invention, the accompanying drawings used in the description of the embodiments will be briefly introduced below. Obviously, the accompanying drawings described below are only some embodiments of the present invention. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0013] Figure 1 is a flowchart of a document review generation method provided in Embodiment 1 of the present invention;
Figure 2 is a flowchart of a document review generation method provided in Embodiment 2 of the present invention;
Figure 3 is a flowchart of a document review generation method provided in Embodiment 3 of the present invention;
Figure 4 is a schematic diagram of a document review generation apparatus according to Embodiment 4 of the present invention;
Figure 5 is a schematic diagram of the structure of an electronic device that implements the document review generation method of the embodiments of the present invention.

Detailed Implementation

[0014] To enable those skilled in the art to better understand the present invention, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings of the embodiments of the present invention. Obviously, the described embodiments are only some embodiments of the present invention, and not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort should fall within the scope of protection of the present invention.

[0015] It should be noted that the terms "first," "second," etc., in the specification, claims, and accompanying drawings of this invention are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that such data can be interchanged where appropriate so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to those steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to such processes, methods, products, or apparatus.

[0016] It should be noted that the terms "a" and "a plurality of" used in this disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise expressly indicated in the context, they should be understood as "one or more".

[0017] The names of messages or information exchanged between multiple devices in the embodiments of this disclosure are for illustrative purposes only and are not intended to limit the scope of such messages or information.

[0018] It is understood that before using the technical solutions disclosed in the various embodiments of this disclosure, users should be informed of the types, scope of use, and usage scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and user authorization should be obtained.

[0019] For example, upon receiving a user's active request, a prompt message is sent to the user to explicitly inform them that the requested operation will require the acquisition and use of the user's personal information. This allows the user to independently choose whether to provide personal information to the software or hardware, such as the electronic device, application, server, or storage medium performing the operations of this disclosed technical solution, based on the prompt message.

[0020] As an optional but non-limiting implementation, in response to a user's active request, sending a prompt message to the user can be done via a pop-up window, where the prompt message can be presented in text format. Furthermore, the pop-up window can also include a selection control allowing the user to choose "agree" or "disagree" to provide personal information to the electronic device.

[0021] It is understood that the above notification and user authorization process are merely illustrative and do not constitute a limitation on the implementation of this disclosure. Other methods that comply with relevant laws and regulations may also be applied to the implementation of this disclosure.

[0022] It is understood that the data involved in this technical solution (including but not limited to the data itself, the acquisition or use of the data) shall comply with the requirements of relevant laws, regulations and related provisions.

[0023] Embodiment 1: Figure 1 is a flowchart of a document review generation method provided in Embodiment 1 of the present invention. This embodiment is applicable to intelligent knowledge management and retrieval scenarios in which topic-related document reviews are generated rapidly. The method can be executed by a document review generation apparatus, which can be implemented in hardware and/or software and can optionally be implemented by an electronic device such as a mobile terminal, a PC, or a server.

[0024] As shown in Figure 1, the method may specifically include:

S110. In response to a document review generation request, determine first prompt information corresponding to the generation request, wherein the first prompt information is used to describe the content related to the review to be generated.

[0025] The document overview can be understood as a summary text formed by synthesizing, summarizing, and organizing the core content, viewpoints, data, or conclusions of multiple documents. It assists users in quickly grasping the core information of a large number of documents, reducing the time cost of reading the original documents. The generation request can be understood as a user-initiated instruction or signal to execute the document overview generation operation. It is the entry signal for the entire process, determining which task to execute and carrying necessary parameters (such as topic, necessary requirements, etc.). The first prompt information can be understood as prompts or instruction data formed at the beginning of the generation process, describing the relevant content of the target overview. It guides the model to understand the task requirements, essentially providing the model with an instruction manual, ensuring that the output matches the user's intent. The overview to be generated can be understood as the document overview to be produced by the current task. It is the target product, clarifying the final result of the task, allowing subsequent query, processing, and generation steps to revolve around it.

[0026] S120. Determine the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base, wherein the document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content corresponding to the first prompt information from the multiple documents.

[0027] The review generation model can be understood as an artificial intelligence / algorithm model specifically trained or designed for automatically writing document reviews. It automatically organizes language from input (prompt information + knowledge base content) to generate a well-structured and complete review text. The document knowledge base can be understood as a pre-built database or index set containing multiple documents and their structured information, organized before the model runs. This provides the generation model with factual and material sources, ensuring the review content is based on evidence and allowing for rapid retrieval of relevant fragments. The target document review can be understood as the specific document review result finally output to the user after model processing, including a comprehensive report corresponding to the first prompt information. The multiple documents can be understood as a collection of independent files (such as PDFs, Word documents, web articles, etc.) forming the basis of the "document knowledge base," serving as basic data units and providing scattered knowledge points or information fragments for the model to filter and integrate. The partial document content can be understood as fragments, paragraphs, or data selected from the "multiple documents" that match the "first prompt information," rather than the full text of the documents. This is used to filter relevant information, ensuring the accuracy and relevance of the review.
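As an illustrative, non-limiting sketch, steps S110 and S120 can be pictured as a small pipeline: build the first prompt information from the request, select only the matching fragments from the knowledge base, and hand both to the review generation model. Everything named here (`Document`, `determine_first_prompt`, `select_fragments`, `generate_review`, the sample data) is a hypothetical stand-in, and the substring match and string assembly merely take the place of the retrieval logic and the trained model, which the source does not specify.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    fragments: list[str]  # passages already split out of the document


def determine_first_prompt(request: dict) -> str:
    """S110: turn a generation request into the first prompt information."""
    topic = request["topic"]
    requirements = request.get("requirements", "")
    return f"Write a review on: {topic}. {requirements}".strip()


def select_fragments(knowledge_base: list[Document], topic: str) -> list[tuple[str, str]]:
    """Keep only fragments that mention the topic, not whole documents."""
    needle = topic.lower()
    return [
        (doc.doc_id, frag)
        for doc in knowledge_base
        for frag in doc.fragments
        if needle in frag.lower()
    ]


def generate_review(prompt: str, evidence: list[tuple[str, str]]) -> str:
    """S120 stand-in: a real system would call the review generation model here."""
    lines = [prompt, ""]
    lines += [f"- {frag} [{doc_id}]" for doc_id, frag in evidence]
    return "\n".join(lines)


# Made-up sample data for illustration only.
knowledge_base = [
    Document("doc1", ["Attention layers weigh token pairs.", "Training used 8 GPUs."]),
    Document("doc2", ["Sparse attention reduces memory use.", "Results cover two benchmarks."]),
]
request = {"topic": "attention", "requirements": "Keep it short."}

prompt = determine_first_prompt(request)
evidence = select_fragments(knowledge_base, request["topic"])
review = generate_review(prompt, evidence)
print(review)
```

The point of the sketch is the data flow: the review carries partial document content (one fragment per document here) tagged with its source, rather than the full text of every document.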

[0028] Based on the above scheme, optionally, the document knowledge base is constructed in the following manner: acquiring multiple documents, performing structured parsing on the content of the multiple documents to obtain first structured document data of the multiple documents; extracting second structured document data of the multiple documents based on the multiple first structured document data and second prompt information through a target document parsing model, and storing the documents and their corresponding second structured document data to obtain the document knowledge base.

[0029] The structured parsing can be understood as converting unstructured document content (such as plain text streams) into data forms with clear hierarchy, tags, or formats (such as JSON, XML, tree structures). This serves as a preliminary data processing method, standardizing messy document content to enhance machine readability and laying the foundation for subsequent models to extract information from specific dimensions. The first structured document data can be understood as document data organized according to preset rules after the first structured parsing of the original document. It preserves the document's basic structural information (such as titles, paragraphs, lists, keywords, etc.) and serves as input for further in-depth processing. The target document parsing model can be understood as a specially designed or trained model used to extract finer-grained information from the first structured document data. Based on prompts, it can perform dynamic field replacements and, based on second prompts, accurately identify and extract information meeting specific dimensional requirements from the "first structured document data" to generate deeper-level second structured document data. The second prompts can be understood as a set of instructions controlling the operation of the target document parsing model, determining how the model parses and reconstructs document data, including at least multiple fixed descriptive fields and replaceable fields. The fixed description field can be understood as a predefined, unchangeable field in the prompt information, used to specify the structure or category of the document data, ensuring the structural consistency of the parsing results and facilitating subsequent retrieval and comparison. 
The replaceable field can be understood as a field defined in the prompt information that can be dynamically filled according to the actual document content, used to extract corresponding information that matches its semantics from the first structured document data and replace it. The second structured document data can be understood as a document data structure that integrates the values of fixed and replaceable fields after processing by the target document parsing model. It is a structured form stored in the knowledge base and can be directly used for retrieval, reasoning, and content generation.
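One possible way to picture the second prompt information is as a template in which the fixed description fields are written once and the replaceable fields are placeholders filled per document. The field names, the `doc_type` placeholder, and the `build_second_prompt` helper below are assumptions made for illustration; the source does not prescribe a template syntax.

```python
# Fixed description fields: identical for every document, so parsing
# results share one structure. The names are hypothetical.
FIXED_FIELDS = ["title", "abstract", "authors", "pubdate"]

# Replaceable fields appear as {placeholders} and are filled per document.
TEMPLATE = (
    "Extract the following fields: {fixed_fields}.\n"
    "Document type: {doc_type}\n"
    "Document content:\n{content}"
)


def build_second_prompt(first_structured: dict, doc_type: str) -> str:
    """Fill the replaceable fields from the first structured document data."""
    content = "\n".join(f"{key}: {value}" for key, value in first_structured.items())
    return TEMPLATE.format(
        fixed_fields=", ".join(FIXED_FIELDS),
        doc_type=doc_type,
        content=content,
    )


second_prompt = build_second_prompt(
    {"title": "A Study of Widgets", "body": "Widgets matter."},
    doc_type="paper",
)
print(second_prompt)
```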

[0030] One alternative implementation method is illustrated using a paper as an example. First, paper data is acquired as input. This data can originate from user-uploaded paper files or from external academic paper databases, and supports unified processing of common paper formats such as PDF and Word.

[0031] (1) Document parsing: After obtaining the paper, the content of the paper is parsed in a structured manner. The structured parsing includes automatically identifying and extracting information such as the title, abstract, chapter levels, main text paragraphs, tables and references of the paper to form the first structured document data. For PDF format papers, different content areas on the page are distinguished by combining layout analysis and text parsing technology to restore the original logical structure and reading order of the paper as much as possible.
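A minimal sketch of the first structured parsing, assuming the paper has already been converted to plain text: the first line is treated as the title, a line reading "Abstract" opens the abstract, and lines starting with a section number such as "1" or "1.1" open a chapter at the corresponding level. These conventions and the `parse_paper` helper are assumptions for illustration; layout analysis of PDF pages, tables, and references is not covered, and a body line that happens to start with a number would be misread as a heading.

```python
import re

# "1 Introduction" or "2.3 Results": section number, whitespace, heading text.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")


def parse_paper(text: str) -> dict:
    """Turn plain paper text into first structured document data."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    doc = {"title": lines[0], "abstract": "", "sections": []}
    current = None
    for ln in lines[1:]:
        match = HEADING.match(ln)
        if ln.lower() == "abstract":
            current = "abstract"
        elif match:
            level = match.group(1).count(".") + 1  # "1.1" is level 2
            doc["sections"].append(
                {"level": level, "heading": match.group(2), "paragraphs": []}
            )
            current = "section"
        elif current == "abstract":
            doc["abstract"] = (doc["abstract"] + " " + ln).strip()
        elif current == "section":
            doc["sections"][-1]["paragraphs"].append(ln)
    return doc


sample = """A Study of Widgets
Abstract
We study widgets.
1 Introduction
Widgets matter.
1.1 Scope
Only small widgets.
"""
parsed = parse_paper(sample)
print(parsed)
```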

[0032] (2) Understanding the content of the paper: In the paper analysis stage, not only is the paper structured at the format level, but also a large language model is introduced to perform in-depth semantic analysis of the paper content in order to achieve a refined understanding of the research content of the paper.

[0033] Specifically, after completing the document structure parsing, the paper's title, abstract, chapter titles, and main text are organized into structured input according to preset rules. A carefully designed second prompt then invokes a large language model to perform the paper parsing task. The objectives of this parsing task include, but are not limited to, identifying key information such as the paper's research background, research questions, data sources, methodology, experimental design, and main conclusions.

[0034] For example, the second prompt information can be used to parse the paper in the following form:

【
Prompt_1: You are an excellent researcher who regularly reads a large number of research papers, project reports, patents, and other documents. Now you need to organize these documents and extract the following content:
1. Title;
2. Abstract;
3. Authors;
4. Publication date [pubdate]: specified to the year;
5. Summary [review]: 1) Research fields, 2) Research content details, 3) Research method details, 4) Research achievements details; Note that the research content details should be no less than 800 words;
6. Reference: To facilitate citing this document, please generate a reference identifier for this document, in the form of: Author, Title, Journal Name, Date, Issue Number, Page Number.

[0035] Please return the results in JSON format:
{"title": "","abstract": "","authors": "","pubdate": "","review": {"fields": "", "content": "","method": "","achievements": ""},"reference": ""}
Document content: {paper_content}
】

(3) Paper entry into the database: Through the above methods, the large language model can perform semantic understanding of the paper content, extract key information such as the research topic, research direction, technical methods and main conclusions of the paper, and store the above parsing results together with the original paper content into the paper database to obtain the document knowledge base, which provides a reliable data foundation for subsequent retrieval, screening and review generation.
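The prompt-filling, reply-checking, and storage steps can be sketched as follows, under stated assumptions: the prompt is abbreviated to its JSON skeleton, the model call itself is omitted, the reply shown is made up, and the knowledge base is a plain dictionary. The helper names `build_prompt`, `parse_model_reply`, and `store_paper` are hypothetical.

```python
import json

# Abbreviated form of the prompt quoted above. The JSON skeleton contains
# literal braces, so str.replace is used instead of str.format.
PROMPT_TEMPLATE = (
    "Please return the results in JSON format: "
    '{"title": "", "abstract": "", "authors": "", "pubdate": "", '
    '"review": {"fields": "", "content": "", "method": "", "achievements": ""}, '
    '"reference": ""}\n'
    "Document content: {paper_content}"
)
TOP_KEYS = {"title", "abstract", "authors", "pubdate", "review", "reference"}
REVIEW_KEYS = {"fields", "content", "method", "achievements"}


def build_prompt(paper_content: str) -> str:
    return PROMPT_TEMPLATE.replace("{paper_content}", paper_content)


def parse_model_reply(reply: str) -> dict:
    """Check that the reply is JSON with every field the prompt asked for."""
    data = json.loads(reply)
    missing = TOP_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    review = data["review"]
    if not isinstance(review, dict) or REVIEW_KEYS - set(review):
        raise ValueError("review must contain fields, content, method, achievements")
    return data


def store_paper(knowledge_base: dict, paper_id: str, original: str, parsed: dict) -> None:
    """Keep the parsing result together with the original paper content."""
    knowledge_base[paper_id] = {"original": original, "structured": parsed}


# A made-up reply standing in for the large language model's output.
fake_reply = json.dumps({
    "title": "A Study of Widgets", "abstract": "We study widgets.",
    "authors": "A. Author", "pubdate": "2024",
    "review": {"fields": "manufacturing", "content": "...", "method": "...",
               "achievements": "..."},
    "reference": "A. Author, A Study of Widgets, 2024",
})
paper_text = "A Study of Widgets. We study widgets."
knowledge_base = {}
store_paper(knowledge_base, "paper-001", paper_text, parse_model_reply(fake_reply))
print(build_prompt(paper_text))
```

Validating the reply before storage is a design choice of this sketch, not something the source requires; it keeps malformed model output from entering the knowledge base.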

[0036] This technical solution employs dual structured parsing and utilizes second-level prompts to accurately extract multi-dimensional key information, constructing a knowledge base rich in deep semantics. This not only improves the accuracy and recall of document retrieval but also provides high-quality data support for review generation. It effectively solves the problems of coarse information granularity and weak semantic connections in traditional knowledge bases, significantly enhancing the professionalism and reliability of intelligent generation.

[0037] Based on the above scheme, optionally, after determining the target document review according to the review generation model, the first prompt information and the pre-built document knowledge base, the following steps are included: post-processing the target document review.

[0038] The post-processing can be understood as a series of improvements and optimizations performed on the target document review after its generation, including but not limited to at least one of the following: paragraph structure optimization, format standardization, language style adjustment, and citation verification. Paragraph structure optimization can be understood as adjusting the division, order, and hierarchy of paragraphs in the target document review to make the logic clearer and the transitions smoother. Language style adjustment can be understood as modifying the wording, tone, and sentence structure of the review according to the target audience or application scenario to better suit a specific style (such as academic rigor, popular science, or business conciseness). Citation verification can be understood as checking whether the sources of external materials, data, or viewpoints cited in the review are authentic, whether the citations are accurate, and whether the format conforms to standards.

[0039] One optional implementation involves post-processing the generated content after generating the target document review, including optimizing paragraph structure, standardizing formatting, and adjusting language style. Simultaneously, the literature citations in the generated content are validated to ensure that the research conclusions cited in the review all originate from the selected collection of papers.
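A minimal sketch of two of these post-processing steps, assuming citations appear in the review as bracketed document identifiers such as `[p1]` (a convention chosen here for illustration, not one stated in the source). `standardize_format` collapses stray whitespace and blank lines, and `unknown_citations` reports any cited identifier that is not in the selected collection of papers.

```python
import re

# Assumed citation marker: an identifier in square brackets, e.g. [p1].
CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def standardize_format(review: str) -> str:
    """Collapse runs of spaces and blank lines into clean paragraphs."""
    paragraphs = [" ".join(p.split()) for p in re.split(r"\n\s*\n", review)]
    return "\n\n".join(p for p in paragraphs if p)


def unknown_citations(review: str, selected_ids: set[str]) -> list[str]:
    """Return cited identifiers that are not in the selected paper collection."""
    cited = CITATION.findall(review)
    return sorted({c for c in cited if c not in selected_ids})


draft = "Result A holds  [p1].\n\n\n\nResult B   holds [p9]."
clean = standardize_format(draft)
problems = unknown_citations(clean, selected_ids={"p1", "p2"})
print(clean)
print(problems)
```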

[0040] In the review article citation verification step, the system first identifies the locations marked as citations in the review and extracts the corresponding text segments. Then, it calculates the similarity between this segment and the abstract or main text of the cited paper to determine whether the citation is appropriate.

[0041] For example, a text similarity-based verification method can be used: .
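The source does not reproduce the similarity formula here. Purely as an assumed stand-in, one common choice is cosine similarity over word-count vectors, with a citation accepted when the score reaches a threshold. The tokenizer, the `0.3` threshold, and the function names below are illustrative assumptions, not the method of the application.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the word-count vectors of two texts."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def citation_is_supported(segment: str, source: str, threshold: float = 0.3) -> bool:
    """Hypothetical rule: accept the citation if the texts are similar enough."""
    return cosine_similarity(segment, source) >= threshold


source_abstract = "We show that widgets reduce manufacturing cost."
print(citation_is_supported("Widgets reduce cost", source_abstract))
print(citation_is_supported("Bananas are yellow", source_abstract))
```

A production system would more likely compare sentence embeddings than raw word counts, since word overlap misses paraphrases; the structure of the check stays the same.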

[0042] By adopting this technical solution and using a post-processing mechanism, the paragraph structure of the review is optimized, the format is standardized, the style is adjusted, and the citations are verified. This significantly improves the readability, professionalism, and authenticity of the document, ensuring that the output directly meets the high-quality delivery standards.

[0043] In the technical solution of this invention, in response to a document review generation request, first prompt information corresponding to the generation request is determined. Because the first prompt information describes content related to the review to be generated, the solution supports flexible document queries, adapts to differentiated generation requests, and achieves accurate information extraction and integration. A target document review is then determined based on the review generation model, the first prompt information, and a pre-built document knowledge base. The document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content corresponding to the first prompt information from these documents. The review is thus generated automatically by the review generation model, which solves the problem in related technologies that it is difficult to accurately extract relevant information from massive amounts of documents and generate structured reviews. This improves review generation efficiency and relevance, ensures accurate and comprehensive content, and saves manual processing costs.

[0044] Embodiment 2: Figure 2 is a flowchart of a document review generation method provided in Embodiment 2 of the present invention. Building on the above embodiments, this embodiment further refines the step of determining the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base. Optionally, this step includes: determining target keywords and target intent information based on the first prompt information using the review generation model; querying the document knowledge base based on the target keywords and the target intent information; and determining the target document review based on the multiple documents retrieved. For the detailed implementation, refer to the description of this embodiment. Technical features that are the same as or similar to those in the foregoing embodiments are not repeated here.

[0045] As shown in Figure 2, the method may specifically include:

S210. In response to a document review generation request, determine first prompt information corresponding to the generation request, wherein the first prompt information is used to describe the content related to the review to be generated.

[0046] S220. The review generation model determines the target keywords and target intent information based on the first prompt information, queries the document knowledge base based on the target keywords and target intent information, and determines the target document review based on the multiple documents retrieved. The document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content from the multiple documents that corresponds to the first prompt information.

[0047] The target keywords can be understood as words or phrases extracted from the first prompt information that represent the core theme or main discussion object of the review, used to locate relevant documents in the document knowledge base, narrow the search scope, and improve information matching efficiency and relevance.

[0048] The target intent information can be understood as semantic information about the desired purpose or expression of the summary, which is parsed from the first prompt information. It guides the model's strategy selection in content organization, argumentation angle, and information filtering, ensuring that the output meets the user's expectations.
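In the described scheme the review generation model derives the target keywords and target intent information. The sketch below only illustrates the shape of that output with a rule-based stand-in: a stopword filter for keywords and a cue-word lookup for intent. The stopword list, cue table, intent labels, and `parse_first_prompt` are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical stopwords and intent cues; a real system would ask the model.
STOPWORDS = {"a", "an", "the", "of", "on", "in", "for", "and", "to", "with",
             "write", "review", "please", "compare", "summarize"}
INTENT_CUES = {"compare": "comparison", "trend": "trend_analysis",
               "summarize": "overview"}


@dataclass
class ParsedPrompt:
    keywords: list[str]
    intent: str


def parse_first_prompt(prompt: str) -> ParsedPrompt:
    words = re.findall(r"[a-z]+", prompt.lower())
    # The first cue word found decides the intent; default to a plain overview.
    intent = next(
        (label for cue, label in INTENT_CUES.items() if cue in words), "overview"
    )
    # dict.fromkeys removes duplicates while keeping first-seen order.
    keywords = [w for w in dict.fromkeys(words) if w not in STOPWORDS]
    return ParsedPrompt(keywords, intent)


parsed_prompt = parse_first_prompt(
    "Compare transformer and recurrent models for translation"
)
print(parsed_prompt)
```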

[0049] Based on the above scheme, optionally, the step of querying the document knowledge base according to the target keywords and the target intent information includes: expanding multiple target keywords to obtain first expanded words; combining at least some of the first expanded words to obtain a first query combination; expanding the languages corresponding to the first expanded words to obtain second expanded words corresponding to multiple languages; determining a second query combination according to the first query combination and the multiple second expanded words; and querying the document knowledge base according to the first query combination, the second query combination, and the target intent information.

[0050] The expansion processing can be understood as extending existing keywords semantically or linguistically to widen the search scope, which reduces the risk of missing relevant documents in single-keyword searches and improves recall. The expansion processing includes, but is not limited to, synonym expansion, term variant expansion, and hypernym / hyponym expansion. Synonym expansion adds words with the same or similar meaning as the target keyword to the search scope, capturing cases where different documents use different terms for the same concept. Term variant expansion covers different expressions of the same concept across fields or contexts (e.g., full name / abbreviation, professional abbreviation), accommodating terminological differences between sources and broadening matching coverage. Hypernym / hyponym expansion adds the hypernym (broader category) and hyponym (more specific subclass) of the target keyword to the search scope, so that both broader background documents and more specific topic documents are retrieved. The first query combination can be understood as a query condition for retrieving from the knowledge base, formed by concatenating or logically associating multiple first expanded words according to certain rules. It is a preliminary multi-keyword retrieval expression aimed at covering different expressions of the core topic. The language can be understood as the language in which the text is written (e.g., Chinese, English, French), which determines the written form of the search terms and the matching rules. The second expanded words can be understood as the corresponding expressions of the first expanded words in other languages, which extend the language coverage of the retrieval and improve the recall rate in a multilingual environment. The second query combination can be understood as a semantically identical query condition formed by replacing the first expanded words in the first query combination with the second expanded words. It is a cross-language version of the first query combination, ensuring that documents in different languages can be retrieved.
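The expansion and combination steps in [0049] and [0050] can be sketched as follows. This is a minimal illustration, not the invention's implementation: the lookup tables `EXPANSIONS` and `TRANSLATIONS`, the function names, and the choice to join the per-keyword OR-groups with AND are all assumptions made for the example. A real system would likely obtain expansions from a thesaurus or a language model.

```python
from __future__ import annotations

# Hypothetical expansion table: target keyword -> synonyms, term variants,
# hypernyms / hyponyms. The entries are illustrative only.
EXPANSIONS = {
    "pests and diseases": ["plant diseases", "insects"],
    "monitoring": ["detection", "identification"],
}

# Hypothetical cross-language table: first expanded word -> {language: term}.
TRANSLATIONS = {
    "pests and diseases": {"zh": "病虫害"},
    "plant diseases": {"zh": "植物病害"},
    "monitoring": {"zh": "监测"},
    "detection": {"zh": "检测"},
}


def expand(keyword: str) -> list[str]:
    """Return the keyword followed by its expanded words, without duplicates."""
    seen, result = set(), []
    for term in [keyword, *EXPANSIONS.get(keyword, [])]:
        if term not in seen:
            seen.add(term)
            result.append(term)
    return result


def first_query_combination(keywords: list[str]) -> list[list[str]]:
    """Build one OR-group of first expanded words per target keyword."""
    return [expand(k) for k in keywords]


def second_query_combination(first: list[list[str]], lang: str) -> list[list[str]]:
    """Replace each first expanded word with its translation, where one is known."""
    return [
        [TRANSLATIONS[t][lang] for t in group if lang in TRANSLATIONS.get(t, {})]
        for group in first
    ]


def render(groups: list[list[str]]) -> str:
    """Render as a Boolean expression: OR inside a group, AND between groups (assumed)."""
    clauses = ["(" + " OR ".join(f'"{t}"' for t in g) + ")" for g in groups if g]
    return " AND ".join(clauses)


first = first_query_combination(["pests and diseases", "monitoring"])
second = second_query_combination(first, "zh")
print(render(first))
print(render(second))
```

Terms without a known translation are simply dropped from the second query combination here; whether to drop them or keep the original-language term is a design choice the source does not specify.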

[0051] In one optional implementation, the system receives the first prompt information entered by the user in the search box. Unlike traditional search systems, this invention treats the first prompt information not only as a query for retrieving relevant papers, but also as an implicit statement of requirements for the writing style and content structure of the review.

[0052] First, the system identifies the paper search topic from the first prompt information entered by the user. Then, it extracts several search keywords from the user's input. Considering that most papers contain English abstracts or keywords, the system generates Chinese keywords together with their corresponding English keywords to improve cross-language search coverage. An example prompt is shown below: 【 Prompt_2: This is an intelligent analysis task for retrieving academic papers. We need to analyze the search query entered by the user. The user may enter a research field, research content, organization name, or scholar's name. Please analyze the user input: 1. If the user input is a piece of natural language, please analyze its target and obtain 1 to 3 possible keywords for paper retrieval. For example, "help me find papers related to image recognition models", the keyword can be identified as: "[image recognition]". The number of keywords is related to the length of the user input. If the user input itself is a keyword, there is no need to split it into multiple keywords. 2. If the user enters an organization name or scholar's name, please use it directly as the search keyword.

[0053] 3. If the user inputs an abbreviation, it should also be understandable, for example: LLM stands for Large Language Model.

[0054] 4. Organize the compiled keywords into both Chinese and English formats.

[0055] 5. Avoid outputting generic terms such as "technology", "paper", and "key technology".

[0056] 6. Output in JSON format: {"keywords_zh":[],"keywords_en":[]} User input: {question} 】 Meanwhile, the system further analyzes the user input using a large language model to determine whether it contains explicit or implicit requirements regarding the structure of the review article, such as "describe the research results according to categories such as data construction, algorithm model, and monitoring and early warning." An example prompt is shown below: 【 Prompt_3: You are an academic review writing assistant.

[0057] Please complete the following analysis based on the user's input: 1) Identify the topic of the paper search; 2) Determine if the user has specified any requirements for the review structure; 3) If there are structural requirements, please provide them in list form. [User Input]: Write a review on tobacco pests and diseases, elaborating on relevant research findings according to categories such as data construction, algorithm models, and monitoring and early warning. 】 Based on the above analysis, user input is broken down into two parts: first, keyword information for paper retrieval; and second, writing intent information to guide the generation of the review article. The writing intent information is stored in a structured format and serves as an important constraint in the subsequent review article generation process, ensuring that the generated content meets the user's actual needs.
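How the two analyses might be turned into the structured data described in [0057] can be sketched as follows. The sketch assumes the model's reply to Prompt_2 contains the JSON object requested in the prompt, possibly surrounded by extra text. The function names and the `RetrievalRequest` container are hypothetical, and the `reply` string is a hand-written stand-in for model output, not a real result.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class RetrievalRequest:
    """Keyword information for retrieval plus writing intent for generation."""
    keywords_zh: list[str]
    keywords_en: list[str]
    structure_requirements: list[str] = field(default_factory=list)


def fill_question(template: str, question: str) -> str:
    # str.format would trip over the literal JSON braces in the prompt,
    # so only the {question} placeholder is substituted.
    return template.replace("{question}", question)


def parse_keywords(reply: str) -> dict[str, list[str]]:
    """Extract the keyword object from a model reply and normalize it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    parsed = {}
    for key in ("keywords_zh", "keywords_en"):
        value = data.get(key, [])
        if not isinstance(value, list):
            raise ValueError(f"{key} must be a list")
        parsed[key] = [str(v).strip() for v in value if str(v).strip()]
    return parsed


# Tail of Prompt_2 as quoted in the text; the earlier instructions are omitted.
template = 'Output in JSON format: {"keywords_zh":[],"keywords_en":[]} User input: {question}'
question = "Write a review on tobacco pests and diseases"
prompt = fill_question(template, question)

# Hand-written stand-in for a model reply (assumption, not real output).
reply = 'Result: {"keywords_zh": ["烟草病虫害"], "keywords_en": ["tobacco pests and diseases"]}'
keywords = parse_keywords(reply)

request = RetrievalRequest(
    keywords_zh=keywords["keywords_zh"],
    keywords_en=keywords["keywords_en"],
    structure_requirements=["data construction", "algorithm models", "monitoring and early warning"],
)
print(request)
```

Keeping the writing intent in the same structure as the keywords makes it straightforward to pass both to the later generation stage as a single constraint object.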

[0059] Furthermore, in practical applications such as writing academic review papers, it is necessary not only to ensure the relevance of search results but also to meet the engineering feasibility, stability, and controllability of large-scale literature retrieval. Addressing the shortcomings of existing vector retrieval schemes in engineering practice, this invention does not employ a semantic retrieval method based on vector similarity. Instead, it proposes a literature retrieval scheme that combines keyword retrieval with query expansion.

[0060] In existing technologies, vector retrieval typically maps documents and queries to a high-dimensional semantic vector space and matches them based on similarity. However, in the scenario of generating academic reviews, vector retrieval has the following significant limitations: (1) Vector retrieval requires vectorizing all submitted papers, and each user search additionally requires real-time vector encoding of the query statement. This process is computationally expensive, relies on model inference services, and has a slow overall response time, making it unsuitable for high-concurrency or interactive retrieval scenarios. (2) In vector retrieval, search results are usually sorted by similarity over the entire vector dataset, which is difficult to integrate naturally with traditional database pagination mechanisms (such as offset + limit). When stable pagination over a large number of search results is required, additional caching or repeated similarity calculations are often necessary, which increases system complexity and makes result consistency difficult to guarantee. (3) In academic retrieval practice, a large number of paper searches still rely on subject terms, keywords, and their synonyms, especially since most papers include English abstracts and keyword fields. Keyword-based retrieval methods have a natural advantage in this scenario.

[0061] For the reasons mentioned above, this invention abandons the vector retrieval scheme and instead adopts a keyword retrieval + query expansion approach, which improves the stability, controllability and engineering feasibility of the system while ensuring retrieval coverage.
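The pagination point in [0060] can be illustrated with a small keyword search over an in-memory SQLite table. This is only a sketch under assumptions: the `papers` schema, the sample rows, and the use of `LIKE` over a single keyword field are invented for the example and are not described in the source.

```python
import sqlite3


def build_index(papers):
    """Create an in-memory table of (id, title, keywords) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT, keywords TEXT)")
    conn.executemany("INSERT INTO papers (id, title, keywords) VALUES (?, ?, ?)", papers)
    conn.commit()
    return conn


def search_page(conn, terms, page, page_size):
    """Return one page of papers whose keyword field matches any term.

    A fixed ORDER BY keeps pages stable across calls, which is what makes
    LIMIT / OFFSET pagination straightforward for keyword retrieval.
    Note: LIKE wildcards (% and _) inside terms are not escaped in this sketch.
    """
    if not terms:
        return []
    where = " OR ".join("keywords LIKE ?" for _ in terms)
    sql = f"SELECT id, title FROM papers WHERE {where} ORDER BY id LIMIT ? OFFSET ?"
    params = [f"%{t}%" for t in terms] + [page_size, page * page_size]
    return conn.execute(sql, params).fetchall()


conn = build_index([
    (1, "Paper A", "tobacco; pest detection"),
    (2, "Paper B", "plant diseases; early warning"),
    (3, "Paper C", "image recognition"),
    (4, "Paper D", "tobacco; disease monitoring"),
    (5, "Paper E", "pest identification"),
])
print(search_page(conn, ["pest", "disease"], page=0, page_size=2))
print(search_page(conn, ["pest", "disease"], page=1, page_size=2))
```

The expanded terms are ORed in the `WHERE` clause and bound as parameters rather than interpolated into the SQL string.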

[0062] After generating search keywords based on user input, the system introduces query expansion technology to broaden the target keywords. Expansion methods include, but are not limited to, synonym expansion, related concept expansion, and hypernym / hyponym expansion, resulting in a richer set of search keywords. Through query expansion, the system mitigates retrieval omissions caused by incomplete or inconsistent user-input keywords, thereby improving the recall rate of paper retrieval.

[0063] The core idea of query expansion is: Without altering the user's original search intent, richer search queries can be constructed by introducing semantically related, similarly expressed, or hierarchical terms, thereby improving document recall.

[0064] In this solution, query expansion mainly includes the following categories: synonym expansion, for example, "pests and diseases" can be expanded to "diseases", "insects", and "plant diseases"; term variant expansion, for example, "monitoring" can be expanded to "detection", "identification", and "perception"; hierarchical concept expansion, for example, "early warning" can be expanded to "risk early warning", "disaster early warning", and "intelligent early warning"; and Chinese-English alignment expansion, which ensures semantic consistency between Chinese and English terms. The expanded keywords participate in the retrieval as Boolean logic combinations, for example by combining multiple expanded words through OR relations, thereby significantly widening recall without sacrificing accuracy.

[0065] This technical solution expands keywords along multiple dimensions (synonyms, term variants, and hypernyms and hyponyms) and combines cross-language semantic mapping to construct the two query combinations, which overcomes the limits of the user's vocabulary and language. This improves the recall rate and coverage of the retrieval, so that relevant documents in multiple languages can be located, and provides data support for generating comprehensive and thorough reviews.

[0066] Optionally, based on the above scheme, determining the target document summary based on the multiple documents retrieved includes: constructing a candidate document set based on the multiple documents retrieved, determining a target document set based on the candidate document set, and determining the target document summary based on the target document set.

[0067] The candidate document set can be understood as a collection formed after preliminary sorting, deduplication, and tagging of multiple retrieved documents. This collection serves as input for subsequent filtering, standardizing and structuring the original search results for easier unified management and evaluation, thus providing a foundation for determining the target document set. The target document set can be understood as a subset of documents selected from the candidate document set based on criteria such as relevance, importance, timeliness, and authority. Documents with low relevance or insufficient quality are removed, improving the accuracy and information density of the review and ensuring that the final content closely adheres to the theme.

[0068] One alternative implementation involves performing a search on a paper database based on an expanded keyword set to obtain a set of candidate papers related to the search topic. During the search process, factors such as keyword matching degree and the relevance of the paper abstracts to the main text are comprehensively considered to rank the search results.
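The candidate set and target set described in [0067] and [0068] can be sketched as follows. The `Paper` fields, the scoring weights (2 for a keyword-field match, 1 for a title or abstract match), and the `min_score` cut-off are assumptions chosen for illustration; the source names the ranking factors but not a formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    paper_id: str
    title: str
    abstract: str
    keywords: tuple[str, ...]


def relevance(paper: Paper, terms: list[str]) -> float:
    """Score a paper: keyword-field hits weigh more than title/abstract hits."""
    keyword_field = {k.lower() for k in paper.keywords}
    text = f"{paper.title} {paper.abstract}".lower()
    score = 0.0
    for term in terms:
        t = term.lower()
        if t in keyword_field:
            score += 2.0  # assumed weight
        if t in text:
            score += 1.0  # assumed weight
    return score


def build_candidate_set(papers: list[Paper], terms: list[str]) -> list[tuple[float, Paper]]:
    """Deduplicate by paper_id, score each paper, and sort by descending score."""
    unique: dict[str, Paper] = {}
    for paper in papers:
        unique.setdefault(paper.paper_id, paper)  # keep the first occurrence
    scored = [(relevance(p, terms), p) for p in unique.values()]
    scored.sort(key=lambda pair: (-pair[0], pair[1].paper_id))
    return scored


def select_target_set(candidates: list[tuple[float, Paper]], min_score: float) -> list[Paper]:
    """Keep only candidates whose score reaches the threshold."""
    return [paper for score, paper in candidates if score >= min_score]


p1 = Paper("p1", "Tobacco pest detection with CNNs",
           "We study pest detection in tobacco fields.", ("tobacco", "pest detection"))
p2 = Paper("p2", "Early warning of plant diseases",
           "A risk early warning model.", ("early warning",))
p3 = Paper("p3", "Graph databases", "Storage engines.", ("database",))

candidates = build_candidate_set([p1, p2, p1, p3], ["pest detection", "early warning"])
targets = select_target_set(candidates, min_score=1.0)
print([(score, paper.paper_id) for score, paper in candidates])
print([paper.paper_id for paper in targets])
```

In the flow described in the text, the user would still review the ranked list before the target set is finalized; the threshold here only stands in for that filtering step.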

[0069] After the initial search is completed, the search results are displayed to the user. The user can then filter, confirm, or upload other relevant papers as needed. The papers finally confirmed by the user are used as input data for the subsequent review articles, ensuring that the generated content has clear data sources and references.

[0070] By adopting this technical solution, a filtering mechanism that constructs a candidate set through initial screening and determines the target set through fine screening effectively eliminates retrieval noise and low-relevance documents. This not only ensures the high quality and high relevance of the materials generated for the review, but also significantly improves the accuracy, credibility and information density of the final review.

[0071] The technical solution of this invention automatically analyzes user prompts through a model, accurately extracts target keywords and deep intents, and drives the knowledge base to perform targeted retrieval. This effectively solves the problem of low matching degree in traditional search, ensures that the review is generated based on highly relevant documents, and significantly improves the relevance, logical coherence, and responsiveness to user needs of the content.

[0072] Example 3. Figure 3 is a flowchart of a document review generation method provided in Embodiment 3 of the present invention. This embodiment further supplements the above embodiments with steps performed after the target document review is determined based on the review generation model, the first prompt information, and a pre-built document knowledge base. Optionally, after determining the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base, the method further includes: determining the context length of the multiple documents in the document knowledge base used in the target document review, wherein the context length is related to the number of pages of the documents; and determining the target document review based on the context length and a preset length threshold. For the detailed implementation, refer to the description of this embodiment. Technical features that are the same as or similar to those in the foregoing embodiments are not repeated here.

[0073] As shown in Figure 3, the method may specifically include: S310. In response to a document review generation request, determine first prompt information corresponding to the generation request, wherein the first prompt information is used to describe the content related to the review to be generated.

[0074] S320. Determine the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base, wherein the document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content corresponding to the first prompt information from the multiple documents.

[0075] S330. Determine the context length of multiple documents in the document knowledge base used in the target document overview.

[0076] S340. Determine the target document summary based on the context length and the preset length threshold.

[0077] The context length can be understood as the amount of information allowed or actually used for a specific document when generating a review. The context length is related to the number of pages in the document and can be measured by the number of pages to quantify the participation of each document in the review, helping to determine whether sufficient core content is covered or whether there is excessive citation. The preset length threshold can be understood as a pre-set critical value used to determine whether the context length is compliant or to decide the final output strategy. It is related to the total number of pages of the documents supported by the model as input, playing a quality control and constraint role, and preventing reviews that are too long leading to information dilution or too short leading to information loss.

[0078] Based on the above scheme, optionally, the preset length threshold includes a first length threshold and a second length threshold; determining the target document overview based on the context length and the preset length threshold includes at least one of the following: when the context length is less than the first length threshold, determining the target document overview based on multiple documents and the first prompt information using an overview generation model; when the context length is greater than the first length threshold and less than the second length threshold, generating multiple document sets based on multiple documents using an overview generation model, determining partial overview content based on each document set and the first prompt information, and integrating the overview content corresponding to the multiple document sets into a target document overview; when the context length is greater than the second length threshold, determining a first document and a second document based on multiple documents using an overview generation model, determining an initial document overview based on the first document and the first prompt information, supplementing and / or revising the initial document overview based on the second document and the first prompt information to obtain the target document overview, wherein at least some content in the second document is associated with at least some content in the first document in the document knowledge base.

[0079] The first length threshold can be understood as the smaller of the preset length thresholds, used to define the lower limit of the context length. When the context length is below this value, the number of pages input to the model is considered insufficient, and more documents can be input to expand the information sources and generate a complete review. The second length threshold can be understood as the larger of the preset length thresholds, used to define the upper limit of the context length. When the context length is above this value, the total number of pages of the documents input to the model is considered too large, and it is necessary to simplify or focus on some of the document information sources to avoid redundancy. The document set can be understood as a subset formed by grouping multiple documents according to certain rules (such as topic, relevance, or length, where the total number of pages of the multiple documents is between the two thresholds), used to generate review content in stages, which facilitates control over length and information granularity. The partial review content can be understood as a review fragment generated by the review generation model from one document set in combination with the first prompt information; it covers only the content of that set. It is an intermediate product of multi-segment generation and is eventually integrated into a complete review. The first document can be understood as the document selected from the multiple documents, when the context length exceeds the second length threshold, as the core basis for generating the initial review; it provides the main content and framework of the review. The second document can be understood as the document selected from the multiple documents, when the context length exceeds the second length threshold, to supplement or revise the initial review; it provides additional details and improves the depth and breadth of the review. The initial document review can be understood as a short draft review generated based on the first document and the first prompt information, serving as the basis for subsequent supplementation and / or revision to ensure the stability of core information. Supplementation can be understood as adding new information, viewpoints, or data to the initial document review to enrich the content. Revision can be understood as modifying, deleting, or optimizing inaccurate, incomplete, or redundant parts of the initial document review. Association can be understood as a logical connection between the two documents in terms of content or attributes, including but not limited to topic association, keyword semantic association, research method association, research content association, and research field association. Keyword semantic association can be understood as the keywords used in the two documents being similar in meaning or being synonyms or near-synonyms.
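The threshold logic of [0078] and [0079] can be sketched as follows. The sketch assumes that context length is the total page count of the documents used, that a context length exactly equal to a threshold falls into the higher branch (the source does not specify the boundary cases), and that document sets are packed greedily under a page budget. The strategy labels and function names are hypothetical.

```python
from __future__ import annotations


def context_length(page_counts: dict[str, int]) -> int:
    """Context length measured as the total number of pages (assumed)."""
    return sum(page_counts.values())


def choose_strategy(length: int, first_threshold: int, second_threshold: int) -> str:
    """Pick a generation strategy from the two length thresholds."""
    if first_threshold >= second_threshold:
        raise ValueError("first_threshold must be smaller than second_threshold")
    if length < first_threshold:
        return "single_pass"          # all documents in one model call
    if length < second_threshold:
        return "document_sets"        # partial reviews per set, then integrate
    return "initial_then_refine"      # initial review, then supplement / revise


def group_into_sets(page_counts: dict[str, int], budget: int) -> list[list[str]]:
    """Greedily pack documents, in order, into sets of at most `budget` pages.

    A single document larger than the budget still gets a set of its own.
    """
    sets: list[list[str]] = []
    current: list[str] = []
    used = 0
    for doc_id, pages in page_counts.items():
        if current and used + pages > budget:
            sets.append(current)
            current, used = [], 0
        current.append(doc_id)
        used += pages
    if current:
        sets.append(current)
    return sets


pages = {"a": 10, "b": 12, "c": 8, "d": 15}
length = context_length(pages)
strategy = choose_strategy(length, first_threshold=30, second_threshold=60)
print(length, strategy)
if strategy == "document_sets":
    print(group_into_sets(pages, budget=25))
```

Grouping by topic or relevance, which the text also mentions, would replace the simple in-order packing used here.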

[0080] An alternative implementation involves further validation and optimization of the generated target review document. After obtaining the collection of papers and the user's writing intent, the review structure is planned according to the identified intent, for example, dividing the review chapters according to different dimensions such as "data construction," "algorithm model," and "monitoring and early warning." To improve the generation quality, sample texts can be introduced as reference templates to guide the large language model in generating review content with a consistent style and clear structure.

[0081] Since large language models have context length limitations in a single call, a phased and iterative generation strategy is introduced in the summary generation stage to dynamically select the appropriate generation method based on the length of the current input content.

[0082] (1) Stuff method (suitable for small collections of papers): When the number of papers to be processed is small and the context length does not exceed the model limit, the Stuff method is adopted: the parsing results of all papers are input into the large language model at once to directly generate the complete review content.

[0083] (2) Map-Reduce method (suitable for medium-sized collections of papers): When there are more papers than can be handled in a single call, the Map-Reduce method is used.

[0084] In the Map phase, the system calls the large language model to generate local review fragments for each paper or group of papers; in the Reduce phase, the multiple local results are aggregated to generate the overall review.

[0085] (3) Refine method (suitable for large-scale collections of papers): When the number of papers is large and the review needs to be improved gradually, the system uses the Refine method. The system first generates an initial review version based on a small number of core papers, and then gradually introduces new papers to supplement and revise the existing review, rather than rewriting it completely.
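The three generation methods above can be sketched with the language model abstracted as a plain callable. Everything here is an assumption made for illustration: the `LLM` type alias, the prompt wording, and the `stub_llm` placeholder, which only stands in for a real model call so that the sketch runs.

```python
from __future__ import annotations

from typing import Callable, Sequence

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def stuff(llm: LLM, docs: Sequence[str], instruction: str) -> str:
    """Stuff: send all parsed papers to the model in a single call."""
    return llm(instruction + "\n\n" + "\n\n".join(docs))


def map_reduce(llm: LLM, doc_groups: Sequence[Sequence[str]], instruction: str) -> str:
    """Map-Reduce: one partial review per group, then one call to merge them."""
    partials = [
        llm(f"{instruction}\n\nWrite a partial review of:\n" + "\n\n".join(group))
        for group in doc_groups
    ]
    return llm(f"{instruction}\n\nMerge these partial reviews:\n" + "\n\n".join(partials))


def refine(llm: LLM, core_docs: Sequence[str], extra_docs: Sequence[str], instruction: str) -> str:
    """Refine: draft from core papers, then revise once per additional paper."""
    review = stuff(llm, core_docs, instruction)
    for doc in extra_docs:
        review = llm(
            f"{instruction}\n\nExisting review:\n{review}"
            f"\n\nSupplement or revise it using:\n{doc}"
        )
    return review


def stub_llm(prompt: str) -> str:
    """Placeholder for a model call; returns a short marker instead of a review."""
    return f"[review based on {len(prompt)} prompt characters]"


instruction = "Write a review on tobacco pests and diseases."
print(stuff(stub_llm, ["paper A", "paper B"], instruction))
print(map_reduce(stub_llm, [["paper A"], ["paper B"]], instruction))
print(refine(stub_llm, ["paper A"], ["paper B", "paper C"], instruction))
```

The number of model calls differs by method: one for Stuff, one per group plus one for Map-Reduce, and one plus one per additional paper for Refine, which is the trade-off the thresholds are meant to manage.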

[0086] This technical solution dynamically adjusts the summary generation by using a hierarchical length threshold, ensuring that the content is both sufficient and concise. Short content is generated in its entirety, medium content is segmented and integrated, and excessively long content focuses on the core and supplements related information, effectively avoiding information loss or redundancy. It can effectively address the problem of contextual limitations in large language models while ensuring the quality of generation.

[0087] The technical solution of this invention accurately assesses the length of the review after it is generated, achieves quantitative control by associating the context length with the number of pages, and dynamically adjusts it by combining preset thresholds. This avoids the content being too short and missing key information or too long and causing redundancy, ensuring that the review is compact and complete, improving reading efficiency and generation quality, while also enhancing the system's adaptability and intelligence to different length requirements.

[0088] Example 4. Figure 4 is a schematic diagram of a document review generation apparatus provided in Embodiment 4 of the present invention. As shown in Figure 4, the apparatus includes: a prompt information generation module 410 and a document review generation module 420.

[0089] The prompt information generation module 410 is used to respond to the document review generation request and determine a first prompt information corresponding to the generation request, wherein the first prompt information is used to describe the content related to the review to be generated; the document review generation module 420 is used to determine a target document review based on the review generation model, the first prompt information and a pre-built document knowledge base, wherein the document knowledge base includes multiple documents and the target document review includes at least a portion of the document content corresponding to the first prompt information from the multiple documents.

[0090] The technical solution of this invention, in response to a document review generation request, determines a first prompt message corresponding to the generation request. Since the first prompt message describes content related to the review to be generated, it supports flexible document queries, adapts to differentiated document generation requests, and achieves accurate information extraction and integration. The document review generation module determines a target document review based on a review generation model, the first prompt message, and a pre-built document knowledge base. The document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content corresponding to the first prompt message from these documents. This allows for automatic generation of review documents through the review generation model, solving the problem in related technologies of accurately extracting relevant information from massive amounts of documents and generating structured reviews. This improves review generation efficiency and relevance, ensures accurate and comprehensive content, and saves on manual processing costs.

[0091] Optionally, the document summary generation module includes a document summary generation submodule. This submodule is used to determine target keywords and target intent information based on the first prompt information using the summary generation model, query the document knowledge base based on the target keywords and target intent information, and determine a target document summary based on the retrieved documents.

[0092] Optionally, the document summary generation submodule includes a query unit. Specifically, the query unit is used to: expand multiple target keywords to obtain first expanded words, wherein the expansion processing includes at least one of synonym expansion, term variant expansion, and hypernym / hyponym expansion; combine at least a portion of the first expanded words to obtain a first query combination; expand the languages corresponding to the first expanded words to obtain second expanded words corresponding to multiple languages; determine a second query combination based on the first query combination and the multiple second expanded words, wherein the first query combination and the second query combination have the same semantics; and query the document knowledge base based on the first query combination, the second query combination, and the target intent information.

[0093] Optionally, the document summary generation submodule includes a document summary determination unit. The document summary determination unit is configured to construct a candidate document set based on the retrieved multiple documents, determine a target document set based on the candidate document set, and determine a target document summary based on the target document set.

[0094] Optionally, the document overview generation device further includes a knowledge base construction module. The knowledge base construction module is used to acquire multiple documents, perform structured parsing on the content of the multiple documents to obtain first structured document data of the multiple documents; extract second structured document data of the multiple documents based on the first structured document data and second prompt information using a target document parsing model; and store the documents and their corresponding second structured document data to obtain a document knowledge base. The second prompt information includes fixed description fields and replaceable fields of multiple dimensions. The replaceable fields are used to extract corresponding information that matches the semantics of the first structured document data and replace it.

[0095] Optionally, the document overview generation apparatus further includes: a context length determination module and a second document overview determination module. The context length determination module is used to determine the context length of multiple documents in the document knowledge base used in the target document overview after determining the target document overview based on the overview generation model, the first prompt information, and the pre-built document knowledge base, wherein the context length is associated with the number of pages in the document; the second document overview determination module determines the target document overview based on the context length and a preset length threshold.

[0096] Optionally, the preset length threshold includes a first length threshold and a second length threshold; the second document review determination module is used for at least one of the following: when the context length is less than the first length threshold, determining a target document review based on multiple documents and the first prompt information using a review generation model; when the context length is greater than the first length threshold and less than the second length threshold, generating multiple document sets based on multiple documents using a review generation model, determining partial review content based on each document set and the first prompt information, and integrating the review content corresponding to the multiple document sets into a target document review; when the context length is greater than the second length threshold, determining a first document and a second document based on multiple documents using a review generation model, determining an initial document review based on the first document and the first prompt information, and supplementing and / or revising the initial document review based on the second document and the first prompt information to obtain the target document review, wherein at least some content in the second document is associated with at least some content in the first document in the document knowledge base, and the association includes at least one of topic association, keyword semantic association, research method association, research content association, and research field association.

[0097] Optionally, the document review generation apparatus further includes a post-processing module. The post-processing module is used to perform post-processing on the target document review after the target document review is determined based on the review generation model, the first prompt information, and the pre-built document knowledge base. The post-processing includes at least one of paragraph structure optimization, format standardization, language style adjustment, and citation verification.
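Of the post-processing steps listed above, citation verification lends itself to a short sketch. The source does not describe how citations are represented, so the example assumes numeric bracket markers such as `[3]` in the review text and a reference list numbered from 1; the function name and the report format are hypothetical.

```python
import re

# Assumed citation format: a number in square brackets, e.g. [3].
CITATION = re.compile(r"\[(\d+)\]")


def verify_citations(review: str, reference_count: int) -> dict:
    """Report citations with no matching reference and references never cited."""
    cited = {int(number) for number in CITATION.findall(review)}
    valid = set(range(1, reference_count + 1))
    return {
        "dangling": sorted(cited - valid),  # cited but not in the reference list
        "uncited": sorted(valid - cited),   # in the reference list but never cited
    }


review_text = "Deep models dominate recent work [1][3]. Early warning remains open [5]."
print(verify_citations(review_text, reference_count=4))
```

A dangling citation would typically be a signal to regenerate or correct the affected passage, since the text stresses that generated content should have clear data sources and references.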

[0098] The document summary generation apparatus provided in this embodiment of the invention can execute the document summary generation method provided in any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.

[0099] Example 5. Figure 5 shows a schematic diagram of an electronic device 10 that can be used to implement embodiments of the present invention. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device can also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely illustrative and are not intended to limit the implementation of the invention described and / or claimed herein.

[0100] As shown in Figure 5, the electronic device 10 includes at least one processor 11 and a memory, such as a read-only memory (ROM) 12 or a random access memory (RAM) 13, communicatively connected to the at least one processor 11. The memory stores computer programs executable by the at least one processor. The processor 11 can perform various appropriate actions and processes based on a computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. The RAM 13 can also store various programs and data required for the operation of the electronic device 10. The processor 11, ROM 12, and RAM 13 are interconnected via a bus 14. An input / output (I / O) interface 15 is also connected to the bus 14.

[0101] Multiple components in electronic device 10 are connected to I / O interface 15, including: input unit 16, such as keyboard, mouse, etc.; output unit 17, such as various types of displays, speakers, etc.; storage unit 18, such as disk, optical disk, etc.; and communication unit 19, such as network card, modem, wireless transceiver, etc. Communication unit 19 allows electronic device 10 to exchange information / data with other devices through computer networks such as the Internet and / or various telecommunications networks.

[0102] Processor 11 can be a variety of general-purpose and / or special-purpose processing components with processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various special-purpose artificial intelligence (AI) computing chips, various processors running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. Processor 11 performs the various methods and processes described above, such as a method for generating a document review.

[0103] In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication unit 19, or installed from storage unit 18, or installed from ROM 12. When the computer program is executed by processor 11, it performs the functions defined in the methods of the embodiments of the present invention.

[0104] In some embodiments, the document review generation method may be implemented as a computer program tangibly contained in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and / or installed on electronic device 10 via ROM 12 and / or communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the document review generation method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the document review generation method by any other suitable means (e.g., by means of firmware).

[0105] Various embodiments of the systems and techniques described above herein can be implemented in digital electronic circuit systems, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-a-chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and / or combinations thereof. These various embodiments may include implementations in one or more computer programs that can be executed and / or interpreted on a programmable system including at least one programmable processor, which may be a dedicated or general-purpose programmable processor, capable of receiving data and instructions from a storage system, at least one input device, and at least one output device, and transmitting data and instructions to the storage system, the at least one input device, and the at least one output device.

[0106] Computer programs used to implement the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing device, such that when executed by the processor, the computer programs cause the functions / operations specified in the flowcharts and / or block diagrams to be performed. A computer program may be executed entirely on a machine, partly on a machine, as a standalone software package partly on a machine and partly on a remote machine, or entirely on a remote machine or server.

[0107] In the context of this invention, a computer-readable storage medium can be a tangible medium that may contain or store a computer program for use by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable storage medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination thereof. Alternatively, a computer-readable storage medium may be a machine-readable signal medium. More specific examples of machine-readable storage media include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination thereof.

[0108] To provide interaction with a user, the systems and techniques described herein can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (e.g., a mouse or trackball) through which the user provides input to the electronic device. Other types of devices can also be used to provide interaction with the user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form (including sound input, voice input, or tactile input).

[0109] The systems and technologies described herein can be implemented in computing systems that include backend components (e.g., as data servers), or middleware components (e.g., application servers), or frontend components (e.g., user computers with graphical user interfaces or web browsers through which users can interact with implementations of the systems and technologies described herein), or any combination of such backend, middleware, or frontend components. The components of the system can be interconnected via digital data communication of any form or medium (e.g., communication networks). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the Internet.

[0110] A computing system can include clients and servers. Clients and servers are generally located far apart and typically interact through communication networks. The client-server relationship is created by computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host, which is a hosting product within the cloud computing service system to address the shortcomings of traditional physical hosts and VPS services, such as high management difficulty and weak business scalability.

[0111] It should be understood that the various forms of processes shown above can be used, with steps reordered, added, or deleted. For example, the steps described in this invention can be executed in parallel, sequentially, or in different orders, as long as the desired result of the technical solution of this invention can be achieved, and this is not limited herein.

[0112] The specific embodiments described above do not constitute a limitation on the scope of protection of this invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modifications, equivalent substitutions, and improvements made within the spirit and principles of this invention should be included within the scope of protection of this invention.

Claims

1. A method for generating a document review, characterized by comprising: in response to a request to generate a document review, determining first prompt information corresponding to the generation request, wherein the first prompt information is used to describe content related to the review to be generated; and determining a target document review based on a review generation model, the first prompt information, and a pre-built document knowledge base, wherein the document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content, from the multiple documents, that corresponds to the first prompt information.

2. The method according to claim 1, characterized in that determining the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base includes: determining, by the review generation model, target keywords and target intent information based on the first prompt information; querying the document knowledge base based on the target keywords and the target intent information; and determining the target document review based on the multiple documents retrieved.

3. The method according to claim 2, characterized in that querying the document knowledge base based on the target keywords and the target intent information includes: performing expansion processing on the target keywords to obtain first expanded words, and combining at least some of the first expanded words to obtain a first query combination, wherein the expansion processing includes at least one of synonym expansion, term variant expansion, and hypernym expansion; performing language expansion on the first expanded words to obtain second expanded words corresponding to multiple languages; determining a second query combination based on the first query combination and the multiple second expanded words, wherein the first query combination and the second query combination have the same semantics; and querying the document knowledge base based on the first query combination, the second query combination, and the target intent information.
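For illustration only, and not as part of the claim, the query expansion in claim 3 might be sketched as follows. The Boolean `AND`/`OR` query syntax, the hand-written expansion and translation dictionaries, and all function names are assumptions made for the example; the source does not specify how expansions are obtained or how queries are encoded.

```python
def expand_keywords(keywords, expansions):
    """Group each target keyword with its expansions (the first expanded words)."""
    return [[keyword, *expansions.get(keyword, [])] for keyword in keywords]


def translate_groups(groups, translations):
    """Map each term to another language, keeping terms without a translation."""
    return [[translations.get(term, term) for term in group] for group in groups]


def build_query(groups):
    """OR the terms inside a group, AND the groups together."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


# Hypothetical inputs: synonym expansion plus a German translation table.
groups = expand_keywords(["network", "security"], {"network": ["net"]})
first_query = build_query(groups)
second_query = build_query(
    translate_groups(groups, {"network": "Netzwerk", "security": "Sicherheit"})
)
```

Because the second query reuses the grouping of the first and only swaps the terms, the two combinations keep the same structure, which is one simple way to keep their semantics aligned.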

4. The method according to claim 2, characterized in that determining the target document review based on the multiple documents retrieved includes: constructing a candidate document set based on the multiple documents retrieved, determining a target document set based on the candidate document set, and determining the target document review based on the target document set.

5. The method according to claim 1, characterized in that the document knowledge base is constructed as follows: obtaining multiple documents and performing structured parsing on their content to obtain first structured document data of the multiple documents; and extracting, by a target document parsing model, second structured document data of the multiple documents based on the first structured document data and second prompt information, and storing the documents with their corresponding second structured document data to obtain the document knowledge base, wherein the second prompt information includes fixed description fields and replaceable fields of multiple dimensions, and the replaceable fields are used to extract corresponding information that semantically matches the first structured document data and to be replaced with that information.
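For illustration only, and not as part of the claim, one reading of the second prompt information in claim 5 is a template whose fixed text stays constant while its replaceable fields are filled from the first structured document data. The field names (`title`, `field`, `section`), the wording of the template, and the function name are hypothetical.

```python
from string import Template

# Fixed description text with three replaceable fields (hypothetical dimensions).
SECOND_PROMPT = Template(
    "Extract structured data from the research document described below.\n"
    "Title: $title\n"
    "Research field: $field\n"
    "Section text: $section\n"
    "Return the research method and the main findings."
)


def fill_second_prompt(first_structured_data: dict) -> str:
    """Replace every replaceable field with values from the parsed document.

    Template.substitute raises KeyError if a field is missing, which makes
    an incompletely parsed document fail loudly instead of silently.
    """
    return SECOND_PROMPT.substitute(first_structured_data)
```

The filled prompt would then be passed to the target document parsing model, which is outside the scope of this sketch.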

6. The method according to claim 1, characterized in that, after determining the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base, the method further includes: determining the context length of the multiple documents in the document knowledge base that are used in the target document review, wherein the context length is associated with the number of pages of the documents; and determining the target document review based on the context length and a preset length threshold.

7. The method according to claim 6, characterized in that the preset length threshold includes a first length threshold and a second length threshold, and determining the target document review based on the context length and the preset length threshold includes at least one of the following: when the context length is less than the first length threshold, determining the target document review based on the multiple documents and the first prompt information using the review generation model; when the context length is greater than the first length threshold and less than the second length threshold, generating multiple document sets based on the multiple documents using the review generation model, determining partial review content based on each document set and the first prompt information, and integrating the review content corresponding to the multiple document sets into the target document review; when the context length is greater than the second length threshold, determining a first document and a second document based on the multiple documents using the review generation model, determining an initial document review based on the first document and the first prompt information, and supplementing and / or revising the initial document review based on the second document and the first prompt information to obtain the target document review, wherein at least some content in the second document is associated with at least some content in the first document in the document knowledge base, and the association includes at least one of topic association, keyword semantic association, research method association, research content association, and research field association.

8. The method according to claim 1, characterized in that, after determining the target document review based on the review generation model, the first prompt information, and the pre-built document knowledge base, the method further includes: performing post-processing on the target document review, wherein the post-processing includes at least one of paragraph structure optimization, format standardization, language style adjustment, and citation verification.

9. A document review generation apparatus, characterized by comprising: a prompt information generation module, used to determine, in response to a document review generation request, first prompt information corresponding to the generation request, wherein the first prompt information is used to describe content related to the review to be generated; and a document review generation module, used to determine a target document review based on a review generation model, the first prompt information, and a pre-built document knowledge base, wherein the document knowledge base includes multiple documents, and the target document review includes at least a portion of the document content, from the multiple documents, that corresponds to the first prompt information.

10. An electronic device, characterized in that the electronic device includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, and the computer program, when executed, enables the at least one processor to perform the document review generation method according to any one of claims 1-8.