Document analysis method, apparatus, device, storage medium, and computer program product
By searching the model library for document analysis models that match the document analysis scenario and adjusting the matching relationships based on user feedback, the method addresses the problem of adapting document analysis models to the documents being analyzed, improving analysis accuracy and user experience.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Application (China)
- Current Assignee / Owner
- BEIJING QIHOOD TECHNOLOGY CO LTD
- Filing Date
- 2024-12-17
- Publication Date
- 2026-06-19
AI Technical Summary
The problem addressed is how to ensure compatibility between the document analysis model and the document to be analyzed, especially when different models perform very differently in specific document scenarios, and how to improve analysis accuracy and user experience.
The system searches the model library for document analysis models that match the document analysis scenario, combines multiple document analysis models into an analysis strategy where needed, and adjusts the matching relationships based on user feedback. In this way it dynamically recommends the most suitable model for document analysis while building and updating the model library.
It achieves a high degree of matching between the document analysis model and the document analysis scenario, meets the needs of multiple scenarios, and improves the accuracy of document analysis and user experience.

Figure CN122242467A_ABST
Abstract
Description
Technical Field
[0001] This application relates to the field of computer technology, and in particular to a document analysis method, apparatus, device, storage medium, and computer program product.
Background Technology
[0002] Currently, with the rapid development of natural language processing technology, document analysis technology has been widely applied in text classification, information extraction, semantic understanding, machine translation, and other fields. At the same time, with the diversification of deep learning models, the performance differences between different models in specific document scenarios are becoming increasingly significant. For example, traditional rule-based models outperform some deep learning-based models on structured documents, while Transformer-based models have significant advantages in semantic understanding and long document processing. Therefore, ensuring the adaptability of document analysis models to the documents being analyzed is a crucial technical problem that needs to be solved.
Summary of the Invention
[0003] The main objective of this application is to provide a document analysis method, apparatus, device, storage medium, and computer program product, which aims to solve the technical problem of how to ensure the compatibility between the document analysis model and the document to be analyzed.
[0004] To achieve the above objectives, this application provides a document analysis method, which includes:
[0005] In response to a document analysis request, determine the document analysis scenario based on the document to be analyzed;
[0006] Search the model library for a document analysis model that matches the document analysis scenario, wherein the model library includes matching relationships between various analysis scenarios and various document analysis models;
[0007] The document to be analyzed is analyzed using the document analysis model to obtain document analysis results.
[0008] Optionally, before determining the document analysis scenario based on the document to be analyzed in response to the document analysis request, the method further includes:
[0009] In response to a model comparison request, select multiple document analysis models from the model integration library;
[0010] Obtain the analysis performance parameters of various document analysis models in various document analysis scenarios;
[0011] The matching relationship between various document analysis scenarios and various document analysis models is determined based on the analysis performance parameters.
[0012] A model library is built based on the matching relationship between various document analysis scenarios and various document analysis models.
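As a non-limiting illustration of paragraphs [0009] to [0012], the following Python sketch assumes that the analysis performance parameters have already been reduced to one score per model and scenario, and represents the matching relationship as a ranked list of model names for each scenario. The function name `build_model_library`, the scenario keys, and the scores in the usage block are hypothetical placeholders, not values taken from this application.

```python
from typing import Dict, List

# Hypothetical layout: performance[scenario][model_name] = aggregate score (higher is better).
Performance = Dict[str, Dict[str, float]]


def build_model_library(performance: Performance) -> Dict[str, List[str]]:
    """Return, for each scenario, the model names ranked from best to worst score.

    The ranked list stands in for the "matching relationship": the first entry is
    the preferred model and later entries are fallbacks.
    """
    library: Dict[str, List[str]] = {}
    for scenario, scores in performance.items():
        # Sort by descending score; ties are broken by model name for determinism.
        library[scenario] = sorted(scores, key=lambda name: (-scores[name], name))
    return library


if __name__ == "__main__":
    # Made-up placeholder scores, for illustration only.
    perf = {
        "structured_form_extraction": {"rule_based": 0.91, "transformer": 0.84},
        "long_document_summary": {"rule_based": 0.42, "transformer": 0.88},
    }
    print(build_model_library(perf))
```

Keeping the full ranking rather than only the top model is one possible design choice; it leaves fallback candidates available for the model switching described in paragraphs [0044] to [0047].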
[0013] Optionally, obtaining the analysis performance parameters of various document analysis models under various document analysis scenarios includes:
[0014] Analyze various document analysis scenarios to obtain the scenario characteristics of each scenario;
[0015] Based on the scenario characteristics, obtain and/or generate representative documents corresponding to various document analysis scenarios;
[0016] Analyze the representative documents using the various document analysis models to obtain the analysis performance parameters of each model under various document analysis scenarios.
[0017] Optionally, before analyzing the representative document using the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios, the method further includes:
[0018] The representative document is preprocessed and parsed to obtain the processed document;
[0019] Accordingly, the step of analyzing the representative document using the various document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios includes:
[0020] The processed document is analyzed using the various document analysis models to obtain the analysis performance parameters of each model in various document analysis scenarios.
[0021] Optionally, the step of analyzing the representative document using the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios includes:
[0022] By analyzing the representative document using the various document analysis models, multiple analysis performance parameters of the various document analysis models under various document analysis scenarios are obtained.
[0023] The weight values of each analysis performance parameter are determined based on the document analysis scenario;
[0024] Calculate the analysis performance parameters of the various document analysis models under various document analysis scenarios from the multiple analysis performance parameters and the weight value of each analysis performance parameter.
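One way to read paragraphs [0022] to [0024] is as a scenario-dependent weighted average of several metrics. The Python sketch below assumes every metric is already normalized to [0, 1] with higher meaning better (so processing speed would need to be converted, for example, to a normalized throughput). The table `SCENARIO_WEIGHTS`, its scenario keys, and its weight values are hypothetical; the application only states that the weights depend on the scenario.

```python
from typing import Dict

# Hypothetical per-scenario weights over normalized metrics.
SCENARIO_WEIGHTS: Dict[str, Dict[str, float]] = {
    "structured_form_extraction": {"accuracy": 0.5, "recall": 0.3, "speed": 0.2},
    "long_document_summary": {"accuracy": 0.4, "recall": 0.4, "speed": 0.2},
}


def composite_score(metrics: Dict[str, float], scenario: str) -> float:
    """Weighted average of metrics, each assumed to lie in [0, 1]."""
    weights = SCENARIO_WEIGHTS[scenario]
    total_weight = sum(weights.values())
    # A metric that was not measured counts as 0, so missing data is never rewarded.
    weighted = sum(w * metrics.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight
```

For instance, metrics of accuracy 0.9, recall 0.8 and speed 0.5 under the hypothetical structured-form weights give 0.45 + 0.24 + 0.10 = 0.79.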
[0025] Optionally, determining the matching relationship between various document analysis scenarios and various document analysis models based on the analysis performance parameters includes:
[0026] Based on the analysis performance parameters, a comparison chart of the analysis effects of various document analysis models is generated and displayed.
[0027] Receive matching operations from users based on the analysis effect comparison chart;
[0028] Determine the matching relationships between various document analysis scenarios and various document analysis models based on the matching operations.
[0029] Optionally, the step of determining the document analysis scenario based on the document to be analyzed in response to the document analysis request includes:
[0030] In response to a document analysis request, the document analysis request is parsed to obtain an analysis task;
[0031] Obtain the file extension of the document to be analyzed, and identify the document format of the document to be analyzed based on the file extension;
[0032] Based on the document format and the analysis task, perform scenario recognition to obtain the document analysis scenario.
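A minimal sketch of paragraphs [0030] to [0032], assuming the analysis task has already been parsed out of the request as a string and that a scenario can be keyed simply as "format:task". The extension table, the format labels, and the key scheme are all hypothetical choices for illustration; the content characteristics from the large language model in paragraph [0034] are omitted here.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a coarse document format.
EXTENSION_FORMATS = {
    ".txt": "plain_text",
    ".md": "plain_text",
    ".pdf": "pdf",
    ".doc": "word",
    ".docx": "word",
    ".csv": "table",
    ".xlsx": "table",
}


def detect_format(filename: str) -> str:
    # Path.suffix is the last extension including the dot, e.g. ".pdf"; "" if none.
    return EXTENSION_FORMATS.get(Path(filename).suffix.lower(), "unknown")


def recognize_scenario(filename: str, task: str) -> str:
    """Combine the document format and the analysis task into one scenario key."""
    return f"{detect_format(filename)}:{task}"
```

For example, `recognize_scenario("Report.PDF", "entity_recognition")` yields `"pdf:entity_recognition"`, which could then be used as the lookup key in the model library.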
[0033] Optionally, before performing scenario recognition based on the document format and the analysis task to obtain the document analysis scenario, the method further includes:
[0034] The document content of the document to be analyzed is identified by a large language model to obtain the content characteristics of the document to be analyzed;
[0035] Accordingly, the step of performing scenario recognition based on the document format and the analysis task to obtain the document analysis scenario includes:
[0036] Based on the document format, content characteristics, and analysis task, perform scenario recognition to obtain the document analysis scenario.
[0037] Optionally, the document analysis model comprises multiple document analysis models, and the step of analyzing the document to be analyzed using the document analysis model to obtain document analysis results includes:
[0038] A combined analysis strategy is obtained by combining the multiple document analysis models.
[0039] Based on the combined analysis strategy, the document to be analyzed is analyzed using the multiple document analysis models to obtain document analysis results.
[0040] Optionally, after analyzing the document to be analyzed using the document analysis model and obtaining the document analysis results, the method further includes:
[0041] Receive the user's feedback on the document analysis results;
[0042] Adjust the matching relationship based on the analysis result feedback, and update the model library based on the adjusted matching relationship.
[0043] Optionally, after receiving the user's evaluation of the document analysis results, the method further includes:
[0044] If the evaluation of the analysis results is negative feedback, search the model library again for other document analysis models that match the document analysis scenario;
[0045] Generate a model switching component based on the other document analysis models, and display the model switching component;
[0046] Receive a model switching instruction from the user via the model switching component, and switch the document analysis model according to the model switching instruction;
[0047] Re-analyze the document to be analyzed using the switched-to document analysis model to obtain the document re-analysis results.
[0048] Furthermore, to achieve the above objectives, this application also proposes a document analysis apparatus, which includes:
[0049] The scenario determination module is used to determine the document analysis scenario based on the document to be analyzed in response to the document analysis request.
[0050] The model lookup module is used to search for document analysis models that match the document analysis scenario in the model library, wherein the model library includes matching relationships between various analysis scenarios and various document analysis models;
[0051] The document analysis module is used to analyze the document to be analyzed using the document analysis model to obtain document analysis results.
[0052] Optionally, the document analysis apparatus further includes:
[0053] The model library construction module is used to respond to model comparison requests by selecting multiple document analysis models from the model integration library; obtaining the analysis performance parameters of various document analysis models under various document analysis scenarios; determining the matching relationship between various document analysis scenarios and various document analysis models based on the analysis performance parameters; and constructing the model library based on the matching relationship between various document analysis scenarios and various document analysis models.
[0054] Optionally, the model library construction module is further configured to analyze various document analysis scenarios to obtain scenario features of various document analysis scenarios; obtain and/or generate representative documents corresponding to various document analysis scenarios based on the scenario features; and analyze the representative documents through the multiple document analysis models to obtain the analysis performance parameters of various document analysis models under various document analysis scenarios.
[0055] Optionally, the document analysis apparatus further includes:
[0056] The document processing module is used to preprocess and parse the representative document to obtain the processed document;
[0057] The model library construction module is also used to analyze the processed document through the various document analysis models to obtain the analysis performance parameters of various document analysis models in various document analysis scenarios.
[0058] Optionally, the model library construction module is further configured to analyze the representative document through the multiple document analysis models to obtain multiple analysis performance parameters of the various document analysis models in various document analysis scenarios; determine the weight values of each analysis performance parameter according to the document analysis scenario; and calculate the analysis performance parameters of the various document analysis models in various document analysis scenarios based on the multiple analysis performance parameters and the weight values of each analysis performance parameter.
[0059] Optionally, the model library construction module is further configured to generate and display comparison charts of the analysis effects of various document analysis models based on the analysis performance parameters; receive matching operations from users based on the comparison charts; and determine the matching relationships between various document analysis scenarios and various document analysis models based on the matching operations.
[0060] In addition, to achieve the above objectives, this application also proposes a document analysis device, which includes a memory, a processor, and a document analysis program stored in the memory and executable on the processor, the document analysis program being configured to implement the document analysis method as described above.
[0061] In addition, to achieve the above objectives, this application also proposes a storage medium storing a document analysis program, which, when executed by a processor, implements the document analysis method as described above.
[0062] In addition, to achieve the above objectives, this application also provides a computer program product, which includes a document analysis program that, when executed by a processor, implements the document analysis method as described above.
[0063] One or more technical solutions proposed in this application have at least the following technical effects:
[0064] This application discloses a method that, in response to a document analysis request, determines a document analysis scenario based on the document to be analyzed and searches the model library for a document analysis model that matches the document analysis scenario, where the model library includes matching relationships between various analysis scenarios and various document analysis models. The document to be analyzed is then analyzed using the document analysis model to obtain document analysis results. Because this application dynamically recommends document analysis models that match the document analysis scenario based on these matching relationships, it can ensure a high degree of matching between the document analysis model and the document analysis scenario and meet the needs of multiple scenarios, thereby improving document analysis accuracy and enhancing the user experience.
Attached Figure Description
[0065] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this application and, together with the description, serve to explain the principles of this application.
[0066] To more clearly illustrate the technical solutions in the embodiments of this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, for those skilled in the art, other drawings can be obtained based on these drawings without creative effort.
[0067] Figure 1 is a flowchart illustrating the first embodiment of the document analysis method of this application;
[0068] Figure 2 is a flowchart illustrating the second embodiment of the document analysis method of this application;
[0069] Figure 3 is a user interface diagram of an embodiment of the document analysis method of this application;
[0070] Figure 4 is a flowchart illustrating the third embodiment of the document analysis method of this application;
[0071] Figure 5 is a schematic diagram of the module structure of the document analysis apparatus according to an embodiment of this application;
[0072] Figure 6 is a schematic diagram of the device structure of the hardware operating environment involved in the document analysis method in the embodiments of this application.
[0073] The objectives, functional features, and advantages of this application will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.
Detailed Implementation
[0074] It should be understood that the specific embodiments described herein are merely illustrative of the technical solutions of this application and are not intended to limit this application.
[0075] To better understand the technical solution of this application, a detailed description will be provided below in conjunction with the accompanying drawings and specific implementation methods.
[0076] Currently, with the rapid development of natural language processing technology, document analysis technology has been widely applied in text classification, information extraction, semantic understanding, machine translation, and other fields. At the same time, with the diversification of deep learning models, the performance differences between different models in specific document scenarios are becoming increasingly significant. For example, traditional rule-based models outperform some deep learning-based models in structured documents, while Transformer-based models have significant advantages in semantic understanding and long document processing. Therefore, ensuring the adaptability of document analysis models to the documents being analyzed is a crucial technical problem that needs to be solved.
[0077] Therefore, to overcome the above-mentioned shortcomings, this application provides a solution comprising: responding to a document analysis request, determining a document analysis scenario based on the document to be analyzed, searching a document analysis model matching the document analysis scenario in a model library, wherein the model library includes matching relationships between various analysis scenarios and various document analysis models, and analyzing the document to be analyzed using the document analysis model to obtain document analysis results; since this application dynamically recommends document analysis models matching the document analysis scenario based on the matching relationships between various analysis scenarios and various document analysis models, it can ensure a high degree of matching between the document analysis model and the document analysis scenario, meet the needs of multiple scenarios, and thus improve the accuracy of document analysis and enhance the user experience.
[0078] It should be noted that the execution subject of this embodiment may be a document analysis device with data processing, network communication and program running functions, such as a computer, or other electronic devices that can achieve the same or similar functions. This embodiment does not limit this.
[0079] Based on this, an embodiment of this application provides a document analysis method. Referring to Figure 1, Figure 1 is a flowchart illustrating the first embodiment of the document analysis method of this application.
[0080] In the first embodiment, the document analysis method includes:
[0081] Step S10: In response to the document analysis request, determine the document analysis scenario based on the document to be analyzed.
[0082] It should be understood that, in order to ensure that the system can process user requests in a timely manner and prepare subsequent analysis steps based on the information in the request, in this embodiment, when a user submits a document analysis request, the system responds by parsing the document analysis request to obtain the document to be analyzed. Here, a document analysis request can refer to a user-initiated request to parse, understand, or extract information from a specific document. The document analysis request may include the document to be analyzed, the analysis task (such as sentiment analysis, entity recognition, topic classification, etc.), and other relevant parameters; this embodiment does not impose any limitations on these.
[0083] Understandably, to provide a foundation for selecting a suitable document analysis model and ensure that the model can effectively analyze the characteristics of the document, this embodiment determines the document analysis scenario based on the document to be analyzed. In specific implementations, determining the document analysis scenario based on the document to be analyzed can be achieved by analyzing the document's format, content, structure, and other features to determine the document analysis scenario to which the document belongs. The document analysis scenario can be an analysis and processing scenario determined based on the document's type, content, structure, and other characteristics, such as sentiment analysis, entity recognition, or topic classification. The document to be analyzed can be a user-submitted document that requires analysis and processing, and can be in various formats such as text files, PDFs, and Word documents.
[0084] Step S20: Search the model library for a document analysis model that matches the document analysis scenario, wherein the model library includes matching relationships between various analysis scenarios and various document analysis models.
[0085] It should be understood that, in order to quickly and accurately find a suitable model for a document analysis scenario, this embodiment searches for a document analysis model that matches the scenario in a model library. The model library stores various models and their applicable scenarios. By comparing the matching relationships between various analysis scenarios and various document analysis models, the document analysis model that matches the scenario can be determined. The document analysis model can refer to an algorithm or program used to process document data and generate analysis results; it can be a traditional rule-based model, a machine learning model, or a deep learning model, etc., and this embodiment does not impose any limitations on this.
[0086] Step S30: Analyze the document to be analyzed using the document analysis model to obtain the document analysis results.
[0087] Understandably, in order to generate accurate and useful document analysis results and meet user needs, this embodiment analyzes the document to be analyzed using a document analysis model that matches the document analysis scenario to obtain document analysis results. In specific implementation, after determining the document analysis model, the document analysis model is applied to the document to be analyzed to perform specific analysis processing (such as entity recognition, sentiment analysis, etc.) to obtain document analysis results.
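Steps S20 and S30 can be sketched in Python as follows, assuming the scenario key has already been produced by step S10 and that each document analysis model can be wrapped as a callable from document text to a result dictionary. `ModelLibrary`, `lookup`, and `analyze` are hypothetical names chosen for this illustration; a real model library would also store metadata such as model versions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A "model" here is any callable taking document text and returning a result dict.
Model = Callable[[str], dict]


@dataclass
class ModelLibrary:
    # Hypothetical structure: scenario -> ranked model names, plus a registry of callables.
    matches: Dict[str, List[str]] = field(default_factory=dict)
    registry: Dict[str, Model] = field(default_factory=dict)

    def lookup(self, scenario: str) -> str:
        """Step S20: return the best-ranked model name matching the scenario."""
        candidates = self.matches.get(scenario)
        if not candidates:
            raise LookupError(f"no model registered for scenario {scenario!r}")
        return candidates[0]


def analyze(library: ModelLibrary, scenario: str, text: str) -> dict:
    """Steps S20 and S30: find the matching model, then run it on the document."""
    model_name = library.lookup(scenario)
    result = library.registry[model_name](text)
    return {"model": model_name, "result": result}
```

Raising an error when no model matches is one possible policy; an implementation could instead fall back to a general-purpose default model.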
[0088] For ease of understanding, the following example is provided, but it does not limit this application. As an example, suppose a user submits a document analysis request, hoping to analyze the topic of a press release. The steps of the document analysis method include:
[0089] 1. The user submits a document analysis request through the user interface or API interface, specifying that they want to analyze the topic of a press release. After receiving this document analysis request, the document analysis device will parse the request and determine that the analysis task is "press release topic analysis". Based on the analysis task, the document analysis scenario is determined to be "press release topic analysis".
[0090] 2. The document analysis device maintains an internal model library containing various types of document analysis models, such as text classification models, entity recognition models, and sentiment analysis models. Each model is labeled with its applicable document analysis scenario. After determining the scenario of "press release topic analysis," the model library is searched for a model matching this scenario. Assuming the model library contains a deep learning model specifically designed for press release topic analysis, trained on a large number of press releases, it can accurately identify the topic of press releases. Once this matching model is found, it will be prepared for use in subsequent analysis tasks.
[0091] 3. Input the press release to be analyzed into the found deep learning model. The model will parse and process the content of the press release, using its learned knowledge and algorithms to identify the topic of the press release. After a series of calculations and analyses, the model will finally generate one or more topic tags as the topic analysis results of the press release. These topic tags will be displayed to the user in a user-friendly way, such as through the user interface or in the API call results returned to the user.
[0092] Furthermore, to improve the accuracy and comprehensiveness of document analysis and meet the analysis needs in complex scenarios, in this embodiment, the document analysis model comprises multiple document analysis models. Step S30 includes: combining the multiple document analysis models to obtain a combined analysis strategy; and analyzing the document to be analyzed using the multiple document analysis models based on the combined analysis strategy to obtain document analysis results. The combined analysis strategy can refer to a strategy that combines the characteristics and advantages of multiple document analysis models to obtain more accurate and comprehensive analysis results. In specific implementations, in some complex scenarios, it is necessary to combine the advantages of multiple document analysis models to obtain more accurate analysis results. In this embodiment, multiple document analysis models matching the document analysis scenario are combined to obtain a combined analysis strategy. According to the combined analysis strategy, the document to be analyzed is input into multiple document analysis models for parallel or serial processing, and finally, the output results of each model are summarized to form the document analysis results.
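The parallel and serial processing mentioned in paragraph [0092] might look like the following Python sketch. It assumes each model returns a dictionary, that parallel results are summarized simply by collecting them under each model's name, and that serial stages share one state dictionary so later models can reuse earlier outputs. `run_parallel`, `run_serial`, and the state layout are hypothetical; the application does not specify how results are summarized.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Model = Callable[[str], Dict[str, object]]
Stage = Callable[[Dict[str, object]], Dict[str, object]]


def run_parallel(models: Dict[str, Model], text: str) -> Dict[str, Dict[str, object]]:
    """Feed the same document to every model concurrently and collect outputs by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in models.items()}
        return {name: fut.result() for name, fut in futures.items()}


def run_serial(stages: List[Stage], text: str) -> Dict[str, object]:
    """Pass a shared state dict through each model in order; each stage adds its own keys."""
    state: Dict[str, object] = {"text": text}
    for stage in stages:
        state = {**state, **stage(state)}
    return state
```

A serial chain could, for instance, run entity recognition first and let a later topic classifier read the extracted entities from the shared state.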
[0093] Furthermore, to improve the accuracy of the matching relationship, this embodiment also adjusts the matching relationship based on the user's feedback on the analysis results. After step S30, the method further includes: receiving the user's feedback on the document analysis results; adjusting the matching relationship based on the analysis result feedback; and updating the model library based on the adjusted matching relationship. Here, analysis result feedback can refer to the user's evaluation of the document analysis results, used to assess the model's effectiveness and applicability.
[0094] In the specific implementation, after obtaining the document analysis results, users can evaluate the results, such as by rating or commenting, to collect user feedback on the model's effectiveness. Based on the user feedback on the analysis results, the matching relationships in the model library are adjusted, such as updating the applicable scenarios of the models or adding new models. This embodiment does not impose any limitations on this.
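The application leaves open how feedback changes the matching relationship. As one hypothetical scheme, the Python sketch below assumes user ratings are normalized to [0, 1] (for example, a 1 to 5 star rating mapped linearly) and moves each stored score toward new ratings with an exponential moving average, then re-ranks the models for the scenario. The class `FeedbackAdjuster` and the smoothing factor `alpha` are illustrative assumptions.

```python
from typing import Dict, List


class FeedbackAdjuster:
    """Re-ranks models per scenario from user ratings (hypothetical moving-average scheme)."""

    def __init__(self, scores: Dict[str, Dict[str, float]], alpha: float = 0.2) -> None:
        # Copy benchmark scores so the caller's data is not mutated.
        self.scores = {s: dict(m) for s, m in scores.items()}
        self.alpha = alpha  # how strongly a single rating moves the stored score

    def record(self, scenario: str, model: str, rating: float) -> None:
        """Blend one normalized rating in [0, 1] into the model's score for the scenario."""
        bucket = self.scores.setdefault(scenario, {})
        current = bucket.get(model, rating)  # an unseen model starts at its first rating
        bucket[model] = (1 - self.alpha) * current + self.alpha * rating

    def ranking(self, scenario: str) -> List[str]:
        """Updated matching relationship: model names ordered by descending score."""
        s = self.scores.get(scenario, {})
        return sorted(s, key=lambda m: (-s[m], m))
```

A small `alpha` makes the ranking robust to a single unhappy user, at the cost of reacting more slowly when a model genuinely degrades for a scenario.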
[0095] Furthermore, to provide multiple model options and improve user experience, after step S30, the method further includes: if the analysis result evaluation is negative feedback, searching the model library for other document analysis models that match the document analysis scenario; generating a model switching component based on the other document analysis models and displaying the model switching component; receiving a model switching instruction from the user based on the model switching component, and switching the document analysis model according to the model switching instruction; re-analyzing the document to be analyzed using the switched document analysis model to obtain the document re-analysis result. Here, the analysis result evaluation can refer to the user's subjective or objective evaluation of the document analysis result, which can be positive feedback (e.g., satisfied, accurate) or negative feedback (e.g., dissatisfied, inaccurate). The model switching component can refer to a user interface element that allows the user to select and switch to other document analysis models in the model library based on the analysis result evaluation.
[0096] In the implementation, when a user is dissatisfied with the analysis results (i.e., negative feedback), the system searches the model library for other document analysis models that match the current analysis scenario. Based on the found other document analysis models, a model switching component (such as a drop-down menu or button) is generated and displayed on the user interface. The system receives model switching commands from the user via the model switching component and switches to the user-selected model accordingly. The switched document analysis model is then used to re-analyze the document and generate new analysis results.
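The switching flow in paragraphs [0095] and [0096] can be sketched as below, assuming the model library already provides a ranked list of model names for the scenario. The `choose` callback is a hypothetical stand-in for the user's selection in the model switching component (for example, a drop-down menu); `switch_options` and `handle_feedback` are likewise illustrative names, not part of the application.

```python
from typing import Callable, Dict, List, Optional, Tuple

Model = Callable[[str], dict]


def switch_options(ranked: List[str], current: str) -> List[str]:
    """Alternatives shown in the switching component: every other matching model, best first."""
    return [name for name in ranked if name != current]


def handle_feedback(
    negative: bool,
    ranked: List[str],
    current: str,
    registry: Dict[str, Model],
    choose: Callable[[List[str]], str],
    text: str,
) -> Tuple[str, Optional[dict]]:
    """Return (model in use, re-analysis result or None if nothing was re-run)."""
    if not negative:
        return current, None  # positive feedback: keep the existing result
    options = switch_options(ranked, current)
    if not options:
        return current, None  # no other model matches this scenario
    selected = choose(options)  # stands in for the user's model switching instruction
    if selected not in options:
        raise ValueError(f"{selected!r} is not an offered alternative")
    return selected, registry[selected](text)
```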
[0097] This embodiment dynamically recommends document analysis models that match the document analysis scenarios based on the matching relationship between various analysis scenarios and various document analysis models. This ensures a high degree of matching between the document analysis models and the document analysis scenarios, meets the needs of multiple scenarios, and thus improves the accuracy of document analysis and enhances the user experience.
[0098] Referring to Figure 2, Figure 2 is a flowchart illustrating the second embodiment of the document analysis method of this application. Based on the first embodiment shown in Figure 1, a second embodiment of the document analysis method of this application is proposed.
[0099] In the second embodiment, before step S10, the method further includes:
[0100] Step S01: In response to the model comparison request, select multiple document analysis models from the model integration library.
[0101] It should be understood that, in order to quickly compare the effects among multiple models and avoid relying on experience and trial-and-error in model selection, thereby improving the efficiency and scientific rigor of model selection, this embodiment analyzes the matching relationships between various document analysis scenarios and various document analysis models through a unified evaluation framework and standardized indicators, and constructs a model library based on these matching relationships.
[0102] Understandably, to ensure the diversity and comprehensiveness of the comparison and provide users with a wide range of model choices, this embodiment selects various document analysis models of different types from the model integration library upon receiving a model comparison request, preparing for performance comparison. The model comparison request can be a request initiated by a user or the system requesting a performance comparison of multiple document analysis models. The model integration library can be a repository that stores and manages various document analysis models, which may include traditional rule-based models, machine learning models, deep learning models, etc., and this embodiment does not impose any limitations on this.
[0103] Step S02: Obtain the analysis performance parameters of various document analysis models in various document analysis scenarios.
[0104] It should be understood that, in order to provide data support for building the model library and ensure the accuracy of matching relationships, this embodiment acquires the analysis performance parameters of various document analysis models under various document analysis scenarios. In specific implementations, for example, performance data of various models under different document analysis scenarios are collected through actual tests or historical data. The analysis performance parameters can refer to data indicators that measure the performance of a document analysis model in a specific scenario, such as accuracy, recall, F1 score, processing speed, etc., and this embodiment does not impose any limitations on this.
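As an illustration of the analysis performance parameters named above, the following minimal sketch computes set-based precision, recall, and F1 for one document and measures processing time. It assumes, hypothetically, that a model returns a collection of extracted items (for example entity strings) that can be compared against a hand-labelled gold set; the function names are illustrative only and are not part of this application.

```python
import time
from typing import Callable, Sequence


def precision_recall_f1(predicted: Sequence[str], expected: Sequence[str]) -> dict[str, float]:
    """Set-based precision, recall and F1 for a single document.

    Assumption: model output and gold labels are comparable item collections.
    """
    pred, gold = set(predicted), set(expected)
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def timed(fn: Callable[[str], object], doc: str) -> tuple[object, float]:
    """Run one model on one document and return (output, elapsed seconds)."""
    start = time.perf_counter()
    out = fn(doc)
    return out, time.perf_counter() - start
```

For example, predicting {"Beijing", "2024", "Qihoo"} against the gold set {"Beijing", "2024", "Shanghai"} gives two true positives, so precision, recall, and F1 are all 2/3.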
[0105] Furthermore, in order to provide data support for building the model library and ensure the accuracy of matching relationships, in this embodiment, representative documents of various document analysis scenarios are analyzed to obtain the analysis performance parameters of various document analysis models in various document analysis scenarios. Step S02 includes: analyzing various document analysis scenarios to obtain scenario features of various document analysis scenarios; obtaining and / or generating representative documents corresponding to various document analysis scenarios based on the scenario features; and analyzing the representative documents through the multiple document analysis models to obtain the analysis performance parameters of various document analysis models in various document analysis scenarios.
[0106] In practical implementation, by studying different document analysis scenarios, feature information for each scenario is extracted, such as document structural complexity and content theme. Based on the extracted scenario features, representative documents that can represent the scenario are selected or generated from the existing document library. These representative documents are input into various document analysis models, and the model outputs are collected and analyzed to obtain analysis performance parameters. Scenario features can refer to information describing the characteristics of a document analysis scenario, such as document type, structural complexity, and content theme. Representative documents can refer to document samples that can embody the characteristics of a specific document analysis scenario.
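The three sub-steps of step S02 (scenario features, representative documents, model evaluation) can be sketched as a small evaluation harness. This is a non-authoritative sketch under stated assumptions: scenario features are modelled as plain string tags, representative documents are selected (not generated) from a tagged library, each model is a hypothetical callable returning a set of labels, and mean recall stands in for the analysis performance parameter.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical model interface: document text in, set of extracted labels out.
Model = Callable[[str], set[str]]


@dataclass
class Doc:
    text: str
    tags: set[str]                                # e.g. {"pdf", "news"}
    gold: set[str] = field(default_factory=set)   # hand-labelled expected output


def select_representatives(library: list[Doc], features: set[str], k: int = 3) -> list[Doc]:
    """Keep the first k documents whose tags cover every scenario feature."""
    return [d for d in library if features <= d.tags][:k]


def evaluate(models: dict[str, Model], scenarios: dict[str, set[str]],
             library: list[Doc]) -> dict[tuple[str, str], float]:
    """Mean recall of every model on each scenario's representative documents."""
    table: dict[tuple[str, str], float] = {}
    for scenario, features in scenarios.items():
        docs = select_representatives(library, features)
        for name, model in models.items():
            recalls = [len(model(d.text) & d.gold) / len(d.gold) for d in docs if d.gold]
            table[(scenario, name)] = sum(recalls) / len(recalls) if recalls else 0.0
    return table
```

In practice several parameters would be collected per pair rather than recall alone; the harness shape stays the same.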
[0107] Furthermore, in order to perform structured data extraction and standardization on the documents and provide a unified input for subsequent analysis, before analyzing the representative document through the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios, the method further includes: preprocessing and parsing the representative document to obtain a processed document; the step of analyzing the representative document through the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios includes: analyzing the processed document through the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios.
[0108] It should be understood that, in order to improve document quality and provide accurate and standardized data input for subsequent analysis, this embodiment preprocesses and parses the representative document before analyzing it, obtaining a processed document. In specific implementations, for example, the representative document is first cleaned, formatted, and structured, such as removing irrelevant characters, extracting key information, and constructing a document structure tree. Then, the processed representative document is input into various document analysis models, and the model outputs are collected and analyzed to obtain analysis performance parameters.
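The cleaning and structure-tree steps described above might look like the following minimal sketch. It assumes, purely for illustration, that the parsed document has already been converted to text in which headings are marked Markdown-style with leading "#" characters; real PDF or DOCX parsing would need a dedicated parser in front of this step.

```python
import re
from dataclasses import dataclass, field

# Control characters other than tab, newline and carriage return.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
HEADING = re.compile(r"^(#+)\s+(.*)$")


@dataclass
class Node:
    title: str
    level: int
    body: list[str] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def clean(text: str) -> str:
    """Remove control characters and collapse runs of spaces and tabs."""
    text = CONTROL_CHARS.sub("", text)
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines())


def build_tree(text: str) -> Node:
    """Build a nested document structure tree from '#'-style headings."""
    root = Node(title="<root>", level=0)
    stack = [root]
    for line in clean(text).splitlines():
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            node = Node(title=match.group(2), level=len(match.group(1)))
            # Pop back to the nearest ancestor with a shallower heading level.
            while stack[-1].level >= node.level:
                stack.pop()
            stack[-1].children.append(node)
            stack.append(node)
        else:
            stack[-1].body.append(line)
    return root
```

The resulting tree gives every model the same standardized input, which is the point of preprocessing before evaluation.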
[0109] Furthermore, to ensure that the evaluation of model performance highlights key aspects and improves the accuracy of the evaluation, the step of analyzing the representative document using the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios includes: analyzing the representative document using the multiple document analysis models to obtain multiple analysis performance parameters of each document analysis model under various document analysis scenarios; determining the weight values of each analysis performance parameter according to the document analysis scenario; and calculating the analysis performance parameters of each document analysis model under various document analysis scenarios based on the multiple analysis performance parameters and the weight values of each analysis performance parameter.
[0110] In practical implementation, for example, representative documents are input into various document analysis models, and the model outputs are collected and analyzed to obtain various performance parameters, such as precision, recall, F1 score, and processing speed. Based on the importance and requirements of the document analysis scenario, the weight values of each performance parameter are determined manually or through machine learning algorithms. Using methods such as weighted averaging, the comprehensive performance parameters of each model in each scenario are calculated based on multiple performance parameters and their corresponding weight values. Here, the weight value refers to the relative importance assigned to each parameter according to the importance and requirements of the document analysis scenario.
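The weighted-average combination can be sketched as below. The scenario names and weight values in SCENARIO_WEIGHTS are hypothetical placeholders, and the speed_score mapping (latency to a [0, 1] score against a time budget) is an assumption introduced so that processing speed can be averaged with the ratio-type metrics.

```python
# Hypothetical per-scenario weights; in practice set manually or learned.
SCENARIO_WEIGHTS: dict[str, dict[str, float]] = {
    "contract extraction": {"precision": 0.5, "recall": 0.3, "speed": 0.2},
    "news topic classification": {"precision": 0.3, "recall": 0.3, "speed": 0.4},
}


def speed_score(seconds: float, budget: float) -> float:
    """Map latency to [0, 1]: 1.0 within budget, budget/seconds beyond it."""
    return 1.0 if seconds <= budget else budget / seconds


def composite_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of metrics in [0, 1]; weights need not sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(metrics[name] * w for name, w in weights.items()) / total
```

For instance, precision 0.9, recall 0.6, and a 2 s run against a 1 s budget (speed score 0.5) give 0.5 × 0.9 + 0.3 × 0.6 + 0.2 × 0.5 = 0.73 under the "contract extraction" weights.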
[0111] Step S03: Determine the matching relationship between various document analysis scenarios and various document analysis models based on the analysis performance parameters.
[0112] It is understandable that determining the matching relationship between various document analysis scenarios and various document analysis models based on analysis performance parameters can be achieved by analyzing the collected analysis performance parameters, determining the most suitable document analysis model for each document analysis scenario, and constructing the matching relationship between various document analysis scenarios and various document analysis models based on the most suitable document analysis model for each document analysis scenario.
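One way to turn the collected scores into a matching relationship is to rank the models within each scenario, as in this minimal sketch. It assumes one composite score per (scenario, model) pair, such as the weighted average described above; keeping the full ranking rather than only the top model (an illustrative choice, not stated in the source) leaves fallback candidates available.

```python
from collections import defaultdict


def rank_models(scores: dict[tuple[str, str], float]) -> dict[str, list[str]]:
    """Return model names per scenario, best score first (ties by name)."""
    pairs: dict[str, list[tuple[float, str]]] = defaultdict(list)
    for (scenario, model), score in scores.items():
        pairs[scenario].append((score, model))
    return {
        scenario: [m for _, m in sorted(items, key=lambda p: (-p[0], p[1]))]
        for scenario, items in pairs.items()
    }
```

The most suitable model for a scenario is then simply the first entry of its ranked list.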
[0113] Furthermore, in order to intuitively demonstrate the performance differences of different models in specific scenarios and facilitate user understanding and comparison, step S03 includes: generating a comparison chart of the analysis effects of various document analysis models based on the analysis performance parameters, and displaying the comparison chart; receiving matching operations from users based on the comparison chart; and determining the matching relationship between various document analysis scenarios and various document analysis models based on the matching operations.
[0114] Understandably, in this embodiment, visualization tools are used to display the acquired analysis performance parameters in chart form, such as bar charts, line charts, or radar charts. Users interact with the interface to select or adjust the matching relationship between the document analysis scenario and the document analysis model, and then provide feedback on the operation results to the document analysis device. The document analysis device updates the matching relationship information in the model library based on the user's feedback.
[0115] Step S04: Construct a model library based on the matching relationships between various document analysis scenarios and various document analysis models.
[0116] It should be understood that, in order to provide users with convenient model selection tools and improve the efficiency and accuracy of model selection, in this embodiment, the matching relationship between various document analysis scenarios and various document analysis models is stored in the database to form a model library.
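A minimal sketch of storing the matching relationship in a database uses Python's built-in sqlite3 module. The table name model_library and its columns are hypothetical. Keying rows on (scenario, model) and writing with INSERT OR REPLACE lets a later user matching operation or re-evaluation overwrite an earlier score.

```python
import sqlite3


def create_library(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) model library table if it does not exist."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS model_library (
               scenario TEXT NOT NULL,
               model    TEXT NOT NULL,
               score    REAL NOT NULL,
               PRIMARY KEY (scenario, model))"""
    )


def save_matching(conn: sqlite3.Connection, scores: dict[tuple[str, str], float]) -> None:
    """Insert or overwrite the score of each (scenario, model) pair."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO model_library (scenario, model, score) VALUES (?, ?, ?)",
            [(s, m, v) for (s, m), v in scores.items()],
        )


def lookup(conn: sqlite3.Connection, scenario: str) -> list[str]:
    """Models matching a scenario, best first; empty if none are registered."""
    rows = conn.execute(
        "SELECT model FROM model_library WHERE scenario = ? ORDER BY score DESC, model",
        (scenario,),
    )
    return [row[0] for row in rows]
```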
[0117] For ease of understanding, reference is made to Figure 3; this explanation does not limit the scope of this application. Figure 3 is a user interface diagram of an embodiment of the document analysis method of this application. As shown in Figure 3, users can use the model selection component on the user interface to select several document analysis models from the model integration library (in Figure 3, the user selects Model 1, Model 2, and Model 4), upload representative documents corresponding to various document analysis scenarios through the document upload component, and view the document analysis effects of the selected models through the effect comparison component (Figure 3 shows the rendered outputs of Model 1, Model 2, and Model 4 for comparison).
[0118] This embodiment analyzes the matching relationship between various document analysis scenarios and various document analysis models through a unified evaluation framework and standardized indicators, and constructs a model library based on the matching relationship between various document analysis scenarios and various document analysis models. This enables rapid comparison of effects among multiple models, avoids relying on experience and trial and error to select models, and thus improves the efficiency and scientific nature of model selection.
[0119] Referring to Figure 4, Figure 4 is a flowchart of the third embodiment of the document analysis method of this application. Based on the above embodiments, a third embodiment of the document analysis method of this application is proposed.
[0120] In the third embodiment, step S10 includes:
[0121] Step S101: In response to the document analysis request, parse the document analysis request to obtain the analysis task.
[0122] It should be understood that, in order to ensure the relevance and accuracy of subsequent document analysis, this embodiment performs scene recognition based on the document format and analysis task of the document to be analyzed to obtain the document analysis scene. In specific implementation, after receiving the document analysis request, the document analysis request is first parsed to extract the specific analysis task that the user wants to perform. The analysis task can be a specific analysis operation that needs to be performed based on the document analysis request, such as sentiment analysis, topic classification, entity recognition, etc.
[0123] For ease of understanding, the following example is provided, but it does not limit this application. As an example, suppose a user submits a document analysis request, hoping to analyze the topic of a press release. After parsing the document analysis request, the resulting analysis task is "press release topic analysis".
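The request-parsing step can be sketched with keyword rules. This is a sketch under assumptions: the JSON request fields "document" and "instruction", the keyword list, and the task labels are all hypothetical, and a production system might instead use an intent classifier or a large language model for this step.

```python
import json

# Hypothetical keyword rules, checked in order; the first match wins.
TASK_RULES: list[tuple[str, str]] = [
    ("sentiment", "sentiment analysis"),
    ("topic", "topic classification"),
    ("entit", "entity recognition"),  # matches "entity" and "entities"
]


def parse_request(raw: str) -> tuple[str, str]:
    """Return (file name, analysis task) from a JSON request body."""
    request = json.loads(raw)
    instruction = request["instruction"].lower()
    for keyword, task in TASK_RULES:
        if keyword in instruction:
            return request["document"], task
    raise ValueError(f"no analysis task recognised in: {instruction!r}")
```

With this sketch, a request asking to "analyze the topic of this press release" for news.pdf yields the task label "topic classification".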
[0124] Step S102: Obtain the file extension of the document to be analyzed, and identify the document format of the document to be analyzed based on the file extension.
[0125] Understandably, to quickly and accurately identify document formats and provide a basis for subsequent scene recognition, this embodiment obtains the file extension of the document to be analyzed and identifies the document format based on the file extension. The file extension can be a string in the filename that indicates the file type or format; it is typically located at the end of the filename and separated from the rest of the name by a period (.). The document format can refer to a set of features such as the document's organizational structure, encoding method, and data representation, which determines how the document is stored, transmitted, and parsed. In specific implementation, the filename of the document to be analyzed is read, the file extension is extracted, and the document format is identified according to a preset mapping relationship or rule base.
[0126] For ease of understanding, the following example is provided, but it does not limit this application. As an example, assume that the file name of the document to be analyzed is "news.pdf", and the system recognizes the document format as PDF by the file extension ".pdf".
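A preset mapping from extension to format, as described in [0125], can be sketched with the standard library's pathlib. The mapping table below is a hypothetical example, and the lookup is case-insensitive by assumption.

```python
from pathlib import PurePath

# Hypothetical preset mapping relationship (extension -> document format).
EXTENSION_TO_FORMAT: dict[str, str] = {
    ".pdf": "PDF",
    ".docx": "DOCX",
    ".doc": "DOC",
    ".txt": "TXT",
    ".html": "HTML",
    ".htm": "HTML",
    ".md": "Markdown",
}


def identify_format(filename: str) -> str:
    """Identify the document format from the last file extension."""
    suffix = PurePath(filename).suffix.lower()  # "news.pdf" -> ".pdf"
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        raise ValueError(f"unsupported or missing extension: {filename!r}") from None
```

Because an extension can be wrong or missing, an implementation may also check the file content; for example, PDF files begin with the bytes "%PDF-".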
[0127] Step S103: Perform scene recognition based on the document format and the analysis task to obtain the document analysis scene.
[0128] It should be understood that, in order to ensure the relevance and accuracy of subsequent analysis, in this embodiment, the specific scenario for document analysis is determined by combining the document format and analysis task through a preset scene recognition algorithm or rule base.
[0129] For ease of understanding, the following examples are provided, but they do not limit this application. As an example, assuming the document format is PDF and the analysis task is "news topic classification", the document analysis scenario is identified as "PDF news release topic classification".
[0130] Furthermore, to gain a deeper understanding of the document content and improve the accuracy of the identified document analysis scenario, before step S103 the method further includes: identifying the document content of the document to be analyzed using a large language model to obtain the content characteristics of the document to be analyzed; correspondingly, step S103 includes: performing scene recognition based on the document format, the content characteristics, and the analysis task to obtain the document analysis scenario. Here, content characteristics can refer to specific attributes or features of the document content, such as language style, topic category, and included entities. In specific implementations, pre-trained large language models (such as BERT, GPT, etc.) are used to perform deep analysis of the document content, extracting features such as the document's topic, key entities, and language style. Combining the document format, content characteristics, and analysis task, the most suitable document analysis scenario is identified through rule matching or machine learning algorithms.
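A rule-matching variant of scene recognition can be sketched as "most specific known scenario wins". The assumptions here are loud: ContentFeatures is a hypothetical container for whatever the large language model step returns (no real LLM is called), scenario labels are built by plain string composition, and the set of known scenarios would come from the model library.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentFeatures:
    """Hypothetical output of the LLM-based content analysis step."""
    topic_category: Optional[str] = None
    language_style: Optional[str] = None
    entities: list[str] = field(default_factory=list)


def recognize_scene(doc_format: str, task: str, known: set[str],
                    features: Optional[ContentFeatures] = None) -> str:
    """Return the most specific scenario label present in `known`."""
    candidates = []
    if features is not None and features.topic_category:
        candidates.append(f"{doc_format} {features.topic_category} {task}")
    candidates.append(f"{doc_format} {task}")
    candidates.append(task)  # format-independent fallback
    for candidate in candidates:
        if candidate in known:
            return candidate
    raise LookupError(f"no known scenario for {doc_format!r} / {task!r}")
```

With content features {topic_category: "news release"}, a PDF, and the task "topic classification", this yields "PDF news release topic classification", matching the example in [0129]; without content features it falls back to coarser labels.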
[0131] This embodiment identifies the document analysis scenario based on the document format and analysis task of the document to be analyzed, thereby improving the accuracy of the document analysis scenario and ensuring the relevance and accuracy of subsequent document analysis.
[0132] It should be noted that the above examples are only for understanding this application and do not constitute a limitation on the document analysis method of this application. Any simple modifications based on this technical concept are within the protection scope of this application.
[0133] This application also provides a document analysis apparatus. Referring to Figure 5, the document analysis apparatus includes:
[0134] The scenario determination module 10 is used to determine the document analysis scenario based on the document to be analyzed in response to the document analysis request;
[0135] The model lookup module 20 is used to search for a document analysis model that matches the document analysis scenario in the model library, wherein the model library includes matching relationships between various analysis scenarios and various document analysis models;
[0136] The document analysis module 30 is used to analyze the document to be analyzed using the document analysis model to obtain document analysis results.
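The cooperation of modules 10, 20, and 30 can be sketched as a single class with one method per module. This is an illustrative composition only: the request dictionary with a "text" field, the injected scene_of callable, and the ranked-list model library are hypothetical stand-ins for the components described above.

```python
from typing import Callable


class DocumentAnalysisApparatus:
    """Hypothetical composition of modules 10, 20 and 30."""

    def __init__(self, scene_of: Callable[[dict], str],
                 library: dict[str, list[str]],
                 models: dict[str, Callable[[str], object]]) -> None:
        self.scene_of = scene_of  # scenario recognition logic used by module 10
        self.library = library    # scenario -> model names, best first
        self.models = models      # model name -> callable model

    def determine_scenario(self, request: dict) -> str:
        """Scenario determination module 10."""
        return self.scene_of(request)

    def find_model(self, scenario: str) -> str:
        """Model lookup module 20: take the best-ranked matching model."""
        ranked = self.library.get(scenario)
        if not ranked:
            raise LookupError(f"no model registered for scenario {scenario!r}")
        return ranked[0]

    def analyze(self, request: dict) -> object:
        """Document analysis module 30, chaining the two steps above."""
        scenario = self.determine_scenario(request)
        model_name = self.find_model(scenario)
        return self.models[model_name](request["text"])
```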
[0137] The document analysis apparatus provided in this application, employing the document analysis method described in the above embodiments, can solve the technical problem of ensuring the compatibility between the document analysis model and the document to be analyzed. Compared with the prior art, the beneficial effects of the document analysis apparatus provided in this application are the same as those of the document analysis method provided in the above embodiments, and other technical features in the document analysis apparatus are the same as those disclosed in the methods of the above embodiments, and will not be repeated here.
[0138] This application provides a document analysis device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, which are executed by the at least one processor to enable the at least one processor to perform the document analysis method in Embodiment 1 above.
[0139] Referring now to Figure 6, Figure 6 is a schematic structural diagram of a document analysis device suitable for implementing embodiments of this application. The document analysis device in these embodiments may include, but is not limited to, mobile terminals such as mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (tablet computers), PMPs (Portable Media Players), and in-vehicle terminals (e.g., in-vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The document analysis device shown in Figure 6 is merely an example and should not impose any limitations on the functionality and scope of use of the embodiments of this application.
[0140] As shown in Figure 6, the document analysis device may include a processing unit 1001 (e.g., a central processing unit, a graphics processing unit, etc.) that can perform various appropriate actions and processes according to a program stored in ROM (Read Only Memory) 1002 or a program loaded from storage device 1003 into RAM (Random Access Memory) 1004. RAM 1004 also stores various programs and data required for the operation of the document analysis device. The processing unit 1001, ROM 1002, and RAM 1004 are interconnected via bus 1005. An input/output (I/O) interface 1006 is also connected to the bus. Typically, the following devices can be connected to I/O interface 1006: input devices 1007 including, for example, touch screens, touchpads, keyboards, mice, image sensors, microphones, accelerometers, gyroscopes, etc.; output devices 1008 including, for example, LCDs (Liquid Crystal Displays), speakers, vibrators, etc.; storage devices 1003 including, for example, magnetic tapes, hard disks, etc.; and communication devices 1009. The communication device 1009 allows the document analysis device to communicate with other devices in a wireless or wired manner to exchange data. Although the figure shows a document analysis device having various devices, it should be understood that it is not required to implement or have all of the devices shown. More or fewer devices may alternatively be implemented or provided.
[0141] Specifically, according to the embodiments disclosed in this application, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed in this application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via a communication device, or installed from storage device 1003, or installed from ROM 1002. When the computer program is executed by processing unit 1001, it performs the functions defined in the methods of the embodiments disclosed in this application.
[0142] The document analysis device provided in this application, employing the document analysis method in the above embodiments, can solve the technical problem of ensuring the compatibility between the document analysis model and the document to be analyzed. Compared with the prior art, the beneficial effects of the document analysis device provided in this application are the same as those of the document analysis method provided in the above embodiments, and other technical features in this document analysis device are the same as those disclosed in the previous embodiment method, and will not be repeated here.
[0143] It should be understood that the various parts disclosed in this application can be implemented using hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in any suitable manner in one or more embodiments or examples.
[0144] The above description is merely a specific embodiment of this application, but the scope of protection of this application is not limited thereto. Any variations or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed in this application should be included within the scope of protection of this application. Therefore, the scope of protection of this application should be determined by the scope of the claims.
[0145] This application provides a computer-readable storage medium having computer-readable program instructions (i.e., a computer program) stored thereon, the computer-readable program instructions being used to execute the document analysis method in the above embodiments.
[0146] The computer-readable storage medium provided in this application may be, for example, but is not limited to, a USB flash drive, or an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections having one or more wires, portable computer disks, hard disks, RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable Read Only Memory) or flash memory, optical fiber, CD-ROM (Compact Disc Read-Only Memory), optical storage devices, magnetic storage devices, or any suitable combination thereof. In this embodiment, the computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (Radio Frequency), etc., or any suitable combination thereof.
[0147] The aforementioned computer-readable storage medium may be included in the document analysis device; or it may exist independently and not be assembled into the document analysis device.
[0148] The aforementioned computer-readable storage medium carries one or more programs, which, when executed by the document analysis device, cause the document analysis device to perform the aforementioned document analysis method.
[0149] Computer program code for performing the operations of this application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code can be executed entirely on the user's computer, partially on the user's computer, as a standalone software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server. In cases involving remote computers, the remote computer can be connected to the user's computer via any type of network—including LAN (Local Area Network) or WAN (Wide Area Network)—or can be connected to an external computer (e.g., via the Internet using an Internet service provider).
[0150] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and / or flowcharts, and combinations of blocks in the block diagrams and / or flowcharts, can be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0151] The modules described in the embodiments of this application can be implemented in software or hardware. The name of a module does not, in certain cases, constitute a limitation on the module itself.
[0152] The readable storage medium provided in this application is a computer-readable storage medium that stores computer-readable program instructions (i.e., a computer program) for executing the above-described document analysis method, thereby solving the technical problem of ensuring the compatibility between the document analysis model and the document to be analyzed. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided in this application are the same as those of the document analysis method provided in the above embodiments, and will not be repeated here.
[0153] This application also provides a computer program product, including a computer program that, when executed by a processor, implements the document analysis method described above.
[0154] The computer program product provided in this application solves the technical problem of ensuring the compatibility between the document analysis model and the document to be analyzed. Compared with the prior art, the beneficial effects of the computer program product provided in this application are the same as those of the document analysis method provided in the above embodiments, and will not be repeated here.
[0155] The above description is only a part of the embodiments of this application and does not limit the patent scope of this application. All equivalent structural transformations made under the technical concept of this application and using the contents of the specification and drawings of this application, or direct / indirect applications in other related technical fields, are included in the patent protection scope of this application.
[0156] This application discloses the following: A1. A document analysis method, the document analysis method comprising:
[0157] In response to a document analysis request, determine the document analysis scenario based on the document to be analyzed;
[0158] Search the model library for a document analysis model that matches the document analysis scenario, wherein the model library includes matching relationships between various analysis scenarios and various document analysis models;
[0159] The document to be analyzed is analyzed using the document analysis model to obtain document analysis results.
[0160] A2. The document analysis method as described in A1, further comprising, before determining the document analysis scenario based on the document to be analyzed in response to the document analysis request:
[0161] In response to a model comparison request, select multiple document analysis models from the model integration library;
[0162] Obtain the analysis performance parameters of various document analysis models in various document analysis scenarios;
[0163] The matching relationship between various document analysis scenarios and various document analysis models is determined based on the analysis performance parameters.
[0164] A model library is built based on the matching relationship between various document analysis scenarios and various document analysis models.
[0165] A3. The document analysis method as described in A2, wherein obtaining the analysis performance parameters of various document analysis models under various document analysis scenarios includes:
[0166] Analyze various document analysis scenarios to obtain the scenario characteristics of each scenario;
[0167] Based on the scene characteristics, obtain and / or generate representative documents corresponding to various document analysis scenarios;
[0168] By analyzing the representative document using the various document analysis models, the analytical performance parameters of each model under various document analysis scenarios are obtained.
[0169] A4. The document analysis method as described in A3, before analyzing the representative document using the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios, further includes:
[0170] The representative document is preprocessed and parsed to obtain the processed document;
[0171] Accordingly, the step of analyzing the representative document using the various document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios includes:
[0172] The processed document is analyzed using the various document analysis models to obtain the analysis performance parameters of each model in various document analysis scenarios.
[0173] A5. The document analysis method as described in A3, wherein analyzing the representative document using the multiple document analysis models to obtain the analysis performance parameters of each document analysis model under various document analysis scenarios includes:
[0174] By analyzing the representative document using the various document analysis models, multiple analysis performance parameters of the various document analysis models under various document analysis scenarios are obtained.
[0175] The weight values of each analysis performance parameter are determined based on the document analysis scenario;
[0176] The analytical performance parameters of various document analysis models under various document analysis scenarios are calculated based on the multiple analytical performance parameters and the weight values of each analytical performance parameter.
[0177] A6. The document analysis method as described in A2, wherein determining the matching relationship between various document analysis scenarios and various document analysis models based on the analysis performance parameters includes:
[0178] Based on the analysis performance parameters, a comparison chart of the analysis effects of various document analysis models is generated and displayed.
[0179] Receive matching operations from users based on the analysis effect comparison chart;
[0180] Determine the matching relationship between various document analysis scenarios and various document analysis models based on the matching operation.
[0181] A7. The document analysis method as described in any one of A1 to A6, wherein the step of determining the document analysis scenario based on the document to be analyzed in response to a document analysis request includes:
[0182] In response to a document analysis request, the document analysis request is parsed to obtain an analysis task;
[0183] Obtain the file extension of the document to be analyzed, and identify the document format of the document to be analyzed based on the file extension;
[0184] Based on the document format and the analysis task, scene recognition is performed to obtain the document analysis scene.
[0185] A8. The document analysis method as described in A7, further comprising, before performing scenario recognition based on the document format and the analysis task to obtain the document analysis scenario:
[0186] The document content of the document to be analyzed is recognized by a large language model to obtain content characteristics of the document to be analyzed;
[0187] Accordingly, the step of performing scenario recognition based on the document format and the analysis task to obtain the document analysis scenario includes:
[0188] Scenario recognition is performed based on the document format, the content characteristics, and the analysis task to obtain the document analysis scenario.
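In A8 the content characteristics come from a large language model. The sketch below treats that model as an injected callable returning feature tags, so any LLM client could be plugged in. To keep the example runnable offline, a keyword heuristic stands in for it; that stand-in is explicitly not a language model. The tag names and decision rules are hypothetical.

```python
from typing import Callable, List

# Hypothetical signature for the large-language-model step: text in, feature tags out.
ContentTagger = Callable[[str], List[str]]


def recognize_scenario(fmt: str, task: str, text: str, tagger: ContentTagger) -> str:
    """Scenario recognition from document format, content characteristics and task."""
    features = set(tagger(text))
    # Illustrative rules: content characteristics can override the format/task default.
    if task == "extraction" and "tabular" in features:
        return "structured_extraction"
    if "long_form" in features or (fmt == "pdf" and task == "summarization"):
        return "long_document_semantics"
    if "legal" in features:
        return f"legal_{task}"
    return f"{fmt}_{task}"


def keyword_tagger(text: str) -> List[str]:
    """Offline stand-in for the LLM so the sketch runs; it is NOT a language model."""
    tags = []
    if text.count("|") >= 4:
        tags.append("tabular")
    if len(text.split()) > 2000:
        tags.append("long_form")
    if "hereinafter" in text.lower():
        tags.append("legal")
    return tags


sample = "| item | amount |\n| pens | 12 |"
print(recognize_scenario("word", "extraction", sample, keyword_tagger))
```

Here a Word file containing a table is routed to structured extraction even though its format alone would not suggest it, which is the benefit content characteristics add over format and task alone.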
[0189] A9. The document analysis method as described in any one of A1 to A6, wherein there are multiple document analysis models, and the step of analyzing the document to be analyzed using the document analysis models to obtain document analysis results includes:
[0190] A combined analysis strategy is obtained by combining the multiple document analysis models;
[0191] Based on the combined analysis strategy, the document to be analyzed is analyzed using the multiple document analysis models to obtain document analysis results.
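A9 leaves the form of the combined analysis strategy open. One common option, assumed here, is a weighted majority vote over the labels produced by each model; cascading or pipelining models would be equally valid readings. The model interface, weights and toy models are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical model interface: document text in, label out.
Model = Callable[[str], str]


def combine_by_weighted_vote(models: List[Tuple[Model, float]]) -> Callable[[str], str]:
    """Build a combined analysis strategy in which each model casts a weighted vote."""
    def strategy(document: str) -> str:
        votes: Dict[str, float] = defaultdict(float)
        for model, weight in models:
            votes[model(document)] += weight
        # Highest total weight wins; iterating labels in sorted order breaks ties
        # deterministically in favor of the alphabetically first label.
        return max(sorted(votes), key=lambda label: votes[label])
    return strategy


# Toy models standing in for, e.g., one rule-based and two learned classifiers.
def rule_based(doc: str) -> str:
    return "invoice" if "total due" in doc.lower() else "other"


def learned_a(doc: str) -> str:
    return "contract"


def learned_b(doc: str) -> str:
    return "contract"


strategy = combine_by_weighted_vote([(rule_based, 0.7), (learned_a, 0.3), (learned_b, 0.3)])
print(strategy("Total due: 120 USD"))
```

Giving the rule-based model a larger weight lets it outvote two weaker learned models, mirroring the background observation that rule-based models can win on structured documents.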
[0192] A10. The document analysis method as described in any one of A1 to A6, further comprising, after analyzing the document to be analyzed using the document analysis model and obtaining the document analysis result:
[0193] An evaluation of the analysis results, given by the user based on the document analysis results, is received;
[0194] The matching relationship is adjusted according to the analysis result evaluation, and the model library is updated based on the adjusted matching relationship.
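A10 does not say how feedback changes the matching relationship. The sketch below assumes each (scenario, model) pair keeps a score that moves a fixed fraction toward 1 on positive feedback or toward 0 on negative feedback, with the highest-scoring model serving as the current match. The class name, learning rate and update rule are all hypothetical.

```python
from typing import Dict


class ModelLibrary:
    """Toy model library: scenario -> {model: score}; the top score is the current match."""

    def __init__(self, scores: Dict[str, Dict[str, float]], learning_rate: float = 0.2):
        # Copy so feedback does not mutate the caller's dictionaries.
        self.scores = {s: dict(per_model) for s, per_model in scores.items()}
        self.learning_rate = learning_rate

    def match(self, scenario: str) -> str:
        per_model = self.scores[scenario]
        return max(per_model, key=per_model.get)

    def record_feedback(self, scenario: str, model: str, positive: bool) -> str:
        """Nudge the score toward 1 (positive) or 0 (negative) and return the new match."""
        target = 1.0 if positive else 0.0
        old = self.scores[scenario][model]
        self.scores[scenario][model] = old + self.learning_rate * (target - old)
        return self.match(scenario)


library = ModelLibrary({"table_extraction": {"rule_based": 0.60, "transformer": 0.55}})
print(library.match("table_extraction"))
print(library.record_feedback("table_extraction", "rule_based", positive=False))
```

A small learning rate means a single complaint only flips the match when two models were already close, which guards the library against one-off noisy feedback.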
[0195] A11. The document analysis method as described in A10, further comprising, after receiving the user's evaluation of the analysis results based on the document analysis results:
[0196] If the analysis result evaluation is negative feedback, other document analysis models that match the document analysis scenario are searched for again in the model library;
[0197] A model switching component is generated based on the other document analysis models and displayed;
[0198] A model switching instruction issued by the user through the model switching component is received, and the document analysis model is switched according to the model switching instruction;
[0199] The document to be analyzed is re-analyzed with the newly selected document analysis model to obtain document re-analysis results.
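The A11 fallback can be sketched as follows, assuming the model library stores a list of matched models per scenario and the switching component simply lists every match except the model that drew negative feedback. The function names, registry and model names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical analyzer interface: document text in, analysis result out.
Analyzer = Callable[[str], str]


def alternative_models(matches: Dict[str, List[str]], scenario: str, current: str) -> List[str]:
    """Search the matching relationship again, excluding the model that drew negative feedback."""
    return [m for m in matches.get(scenario, []) if m != current]


def switch_and_reanalyze(
    document: str,
    scenario: str,
    current: str,
    chosen: str,
    matches: Dict[str, List[str]],
    registry: Dict[str, Analyzer],
) -> str:
    """Apply the user's switching instruction and re-analyze the document."""
    # These are the options a model switching component would display.
    options = alternative_models(matches, scenario, current)
    if chosen not in options:
        raise ValueError(f"{chosen!r} is not an alternative model for {scenario!r}")
    return registry[chosen](document)


matches = {"table_extraction": ["rule_based", "layout_model", "transformer"]}
registry = {"layout_model": str.upper}  # toy analyzer standing in for a real model
print(alternative_models(matches, "table_extraction", "rule_based"))
print(switch_and_reanalyze("q3 totals", "table_extraction", "rule_based", "layout_model", matches, registry))
```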
[0200] This application also discloses B12, a document analysis apparatus, the document analysis apparatus comprising:
[0201] A scenario determination module, configured to determine the document analysis scenario based on the document to be analyzed in response to a document analysis request;
[0202] A model lookup module, configured to search the model library for a document analysis model that matches the document analysis scenario, wherein the model library includes matching relationships between various document analysis scenarios and various document analysis models;
[0203] A document analysis module, configured to analyze the document to be analyzed using the document analysis model to obtain document analysis results.
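The three modules of B12 can be pictured as small components wired in sequence. The sketch below assumes each module is a plain object with one method and that the model library is a scenario-to-model-name mapping. The class names mirror the module names in the text, but their fields and methods are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Analyzer = Callable[[str], str]  # hypothetical: document text -> analysis result


@dataclass
class ScenarioDeterminationModule:
    recognize: Callable[[dict], str]  # hypothetical scenario recognizer

    def determine(self, request: dict) -> str:
        return self.recognize(request)


@dataclass
class ModelLookupModule:
    model_library: Dict[str, str]  # matching relationship: scenario -> model name

    def lookup(self, scenario: str) -> str:
        return self.model_library[scenario]


@dataclass
class DocumentAnalysisModule:
    registry: Dict[str, Analyzer]  # model name -> callable model

    def analyze(self, model_name: str, document: str) -> str:
        return self.registry[model_name](document)


@dataclass
class DocumentAnalysisApparatus:
    scenario_module: ScenarioDeterminationModule
    lookup_module: ModelLookupModule
    analysis_module: DocumentAnalysisModule

    def handle(self, request: dict) -> str:
        # Scenario determination -> model lookup -> document analysis, as in B12.
        scenario = self.scenario_module.determine(request)
        model = self.lookup_module.lookup(scenario)
        return self.analysis_module.analyze(model, request["content"])


apparatus = DocumentAnalysisApparatus(
    ScenarioDeterminationModule(lambda request: "qa"),
    ModelLookupModule({"qa": "reverser"}),
    DocumentAnalysisModule({"reverser": lambda text: text[::-1]}),
)
print(apparatus.handle({"content": "abc"}))
```

Keeping the model library as data rather than code is what lets the construction and feedback steps change which model is used without touching the analysis path.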
[0204] B13. The document analysis apparatus as described in B12, further comprising:
[0205] A model library construction module, configured to: in response to a model comparison request, select multiple document analysis models from the model integration library; obtain the analysis performance parameters of the various document analysis models under various document analysis scenarios; determine the matching relationship between the various document analysis scenarios and the various document analysis models based on the analysis performance parameters; and construct the model library based on that matching relationship.
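One way the model library construction module could turn performance parameters into matching relationships is sketched below. It assumes a higher score is better, restricts the comparison to the models selected from the integration library, and keeps the top `top_k` models per scenario so that fallbacks exist. Ranking by score and keeping several candidates are assumptions, not requirements stated in B13; the scenario and model names are hypothetical.

```python
from typing import Dict, Iterable, List


def build_model_library(
    scores: Dict[str, Dict[str, float]],
    selected: Iterable[str],
    top_k: int = 2,
) -> Dict[str, List[str]]:
    """Derive scenario -> ranked model names from analysis performance parameters.

    Only models chosen for the comparison are considered; higher score is better.
    """
    chosen = set(selected)
    library: Dict[str, List[str]] = {}
    for scenario, per_model in scores.items():
        candidates = {m: s for m, s in per_model.items() if m in chosen}
        # Sort by descending score, then by name so ties are deterministic.
        ranked = sorted(candidates, key=lambda m: (-candidates[m], m))
        library[scenario] = ranked[:top_k]
    return library


scores = {
    "tables": {"rule": 0.90, "bert": 0.80, "gpt": 0.85},
    "long": {"rule": 0.40, "bert": 0.70, "gpt": 0.90},
}
print(build_model_library(scores, selected={"rule", "bert", "gpt"}))
```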
[0206] B14. The document analysis apparatus as described in B13, wherein the model library construction module is further configured to: analyze the various document analysis scenarios to obtain scenario features of each document analysis scenario; obtain and/or generate representative documents corresponding to the various document analysis scenarios based on the scenario features; and analyze the representative documents using the multiple document analysis models to obtain the analysis performance parameters of the various document analysis models under the various document analysis scenarios.
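The "obtain and/or generate" step of B14 can be sketched by modeling scenario features and document properties as tag sets. A corpus document is obtained for a scenario when its tags contain every scenario feature, and placeholder documents are generated when too few are found. The tag representation, the `Doc` type and the placeholder naming are hypothetical; a real system might synthesize documents with a generative model instead.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Doc:
    name: str
    tags: Set[str] = field(default_factory=set)


def representative_documents(
    scenario_features: Dict[str, Set[str]],
    corpus: List[Doc],
    per_scenario: int = 3,
) -> Dict[str, List[Doc]]:
    """Pick (or synthesize) representative documents for each scenario."""
    result: Dict[str, List[Doc]] = {}
    for scenario, features in scenario_features.items():
        # "Obtain": corpus documents whose tags include every scenario feature.
        found = [doc for doc in corpus if features <= doc.tags][:per_scenario]
        # "Generate": pad with placeholder documents when the corpus falls short.
        while len(found) < per_scenario:
            found.append(Doc(f"synthetic_{scenario}_{len(found)}", set(features)))
        result[scenario] = found
    return result


corpus = [Doc("a", {"pdf", "table"}), Doc("b", {"pdf"}), Doc("c", {"pdf", "table", "scan"})]
picked = representative_documents({"tables": {"pdf", "table"}}, corpus)
print([doc.name for doc in picked["tables"]])
```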
[0207] B15. The document analysis apparatus as described in B14, further comprising:
[0208] A document processing module, configured to preprocess and parse the representative documents to obtain processed documents;
[0209] The model library construction module is further configured to analyze the processed documents using the various document analysis models to obtain the analysis performance parameters of the various document analysis models under the various document analysis scenarios.
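B15 does not list the preprocessing steps. A typical text-cleaning pass is sketched below as an assumption: Unicode NFKC normalization (which folds full-width characters common in Chinese documents), removal of control characters, splitting on blank lines into paragraphs, and whitespace collapsing. The function name and the choice of steps are hypothetical.

```python
import re
import unicodedata
from typing import List


def preprocess(raw: str) -> List[str]:
    """Normalize raw document text and split it into clean paragraphs."""
    # NFKC folds compatibility forms, e.g. full-width letters to ASCII.
    text = unicodedata.normalize("NFKC", raw)
    # Drop control characters (category Cc) but keep newlines and tabs.
    text = "".join(ch for ch in text if ch in "\n\t" or unicodedata.category(ch) != "Cc")
    # Blank lines (possibly containing whitespace) separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse internal whitespace and discard empty paragraphs.
    return [re.sub(r"\s+", " ", p).strip() for p in paragraphs if p.strip()]


print(preprocess("Hello\x00  world\n\n\n  Second\tpara \n"))
```

Running every candidate model on the same cleaned text keeps the comparison fair, since differences in the performance parameters then reflect the models rather than inconsistent input handling.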
[0210] B16. The document analysis apparatus as described in B14, wherein the model library construction module is further configured to: analyze the representative documents using the multiple document analysis models to obtain multiple analysis performance parameters of the various document analysis models under the various document analysis scenarios; determine a weight value for each analysis performance parameter according to the document analysis scenario; and calculate the analysis performance parameters of the various document analysis models under the various document analysis scenarios based on the multiple analysis performance parameters and the weight value of each analysis performance parameter.
[0211] B17. The document analysis apparatus as described in B13, wherein the model library construction module is further configured to: generate an analysis effect comparison chart of the various document analysis models based on the analysis performance parameters and display the comparison chart; receive a matching operation performed by the user based on the comparison chart; and determine the matching relationship between the various document analysis scenarios and the various document analysis models according to the matching operation.
[0212] This application also discloses C18, a document analysis device, the document analysis device comprising: a memory, a processor, and a document analysis program stored in the memory and executable on the processor, wherein the document analysis program, when executed by the processor, implements the document analysis method as described above.
[0213] This application also discloses D19, a storage medium storing a document analysis program, which, when executed by a processor, implements the document analysis method described above.
[0214] This application also discloses E20, a computer program product including a document analysis program, which, when executed by a processor, implements the document analysis method as described above.
Claims
1. A document analysis method, characterized in that the document analysis method comprises: in response to a document analysis request, determining a document analysis scenario based on a document to be analyzed; searching a model library for a document analysis model that matches the document analysis scenario, wherein the model library includes matching relationships between various document analysis scenarios and various document analysis models; and analyzing the document to be analyzed using the document analysis model to obtain document analysis results.
2. The document analysis method as described in claim 1, characterized in that, before determining the document analysis scenario based on the document to be analyzed in response to the document analysis request, the method further comprises: in response to a model comparison request, selecting multiple document analysis models from a model integration library; obtaining analysis performance parameters of the various document analysis models under various document analysis scenarios; determining the matching relationship between the various document analysis scenarios and the various document analysis models based on the analysis performance parameters; and building the model library based on the matching relationship between the various document analysis scenarios and the various document analysis models.
3. The document analysis method as described in claim 2, characterized in that obtaining the analysis performance parameters of the various document analysis models under the various document analysis scenarios comprises: analyzing the various document analysis scenarios to obtain scenario characteristics of each scenario; obtaining and/or generating, based on the scenario characteristics, representative documents corresponding to the various document analysis scenarios; and analyzing the representative documents using the various document analysis models to obtain the analysis performance parameters of each model under the various document analysis scenarios.
4. The document analysis method as described in claim 3, characterized in that, before analyzing the representative documents using the various document analysis models to obtain the analysis performance parameters of each document analysis model under the various document analysis scenarios, the method further comprises: preprocessing and parsing the representative documents to obtain processed documents; and accordingly, analyzing the representative documents using the various document analysis models to obtain the analysis performance parameters of each document analysis model under the various document analysis scenarios comprises: analyzing the processed documents using the various document analysis models to obtain the analysis performance parameters of each model under the various document analysis scenarios.
5. The document analysis method as described in claim 3, characterized in that analyzing the representative documents using the various document analysis models to obtain the analysis performance parameters of each model under the various document analysis scenarios comprises: analyzing the representative documents using the various document analysis models to obtain multiple analysis performance parameters of the various document analysis models under the various document analysis scenarios; determining a weight value for each analysis performance parameter according to the document analysis scenario; and calculating the analysis performance parameters of the various document analysis models under the various document analysis scenarios based on the multiple analysis performance parameters and the weight value of each analysis performance parameter.
6. The document analysis method as described in claim 2, characterized in that determining the matching relationship between the various document analysis scenarios and the various document analysis models based on the analysis performance parameters comprises: generating and displaying, based on the analysis performance parameters, an analysis effect comparison chart of the various document analysis models; receiving a matching operation performed by a user based on the analysis effect comparison chart; and determining the matching relationship between the various document analysis scenarios and the various document analysis models according to the matching operation.
7. A document analysis apparatus, characterized in that the document analysis apparatus comprises: a scenario determination module, configured to determine a document analysis scenario based on a document to be analyzed in response to a document analysis request; a model lookup module, configured to search a model library for a document analysis model that matches the document analysis scenario, wherein the model library includes matching relationships between various document analysis scenarios and various document analysis models; and a document analysis module, configured to analyze the document to be analyzed using the document analysis model to obtain document analysis results.
8. A document analysis device, characterized in that the document analysis device comprises: a memory, a processor, and a document analysis program stored in the memory and executable on the processor, wherein the document analysis program, when executed by the processor, implements the document analysis method as described in any one of claims 1 to 6.
9. A storage medium, characterized in that the storage medium stores a document analysis program which, when executed by a processor, implements the document analysis method as described in any one of claims 1 to 6.
10. A computer program product, characterized in that the computer program product includes a document analysis program which, when executed by a processor, implements the document analysis method as described in any one of claims 1 to 6.