A method and device for generating information for reviewing a document

CN116151246B · Active · Publication Date: 2026-06-19 · CHINA MOBILE INFORMATION TECHNOLOGY CO LTD +1

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
CHINA MOBILE INFORMATION TECHNOLOGY CO LTD
Filing Date
2021-11-19
Publication Date
2026-06-19


Abstract

This application provides a method, apparatus, electronic device, and computer program product for generating official document review information, relating to the field of data processing technology. The method includes: identifying the text theme information of the main body of the document to be tested using a topic model; simplifying the document based on the text theme information and then obtaining the document vector using a document prediction model; matching the most similar historical document from a historical document database based on the document vector similarity; and generating the review information of the document to be tested based on the review information of the historical document. This application generates the review information of the document to be tested based on the review information of the most similar historical document obtained through matching. This method can accurately predict review information even with a relatively small number of documents, thus effectively improving the accuracy of review information prediction and allowing for rapid adaptation to changes in review methods.

Description

Technical Field

[0001] This application relates to the field of data processing technology, specifically to a method, apparatus, electronic device, and computer program product for generating official document approval information.

Background Technology

[0002] As companies increasingly adopt more refined management practices, more and more approval processes are handled through office systems, resulting in a surge in official documents requiring approval and processing at various levels. During the approval process, it's common to fill out comments and select candidates for subsequent processing, which consumes a significant amount of time for those handling the process, especially leaders or those responsible for overall coordination.

[0003] For predicting official document approval opinions and subsequent candidates, existing AI technologies for intelligent application scenarios mainly fall into two categories. One is generative models, which use deep neural networks to learn the mapping relationship between input and output text pairs end-to-end. When predicting new text, the model directly generates the output text. The other is classification models, which first convert the known output text into a classification dictionary, and then perform classification task model learning. When predicting new text, the model obtains the output text from the dictionary of the corresponding category based on the classification prediction result.

[0004] The aforementioned existing technologies require massive amounts of data to achieve good model performance, but the amount of data in official document application scenarios is clearly insufficient. Furthermore, the output text of generative models cannot be guaranteed to be semantically valid, which can lead to incomprehensible predictions in practical applications and significantly harm the user experience. If classification models are used, the categories can be poorly defined because user-written text is arbitrary, which degrades prediction performance. In addition, frequent changes in personnel and responsibilities in official document application scenarios can cause abrupt shifts in document approval information. Existing models, which learn from large amounts of data, must accumulate a considerable amount of new data before they adjust noticeably; this process is often lengthy and cannot meet the needs of real-world applications.

Summary of the Invention

[0005] This application provides a method, apparatus, electronic device, and computer program product for generating official document review information, in order to solve the problem that the accuracy of the prediction results of official document review information in the prior art is low.

[0006] In a first aspect, embodiments of this application provide a method for generating official document review information, including:

[0007] The document body of the official document to be tested is input into a pre-trained topic model to obtain the text topic information output by the topic model;

[0008] After simplifying the document body of the document to be tested according to the text topic information, the simplified document to be tested is input into the pre-trained document prediction model to obtain the document vector to be tested output by the document prediction model.

[0009] Match the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested from the pre-stored historical document database;

[0010] The review information of the document to be tested is generated based on the review information of the target historical documents;

[0011] The text topic information includes several topic words and the topic weight corresponding to each topic word; the topic model is trained based on document segmentation samples and the text topic information corresponding to the document segmentation samples; the official document prediction model is trained based on official document text samples and the official document vectors corresponding to the official document text samples.

[0012] In one embodiment, the document prediction model includes a text input layer, a feature extraction layer, and a prediction result fusion output layer;

[0013] The step of inputting the simplified document text to be tested into a pre-trained document prediction model to obtain the document vector output by the document prediction model includes:

[0014] The simplified document text to be tested is input into the pre-trained document prediction model, and the structured attributes, document title and document body of the document text to be tested are obtained through the text input layer.

[0015] The feature extraction layer extracts features from the structured attributes, the document title, and the document body to obtain attribute feature vectors, title feature vectors, and body feature vectors, respectively.

[0016] The prediction result fusion output layer fuses the attribute feature vector, the title feature vector, and the body text feature vector according to preset decision weights to obtain the document vector to be tested and output it.

[0017] In one embodiment, the step of extracting features from the structured attributes, the document title, and the document body through the feature extraction layer to obtain attribute feature vectors, title feature vectors, and body feature vectors respectively includes:

[0018] The feature extraction layer uses a one-hot encoder to extract features from the structured attributes to obtain the attribute feature vector;

[0019] The feature extraction layer uses a BiLSTM neural network to extract features from the document title to obtain the title feature vector;

[0020] The feature extraction layer uses a HAN neural network to extract features from the document body to obtain the body feature vector.

[0021] In one embodiment, the text simplification of the document body of the document to be tested based on the text topic information includes:

[0022] Based on the text topic information, keywords with topic weights less than a preset threshold in the main body of the document text to be tested are removed to obtain the simplified document text to be tested.

[0023] In one embodiment, matching the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested from a pre-stored historical document database includes:

[0024] Calculate the cosine similarity between the vector of the document to be tested and each historical document vector in the pre-stored historical document database, and then match the historical document corresponding to the highest cosine similarity as the target historical document.

[0025] In one embodiment, the method for generating official document review information further includes:

[0026] The review information of the document to be tested is optimized and rewritten based on a pre-constructed feature database; wherein, the feature database includes one or more of the following: synonym relationship data, department mapping relationship data, personnel responsibility data, and subject word mapping data.

[0027] In one embodiment, before inputting the document body of the document to be tested into the pre-trained topic model, the method further includes:

[0028] Data preprocessing is performed on the pre-stored historical official document data; wherein, the historical official document data includes the document word segmentation sample and the official document text sample;

[0029] The data preprocessing includes:

[0030] Interference data in the historical official document data is identified and filtered out to obtain the document text sample;

[0031] Extract business-related terms from the historical official document data and construct a business domain word segmentation dictionary;

[0032] The document text sample is segmented based on the word segmentation dictionary of the business domain to obtain the document word segmentation sample;

[0033] Business support information is generated by combining personnel structure information and information on the stage of the official document.

[0034] In a second aspect, embodiments of this application provide an apparatus for generating official document review information, comprising:

[0035] The topic recognition module is used to input the document body of the official document to be tested into a pre-trained topic model to obtain the text topic information output by the topic model;

[0036] The vector prediction module is used to simplify the document body of the document to be tested based on the text topic information, and then input the simplified document to be tested into a pre-trained document prediction model to obtain the document vector output by the document prediction model.

[0037] The document matching module is used to match, from the pre-stored historical document database, the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested;

[0038] The approval generation module is used to generate the approval information of the document to be tested based on the approval information of the target historical documents;

[0039] The text topic information includes several topic words and the topic weight corresponding to each topic word; the topic model is trained based on document segmentation samples and the text topic information corresponding to the document segmentation samples; the official document prediction model is trained based on official document text samples and the official document vectors corresponding to the official document text samples.

[0040] In a third aspect, embodiments of this application provide an electronic device, including a processor and a memory storing a computer program, wherein the processor executes the program to implement the steps of the method for generating official document review information as described in the first aspect.

[0041] In a fourth aspect, embodiments of this application provide a computer program product, including a computer program that, when executed by a processor, implements the steps of the method for generating official document review information as described in the first aspect.

[0042] The method, apparatus, electronic device, and computer program product for generating official document review information provided in this application extract keywords from the document text to be tested and simplify the document text according to their weights. The document vector of the document text to be tested is then obtained using an official document prediction model. Subsequently, the historical document most similar to the document text to be tested is selected from the historical document database, and the review information of the document text to be tested is generated based on the review information of that historical document. The embodiments of this application depart from the existing practice of directly predicting review information through a prediction model. By generating the review information of the document text to be tested from the review information of the most similar historical document obtained by matching, the review information can be accurately predicted even with a small number of documents, thereby effectively improving the accuracy of review information prediction and allowing rapid adaptation to changes in review methods.

Description of the Drawings

[0043] To more clearly illustrate the technical solutions in this application or the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. Obviously, the drawings described below are some embodiments of this application. For those skilled in the art, other drawings can be obtained based on these drawings without creative effort.

[0044] Figure 1 is a flowchart illustrating the method for generating official document review information provided in an embodiment of this application;

[0045] Figure 2 is a schematic diagram of the structure of the document prediction model provided in the embodiments of this application;

[0046] Figure 3 is a schematic diagram of the HAN model structure provided in the embodiments of this application;

[0047] Figure 4 is a schematic diagram of the structure of the document review information generation device provided in the embodiments of this application;

[0048] Figure 5 is a schematic diagram of the structure of the electronic device provided in the embodiments of this application.

Detailed Implementation

[0049] To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions of this application will be clearly and completely described below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some embodiments of this application, not all embodiments. Based on the embodiments of this application, all other embodiments obtained by those skilled in the art without creative effort are within the scope of protection of this application.

[0050] Figure 1 is a flowchart illustrating the method for generating official document review information provided in an embodiment of this application. Referring to Figure 1, this application provides a method for generating official document review information, which may include the following steps:

[0051] S1. Input the document body of the official document to be tested into the pre-trained topic model to obtain the text topic information output by the topic model;

[0052] S2. After simplifying the document body of the document to be tested according to the text topic information, the simplified document to be tested is input into a pre-trained document prediction model to obtain the document vector output by the document prediction model; further, the text simplification of the document body of the document to be tested according to the text topic information specifically involves: removing keywords with topic weights less than a preset threshold from the document body of the document to be tested according to the text topic information to obtain the simplified document to be tested.

[0053] S3. Match, from the pre-stored historical document database, the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested;

[0054] S4. Generate the review information of the document to be tested based on the review information of the target historical documents;

[0055] The text topic information includes several topic words and the topic weight corresponding to each topic word; the topic model is trained based on document segmentation samples and the text topic information corresponding to the document segmentation samples; the official document prediction model is trained based on official document text samples and the official document vectors corresponding to the official document text samples.

[0056] In this embodiment, the topic model is first used to identify the topic information of the main body of the document to be tested, including extracting the keywords of the main body and the weight of each keyword; then, keywords with lower weights are removed to obtain a simplified document to be tested; the simplified document to be tested is input into the document prediction model to obtain the document vector; then, based on the document vector similarity, the historical document most similar to the document to be tested is matched from the historical document database; finally, the review information of the document to be tested is generated based on the review information of the historical document.
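The simplification step described above can be illustrated with a minimal sketch. It assumes the topic model's output is available as a mapping from topic word to weight, and that words the topic model did not score are treated as having weight 0; the function name, the sample words, and the threshold value are all hypothetical and not taken from this application.

```python
def simplify_document(tokens, topic_weights, threshold):
    """Keep only the tokens whose topic weight reaches the threshold.

    tokens: segmented words of the document body, in their original order.
    topic_weights: dict mapping topic word -> weight (topic model output).
    Words absent from topic_weights count as weight 0.0 (an assumption).
    """
    return [t for t in tokens if topic_weights.get(t, 0.0) >= threshold]


# Hypothetical segmented body and topic weights.
tokens = ["regarding", "budget", "approval", "for", "project", "budget"]
weights = {"budget": 0.42, "project": 0.31, "approval": 0.08}

simplified = simplify_document(tokens, weights, threshold=0.10)
print(simplified)  # ['budget', 'project', 'budget']
```

The simplified token list, rather than the full body, is what would then be passed to the document prediction model.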

[0057] Existing methods for directly predicting review information require a large amount of sample data for model training in order to predict relatively accurate review information. When the amount of data is small, it is difficult to obtain good prediction results. At the same time, the output text of generative models cannot be guaranteed to be semantically valid, so incomprehensible prediction results may occur in practical applications, which greatly affects the user experience. Furthermore, when departmental or personnel structures change, existing prediction models cannot respond and adjust to such changes quickly, and therefore cannot meet the needs of practical applications.

[0058] Therefore, the document review information generation method provided in this application embodiment no longer adopts the existing practice of directly predicting review information through a prediction model. Instead, it generates the review information of the document to be tested based on the review information of the most similar historical documents obtained by matching. This method can accurately predict the review information even when the number of documents is relatively small, thereby effectively improving the accuracy of review information prediction and enabling rapid adaptation to changes in review methods.

[0059] In one embodiment, the document prediction model includes a text input layer, a feature extraction layer, and a prediction result fusion output layer;

[0060] The step of inputting the simplified document text to be tested into a pre-trained document prediction model to obtain the document vector output by the document prediction model includes:

[0061] The simplified document text to be tested is input into the pre-trained document prediction model, and the structured attributes, document title and document body of the document text to be tested are obtained through the text input layer.

[0062] The feature extraction layer extracts features from the structured attributes, the document title, and the document body to obtain attribute feature vectors, title feature vectors, and body feature vectors, respectively.

[0063] The prediction result fusion output layer fuses the attribute feature vector, the title feature vector, and the body text feature vector according to preset decision weights to obtain the document vector to be tested and output it.

[0064] In this embodiment of the application, for office business scenarios, the input information is divided into three categories: structured attributes, title, and body text. Feature vectors are extracted from each category and then fused according to decision weights. This makes the calculated semantic vector more suitable for the intelligent document approval task application scenario, thereby further improving the accuracy of document review information prediction in this application scenario.
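The text does not state how the three feature vectors are combined beyond "according to preset decision weights". The sketch below assumes one plausible reading, in which each vector is scaled by its decision weight and the results are concatenated (which also works when the three vectors have different lengths). The function name and the weight values are hypothetical.

```python
def fuse_vectors(attr_vec, title_vec, body_vec, weights=(0.2, 0.3, 0.5)):
    """Scale each feature vector by its decision weight, then concatenate.

    Weighted concatenation is an assumption; a weighted sum of vectors
    projected to a common size would be another possible reading.
    """
    w_attr, w_title, w_body = weights
    fused = [w_attr * x for x in attr_vec]
    fused += [w_title * x for x in title_vec]
    fused += [w_body * x for x in body_vec]
    return fused


# Hypothetical feature vectors of different lengths.
doc_vec = fuse_vectors([1.0, 0.0], [0.5, 0.5], [0.2, 0.4, 0.4])
print(len(doc_vec))  # 7
```

A larger decision weight makes the corresponding component contribute more to the later similarity computation.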

[0065] In one embodiment, the step of extracting features from the structured attributes, the document title, and the document body through the feature extraction layer to obtain attribute feature vectors, title feature vectors, and body feature vectors respectively includes:

[0066] The feature extraction layer uses a one-hot encoder to extract features from the structured attributes to obtain the attribute feature vector;

[0067] The feature extraction layer uses a BiLSTM neural network to extract features from the document title to obtain the title feature vector;

[0068] The feature extraction layer uses a HAN neural network to extract features from the document body to obtain the body feature vector.

[0069] In this embodiment of the application, the structured attribute fields, titles, and body information of official documents have different data feature attributes. Different feature extractors are designed according to their different characteristics. The feature vectors of official documents in multiple dimensions are obtained by using the above three feature extractors, so that the calculated document vectors are more in line with the application scenario of intelligent document approval tasks, thereby further improving the accuracy of document review information prediction in this application scenario.
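Of the three extractors, the one-hot encoder for structured attributes is the simplest to illustrate. The following is a minimal sketch for a single categorical attribute; the attribute (issuing department) and its category values are hypothetical, and mapping unseen values to an all-zero vector is an assumption.

```python
def build_one_hot_encoder(categories):
    """Return a function that one-hot encodes a value from `categories`."""
    index = {c: i for i, c in enumerate(categories)}

    def encode(value):
        vec = [0] * len(categories)
        if value in index:  # unseen values stay all-zero (assumption)
            vec[index[value]] = 1
        return vec

    return encode


# Hypothetical structured attribute: the issuing department.
encode_dept = build_one_hot_encoder(["finance", "hr", "it"])
print(encode_dept("hr"))       # [0, 1, 0]
print(encode_dept("unknown"))  # [0, 0, 0]
```

When a document has several structured attributes, the per-attribute encodings could be concatenated to form the attribute feature vector.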

[0070] In one embodiment, matching the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested from a pre-stored historical document database includes:

[0071] Calculate the cosine similarity between the vector of the document to be tested and each historical document vector in the pre-stored historical document database, and then match the historical document corresponding to the highest cosine similarity as the target historical document.

[0072] It should be noted that when calculating similarity, since the number of words in each official document may vary greatly, texts with similar content may sometimes have a large linear distance when calculating the topic relevance. In this case, the method of using cosine similarity to judge the angle is more in line with the current application scenario. The method for generating official document review information provided in this embodiment of the invention matches historical documents by calculating cosine similarity, which can more accurately match the most similar historical documents, thereby further improving the accuracy of official document review information prediction.
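The cosine-similarity matching can be sketched as follows. The example assumes the historical document database is available as a mapping from document identifier to vector; the identifiers and vector values are hypothetical. The query vector is deliberately much longer than its best match to show that only the angle between vectors matters, not their magnitude.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def match_most_similar(query_vec, history):
    """Return the id of the historical document with the highest similarity.

    history: dict mapping document id -> document vector.
    """
    return max(history, key=lambda doc_id: cosine_similarity(query_vec, history[doc_id]))


# Hypothetical historical document vectors.
history = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [2.0, 2.0, 0.0],
    "doc_c": [0.0, 0.0, 3.0],
}
best = match_most_similar([10.0, 9.0, 0.0], history)
print(best)  # doc_b
```

The review information stored with the matched document would then serve as the basis for the review information of the document to be tested.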

[0073] In one embodiment, the method for generating official document review information further includes:

[0074] The review information of the document to be tested is optimized and rewritten based on a pre-constructed feature database; wherein, the feature database includes one or more of the following: synonym relationship data, department mapping relationship data, personnel responsibility data, and subject word mapping data.

[0075] It should be noted that in addition to storing a large number of official document texts, the official document system also stores a lot of additional information, such as company personnel lists and company organizational structure. Using this additional data can further improve the accuracy of approval information prediction, especially when the rules for document review or personnel organization are adjusted, resulting in errors in the prediction model due to its inability to update in real time.

[0076] The document review information generation method provided in this application embodiment, by constructing a feature database, integrates existing structured business knowledge data of the system into tasks such as detailed opinion tag extraction, synonym recognition, and compound word recognition, so that the relevant task algorithms have the ability to judge background knowledge, thereby further improving the accuracy of document review information prediction.
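The rewriting mechanism itself is not detailed here. As a rough illustration only, the sketch below assumes the feature database can be reduced to two lookup tables (a department mapping and a synonym table) and that rewriting is plain string substitution; all names and table entries are hypothetical.

```python
def rewrite_review(review_text, department_mapping, synonyms):
    """Apply substitutions from a feature database to matched review text.

    department_mapping: old department name -> current department name.
    synonyms: variant phrase -> preferred phrase.
    Plain string replacement is an assumption made for illustration.
    """
    for old, new in department_mapping.items():
        review_text = review_text.replace(old, new)
    for variant, preferred in synonyms.items():
        review_text = review_text.replace(variant, preferred)
    return review_text


# Hypothetical feature database entries after a reorganization.
department_mapping = {"Network Dept": "Infrastructure Dept"}
synonyms = {"for handling": "for processing"}

matched = "Please forward to Network Dept for handling."
print(rewrite_review(matched, department_mapping, synonyms))
# Please forward to Infrastructure Dept for processing.
```

Because the tables can be updated as soon as the organization changes, this kind of rewrite does not require retraining any model.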

[0077] In one embodiment, before inputting the document body of the document to be tested into the pre-trained topic model, the method further includes:

[0078] Data preprocessing is performed on the pre-stored historical official document data; wherein, the historical official document data includes the document word segmentation sample and the official document text sample;

[0079] The data preprocessing includes:

[0080] Interference data in the historical official document data is identified and filtered out to obtain the document text sample;

[0081] Extract business-related terms from the historical official document data and construct a business domain word segmentation dictionary;

[0082] The document text sample is segmented based on the word segmentation dictionary of the business domain to obtain the document word segmentation sample;

[0083] Business support information is generated by combining personnel structure information and information on the stage of the official document.

[0084] It is understandable that targeted preprocessing of key information in official documents combined with system business knowledge data can make the model more sensitive and accurate in feature extraction when training with a small amount of data. The embodiments of this application effectively improve the training efficiency of the model and the accuracy of feature extraction by performing the above-mentioned data preprocessing operation on the pre-stored historical official document data, thereby further improving the accuracy of predicting official document review information in scenarios with a small amount of data.

[0085] Based on the above solution, and to facilitate a better understanding of the document review information generation method provided in this application embodiment, the following detailed explanation is provided:

[0086] In one embodiment, the present application provides an intelligent recommendation technology for document review in enterprise information systems. This technology can predict and output accurate and natural review information (including review comments and distribution candidates) even with a relatively small volume of documents, and can quickly and sensitively detect changes in document review rules.

[0087] (I) Preprocessing of Historical Document Data

[0088] Targeted preprocessing of key information in official documents combined with system business knowledge data enables the model to extract features more sensitively and accurately when trained on limited data. The specific preprocessing process includes:

[0089] (1) Noise removal

[0090] In the composition of sample data (historical official document data), the data stored in the system's database may be associated with other information, and therefore often includes distracting data such as user IDs, creation times, and special symbols. This useless data not only greatly increases the workload of text processing, but also interferes with model training, thereby reducing the accuracy of model predictions. Therefore, removing this noisy data is the primary task before processing official document texts.
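A minimal sketch of such noise removal is given below. The exact form of the interfering fields depends on how the system stores documents, so the regular expressions (a `user_id=` field, a timestamp format, and a set of special symbols) are hypothetical examples rather than the patterns used in this application.

```python
import re

# Hypothetical noise patterns; real ones depend on the storage format.
USER_ID = re.compile(r"\buser_id=\w+\b")
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")
SPECIAL = re.compile(r"[#*~^|<>]+")
WHITESPACE = re.compile(r"\s+")


def remove_noise(raw):
    """Strip user IDs, creation times and special symbols from raw text."""
    text = USER_ID.sub(" ", raw)
    text = TIMESTAMP.sub(" ", text)
    text = SPECIAL.sub(" ", text)
    # Collapse the gaps left by the removed fragments.
    return WHITESPACE.sub(" ", text).strip()


raw = "user_id=u1024 2021-11-19 08:30:00 ## Request to approve the Q4 budget **"
print(remove_noise(raw))  # Request to approve the Q4 budget
```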

[0091] (2) Establish a word segmentation dictionary for business domains

[0092] The official document system database also stores business-related terms related to document approval, such as the names of various departments and organizations within the company, job titles, project names, and personnel information. This information is very helpful for the word segmentation of the main body of the document. Therefore, it is necessary to integrate this content and create a word segmentation dictionary for this task during data preprocessing.

[0093] (3) Word segmentation of official document text

[0094] By combining a business domain-specific word segmentation dictionary, Chinese word segmentation is performed on official document texts, and the resulting word segmentation results are used to complete subsequent subject term extraction.
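The segmentation algorithm is not named here. To show how a business domain dictionary influences the result, the sketch below uses forward maximum matching, a simple dictionary-based technique chosen purely for illustration; the dictionary entries and the sample sentence are hypothetical.

```python
def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if none matches."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical business domain dictionary (department and business terms).
domain_dict = {"信息技术部", "审批", "项目预算", "项目", "预算"}
print(segment("请信息技术部审批项目预算", domain_dict))
# ['请', '信息技术部', '审批', '项目预算']
```

Without the entry "信息技术部" in the dictionary, the department name would be broken into single characters, which is why the domain dictionary matters for the later topic word extraction.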

[0095] (4) Adding auxiliary information

[0096] The system reads departmental and personnel information in advance, and combines it with the current stage of the document to obtain the organizational or personnel information that may appear in the current stage of the approval document. This information serves as a business knowledge reference for subsequent prediction tasks (i.e., business auxiliary information is generated by combining personnel structure information and document stage information).

[0097] (II) Analysis of Keywords in Official Documents

[0098] Since the text of official documents is usually quite long, directly inputting it into the subsequent document prediction model would reduce prediction efficiency and accuracy. Therefore, this application uses the LDA topic model to extract key information (topic terms) from the official documents. The specific process is as follows: the word segmentation results from the preprocessing steps above are input into the topic model, appropriate parameters are selected to complete the training of the topic model, and then the trained topic model is used to predict topic terms, ultimately obtaining the topic terms of the official documents and the weights of each topic term. The following section will further introduce the algorithm principle of the LDA topic model and the parameter selection method of the model.

[0099] (1) LDA Topic Model

[0100] Topic models perform natural language processing statistical analysis on unstructured data in documents to identify topics within the data and create structured data from unstructured data, while also performing dimensionality reduction. An example framework used in topic models to process text in documents is the "bag-of-words" framework, where blocks of text are converted into word count vectors based on a predefined set of words called a dictionary. For example, consider the sentence "The IT department completed the project on time and within budget" and a dictionary of five words stored in the database: "budget, department, task, project, manager," whose corresponding word count vector is [1,1,0,1,0]. Similarly, word count vectors can be constructed from any set of documents containing text; in the "bag-of-words" framework, the order of words is ignored. By iteratively processing documents through such natural language statistical analysis, topic models can discover potential topics by looking for frequently occurring phrases within the same document. Furthermore, topic models provide the ability to preprocess text in documents, such as removing false punctuation, deleting infrequent and stop words, and replacing text with regular expressions using specified text patterns and user-defined replacements. Topic models can also generate a dictionary from all the words contained in the document and store the dictionary in the database.
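The word count vector in the example above can be reproduced with a short sketch. The lowercase, letters-only tokenization is an assumption made for this English example.

```python
import re
from collections import Counter


def word_count_vector(text, dictionary):
    """Count how often each dictionary word occurs in the text.

    Word order is ignored, as in the bag-of-words framework.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    return [counts[w] for w in dictionary]


sentence = "The IT department completed the project on time and within budget"
dictionary = ["budget", "department", "task", "project", "manager"]
print(word_count_vector(sentence, dictionary))  # [1, 1, 0, 1, 0]
```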

[0101] Topic models can synthesize topics as patterns of words, word pairs, or phrases that typically appear together in documents. Topic models can identify the most common words in the vocabulary and indicate how frequently each word occurs in the documents. Unusual words that add no meaning to the documents can be removed during the preprocessing stage by specifying regular expressions that match them.

[0102] The algorithm flow is as follows:

[0103] a) Generation process

[0104] For each document in the corpus, LDA defines the following generative process:

[0105] For each word position in the document, draw a topic from the document's topic distribution;

[0106] Draw a word from the word distribution corresponding to the topic drawn above;

[0107] Repeat the above process until every word in the document has been traversed.

[0108] Each document in the corpus corresponds to a multinomial distribution over T topics (T is set in advance through trial and error), denoted as θ. Each topic in turn corresponds to a multinomial distribution over the V words in the vocabulary, denoted as φ.

[0109] b) Overall Process

[0110] First, the notation is defined: D is the document set and T is the topic set.

[0111] In D, each document d is regarded as a word sequence <w1, w2, ..., wn>, where wi denotes the i-th word and d contains n words. In LDA this is called a bag of words; in practice, the position of each word has no effect on the LDA algorithm.

[0112] All the distinct words appearing in the document set D form a large set, VOCABULARY (VOC). LDA takes the document set D as input and aims to train two result vectors (assuming that k topics are clustered and that VOC contains m words in total):

[0113] For each document d in D, the probabilities θd = <pt1, ..., ptk> of the document corresponding to the different topics, where pti is the probability that d corresponds to the i-th topic in T. The calculation is intuitive: pti = nti / n, where nti is the number of words in d assigned to the i-th topic and n is the total number of words in d.

[0114] For each topic t in T, the probabilities φt = <pw1, ..., pwm> of generating the different words, where pwi is the probability that t generates the i-th word in VOC. The calculation is equally intuitive: pwi = Nwi / N, where Nwi is the number of occurrences of the i-th word in VOC assigned to topic t and N is the total number of words assigned to topic t.

[0115] The core formula is as follows:

[0116] p(w|d)=p(w|t)*p(t|d)

[0117] Intuitively, this formula uses the Topic as an intermediate layer, and it gives the probability of word w appearing in document d using the current θd and φt. Here, p(t|d) is calculated using θd, and p(w|t) is calculated using φt.

[0118] In practice, using the current θd and φt, we can calculate p(w|d) for a word in a document corresponding to any topic, and then update the topic that the word should correspond to based on these results. Then, if this update changes the topic corresponding to the word, it will in turn affect θd and φt.

[0119] c) Learning Process

[0120] At the start of the algorithm, θd and φt are assigned random values (for all d and t). The process described above is then repeated until convergence, and the converged result is the output of LDA. The iterative learning process is detailed as follows:

[0121] For the i-th word wi in a specific document ds, if we let the topic corresponding to this word be tj, the above formula can be rewritten as:

[0122] pj(wi|ds)=p(wi|tj)*p(tj|ds)

[0123] Enumerating all topics in T yields all pj(wi|ds), where j takes values from 1 to k. A topic is then selected for the i-th word wi in ds based on these probability values.

[0124] Calculating p(w|d) and reselecting the topic once for every w in every d in D counts as one iteration. After n iterations, the algorithm converges to the final result.
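The iterative learning process described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not a definitive LDA implementation: the function name `train_topics`, the `smoothing` constant (added so that no probability is exactly zero), and the fixed iteration count are assumptions, and the random initialization is applied to the per-word topic assignments, from which θd and φt follow by counting. Production LDA implementations typically use collapsed Gibbs sampling or variational inference instead.

```python
import random

def train_topics(docs, k, iterations=50, smoothing=0.1, seed=0):
    """Simplified topic reassignment loop; docs is a list of non-empty word lists."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    # Randomly assign a topic to every word occurrence.
    assign = [[rng.randrange(k) for _ in d] for d in docs]
    for _ in range(iterations):
        doc_topic = [[0] * k for _ in docs]      # nti: words of document d assigned to topic i
        topic_word = [dict() for _ in range(k)]  # Nwi: occurrences of word w assigned to topic t
        topic_total = [0] * k                    # N: total words assigned to topic t
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = assign[di][wi]
                doc_topic[di][t] += 1
                topic_word[t][w] = topic_word[t].get(w, 0) + 1
                topic_total[t] += 1
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                scores = []
                for t in range(k):
                    p_t_d = (doc_topic[di][t] + smoothing) / (len(d) + k * smoothing)
                    p_w_t = (topic_word[t].get(w, 0) + smoothing) / (
                        topic_total[t] + vocab_size * smoothing)
                    scores.append(p_w_t * p_t_d)  # pj(wi|ds) = p(wi|tj) * p(tj|ds)
                # Reselect the topic of this word in proportion to the scores.
                assign[di][wi] = rng.choices(range(k), weights=scores)[0]
    # theta_d = nti / n for each document, computed from the final assignments.
    theta = [[row.count(t) / len(row) for t in range(k)] for row in assign]
    return theta, assign

docs = [["budget", "project", "budget"], ["manager", "department", "manager"]]
theta, assign = train_topics(docs, k=2)
```

Each row of `theta` is the topic distribution of one document and sums to 1.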

[0125] (2) Method for Selecting Parameters for Topic Model

[0126] The parameter settings stored in the database determine the level of generalization of the analysis performed by the topic model. If the parameters cause the topic model to generate too many topics, i.e., the level of generalization is too fine, the parameters can be adjusted to reduce the number of topics generated. The statistical analysis performed by the topic model may be iterative. Because this processing can be extensive, the topic model can first be trained on a data sample to initialize the parameters before a full analysis is performed on all documents.

[0127] The initialization parameter A corresponds to the number of words in the topic. During training, the number of topics generated for each document is {B1, B2, ..., Bn}, where n is the number of documents trained. The suitable range for the number of topics is [Bmin, Bmax]. The parameter adjustment degree C is calculated as follows:

[0128]

[0129] The absolute value of the parameter adjustment degree C corresponds to the adjustment range ΔA of A, and the sign of the parameter adjustment degree C corresponds to the adjustment direction of A:

[0130] C = kΔA;

[0131] Where k is a dynamic coefficient with a default value of 1. After parameter A is adjusted, k changes as C changes. For example, when the change in C is too large, k is reduced, and when the change in C is too small, k is increased.

[0132] (III) Feature Data Analysis

[0133] In addition to a large number of official document texts, the official document system also stores a lot of additional information, such as company personnel lists and the company organizational structure. Using this additional data can further improve the accuracy of approval information prediction, especially when the document review rules or the personnel organization are adjusted: in that case the neural network model cannot be updated in real time, which would otherwise result in errors.

[0134] (1) Department and personnel synonym recognition (synonym relationship data)

[0135] The system database stores authentic official document review comments. Since these comments are typically short and concise, they often contain abbreviations, making synonym identification very helpful. The specific process involves segmenting the official document review comments into several detailed comment tags after removing stop words. These tags are then matched with organizational and personnel information for similarity. After manual screening, synonyms such as abbreviations or aliases of departments and personnel are identified, creating a synonym database. Finally, this database is compared with predicted approval comments, and any instances of synonyms can be simplified and rewritten.
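The final rewriting step can be pictured with the following minimal sketch. The synonym database contents, the tag values, and the function name `rewrite_with_synonyms` are hypothetical, and the direction of rewriting (alias to canonical name) is an assumption made only for illustration.

```python
def rewrite_with_synonyms(comment_tags, synonym_db):
    """Replace each tag that is a known alias with its canonical name."""
    return [synonym_db.get(tag, tag) for tag in comment_tags]

# Hypothetical synonym database built after manual screening.
synonym_db = {
    "IT Dept": "Information Technology Department",
    "HR": "Human Resources Department",
}
tags = ["forward to", "IT Dept", "for handling"]
rewritten = rewrite_with_synonyms(tags, synonym_db)
print(rewritten)  # ['forward to', 'Information Technology Department', 'for handling']
```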

[0136] (2) Identification of department mapping relationships (department mapping relationship data)

[0137] The system database stores departmental compound words, such as "Group Company\Party Building Department\Party Building Office". The detailed opinion tags are compared with the candidate departments for the distribution of the approval documents to identify the mapping relationship between departmental compound words (such as each department) and detailed departments, and a mapping table between the two is established.

[0138] (3) Identification of personnel and job changes (personnel job data)

[0139] The system updates personnel and job descriptions in real time from the database. If the list changes, the final output is corrected by combining the prediction results of the neural network model to avoid situations where candidates are not in the department.
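One possible form of this correction is sketched below. The roster structure (a mapping from person to current department), the names, and the function `correct_candidates` are hypothetical and serve only to illustrate filtering out candidates who are no longer in the predicted department.

```python
def correct_candidates(predicted, roster):
    """Keep only predicted (person, department) pairs that match the current roster."""
    return [(person, dept) for person, dept in predicted if roster.get(person) == dept]

# Hypothetical current roster, refreshed from the database.
roster = {"Zhang": "Finance", "Li": "IT"}
# Hypothetical model predictions: Wang has left, and Li has moved to IT.
predicted = [("Zhang", "Finance"), ("Wang", "IT"), ("Li", "Finance")]
corrected = correct_candidates(predicted, roster)
print(corrected)  # [('Zhang', 'Finance')]
```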

[0140] (4) Word-by-word analysis (keyword mapping data)

[0141] The word-by-word analysis tool provides a precise view of themes to enable automated and human-assisted semantic analysis of data, as well as the automatic generation of presentable content. It evaluates documents to create a mapping between each document and one or more themes determined by the topic model, thus creating topic-document pairs. This mapping can be stored in a database. The tool can also create a model view of the structured data for each document, called a heading. Headings allow viewing the textual data within the document together with a diagram illustrating the mapping between the textual data and one or more themes.

[0142] The word-by-word analysis tool provides a standard for efficient data scanning to ensure the accuracy of the generated topics, their mappings, and the structured data created. If topic adjustments are needed, the processing performed by the word-by-word analysis tool can be paused and the process returned to the topic model for further evaluation. Each topic is evaluated across the entire document set by analyzing the n-dimensional bias of each topic-document pair. Each bias dimension can be coarse-grained, reflecting sentiment or opinion about a topic. Furthermore, bias dimensions can be labeled as unknown, which may not be considered in the results. Bias dimensions can be determined through natural language semantic analysis, which can include keyword evaluation from the text of documents associated with the topic, as well as interpretive evaluation of the text from documents associated with the topic. The analysis can include a thesaurus and antonym list for bias evaluation.

[0143] In some cases, the same adjective can be either favorable or unfavorable. One example is the word "low": "low cost" reflects a favorable view, while "low production level" reflects an unfavorable view. The analysis can be tailored to bias indicators, phrases, or words that provide context for the topic, and can generate visualizations that show the direction of positive bias. Furthermore, semantic analysis can reveal that a topic-document pair mapping is flawed and that the text is in fact related to a different topic through implicit analysis, in which case the mapping can be corrected. The word-by-word analysis tool can include the biases in the headings for efficient data scanning, so that the accuracy of the n-dimensional biases found in the text can be assessed for every topic-document pair. The word-by-word analysis tool then creates topic clusters based on the topics identified by the topic model, where each topic cluster is determined from the n-dimensional biases of each topic-document pair and the frequency of each topic in the documents identified for that topic. The topic clustering can be performed using singular value decomposition into an orthogonal dimensional model based on the discrete or continuous range of the n-dimensional biases and/or the values of each bias dimension. The word-by-word analysis tool creates the orthogonal dimensional model from the created structured data, in which the aggregated frequency of each topic and each of the n bias dimensions can be aligned with the original documents and other dimensions to provide a Pareto chart view of the topics from most important to least important.

[0144] (5) Cross-data analysis

[0145] Cross-data analysis tools perform additional statistical analysis on the text analysis results generated by the text analysis tools together with any co-collected structured data. The analysis provided by cross-data analysis tools can support data validation. For example, in a project tracking system, the structured data might show a "green" value indicating that the project is in progress, while the text in the status field describes that resources are unavailable and work is not progressing. In another example, a particular work center might be performing significantly better than other work centers, and combined analysis of structured and textual data from a workflow system can provide insights into how to leverage that better performance. Exemplary analytical methods include correlation, outlier detection, Mood's median test, and the chi-square test for independence.

[0146] (IV) Construction of Document Prediction Model

[0147] After the topic model has been trained successfully, the keywords and their weights can be extracted from the main body of the official document. Combined with other information from the database, construction and training of the document prediction model can then begin. The neural network model designed in this embodiment mainly consists of a text input layer, a feature extraction layer, and a prediction result fusion output layer. The specific structure of each of these three layers is described below. The final model structure is shown in Figure 2, in which some modules are annotated as follows:

[0148] pretrain: pre-training; LabelEncoding: label encoding; onehotEncoding: one-hot encoding; CutWords: word segmentation; concat: feature fusion layer; cross-entropy: cross-entropy loss function.

[0149] (1) Text input layer

[0150] The input layer of a deep neural network includes structured attributes, document title, and document body, which respectively represent the features of different dimensions of the document.

[0151] The document body content is obtained as follows: after the topic weights have been calculated by the topic model trained in the above steps, content with low topic weights is removed from the document body. This effectively reduces the input content of the body, reduces interference from noise data, and improves prediction efficiency and accuracy. For example, the simplified body mainly includes keywords such as "budget, department, task, project, manager".
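A minimal sketch of this simplification step is given below, assuming the body has already been segmented into tokens and the topic model exposes a word-to-weight mapping. The function name `simplify_body`, the weights, and the threshold value are hypothetical; treating words without a topic weight as having weight 0 is likewise an assumption.

```python
def simplify_body(tokens, topic_weights, threshold):
    """Drop tokens whose topic weight is below the threshold (unknown words count as 0)."""
    return [t for t in tokens if topic_weights.get(t, 0.0) >= threshold]

# Hypothetical topic weights output by the topic model.
topic_weights = {"budget": 0.30, "project": 0.25, "department": 0.20, "time": 0.02}
tokens = ["department", "completed", "project", "on", "time", "within", "budget"]
simplified = simplify_body(tokens, topic_weights, threshold=0.10)
print(simplified)  # ['department', 'project', 'budget']
```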

[0152] The content of the document title is the same as the official document title stored in the database, such as "Notification Regarding the Appointment of Comrade XX".

[0153] Structured attributes are enhanced data attributes for the model, enabling the model to further learn the business characteristics of this business scenario. Specific attribute items include group department information, personnel names, and job titles.

[0154] (2) Feature extraction layer

[0155] The structured attribute fields, titles, and body information of official documents have different data characteristics, so we need to design different feature extractors for their different characteristics.

[0156] For structured attribute fields, OnehotEncoding vectorization is performed directly.
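For illustration, one-hot vectorization of structured attributes can be sketched as follows. The category lists and the helper `one_hot` are hypothetical examples; in practice the categories would come from the group department information, personnel names, and job titles stored in the system.

```python
def one_hot(value, categories):
    """Return a vector with a single 1 at the position of value in categories."""
    return [1 if value == c else 0 for c in categories]

# Hypothetical category lists for two structured attributes.
departments = ["Finance", "IT", "Party Building"]
job_titles = ["Manager", "Clerk"]
# Concatenate the per-attribute encodings into one attribute feature vector.
attribute_vector = one_hot("IT", departments) + one_hot("Clerk", job_titles)
print(attribute_vector)  # [0, 1, 0, 0, 1]
```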

[0157] For the title text input, considering its short text characteristics and the fact that multiple keywords are some distance apart in the text but may influence each other, the BILSTM neural network component is used for feature extraction after word segmentation. An attention network component is added to the output of this component to extract the mutual influence relationship features of the separated keywords.

[0158] For the main text input, considering its long text features, a HAN neural network is embedded to extract features at both the word and sentence levels in the main text.

[0159] Using the three feature extractors mentioned above, the feature vectors of the official documents in multiple dimensions are finally obtained.

[0160] The HAN model structure is shown in Figure 3 and comprises a sentence attention module, a sentence encoder module, a word attention module, and a word encoder module.

[0161] The HAN (Hierarchical Attention Networks) model mainly consists of a Bi-GRU network, a word / character level attention module, and a sentence level attention module.

[0162] The model first converts Chinese characters or words into corresponding vector representations. Assume a document contains L sentences, denoted $S_i$, $i \in [1, L]$. After word segmentation, a sentence contains K words, and the vector representation of the k-th word in the i-th sentence is $z_{ik}$, $k \in [1, K]$. The word representation vectors are combined with their contextual information through a Bi-GRU network to obtain the output of the hidden layer. The specific calculation process is shown below.

[0163] $\overrightarrow{g}_{ik} = \overrightarrow{\mathrm{GRU}}(z_{ik}),\; k \in [1, K]$

[0164] $\overleftarrow{g}_{ik} = \overleftarrow{\mathrm{GRU}}(z_{ik}),\; k \in [K, 1]$

[0165] $g_{ik} = [\overrightarrow{g}_{ik}, \overleftarrow{g}_{ik}]$

[0166] Here, $g_{ik}$ is the vectorized representation obtained after the Bi-GRU.

[0167] The attention mechanism is added after this step in order to identify the characters or words that contribute most to the meaning of a sentence. First, $g_{ik}$ is fed into a single-layer perceptron, and the result $u_{ik}$ serves as the implicit representation of $g_{ik}$. The importance of a word is determined by the similarity between $u_{ik}$ and a randomly initialized context vector $U_\omega$. A normalized attention weight matrix is then obtained through a softmax operation, representing the weight of the k-th character or word in sentence i. Finally, given the attention weight matrix, the sentence vector is computed as the weighted sum of the vectors of these characters or words. The calculation process is shown below.

[0168] $u_{ik} = \tanh(W_\omega g_{ik} + b_\omega)$

[0169] $\alpha_{ik} = \dfrac{\exp(u_{ik}^{\top} U_\omega)}{\sum_{k'=1}^{K} \exp(u_{ik'}^{\top} U_\omega)}$

[0170] $S_i = \sum_{k=1}^{K} \alpha_{ik}\, g_{ik}$

[0171] Here, $W_\omega$ and $b_\omega$ are the weight matrix and the bias matrix, respectively, and $\alpha_{ik}$ is the attention weighting factor that measures the importance of the k-th character or word in sentence i.

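[0172] After the sentence representation $S_i$ is obtained, it is processed in a similar way to obtain the corresponding hidden-layer sentence vector $G_i$ after the Bi-GRU, as shown in the following formulas.

[0173] $\overrightarrow{G}_{i} = \overrightarrow{\mathrm{GRU}}(S_{i}),\; i \in [1, L]$

[0174] $\overleftarrow{G}_{i} = \overleftarrow{\mathrm{GRU}}(S_{i}),\; i \in [L, 1]$

[0175] $G_{i} = [\overrightarrow{G}_{i}, \overleftarrow{G}_{i}]$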

[0176] Subsequently, a sentence-level context vector $U_s$ is introduced to measure the importance of each sentence within the entire text, yielding the overall vector V of the official document. The calculation process is shown below.

[0177] $u_{i} = \tanh(W_S G_{i} + b_S)$

[0178] $\alpha_{i} = \dfrac{\exp(u_{i}^{\top} U_s)}{\sum_{i'=1}^{L} \exp(u_{i'}^{\top} U_s)}$

[0179] $V = \sum_{i=1}^{L} \alpha_{i}\, G_{i}$

[0180] As above, $W_S$ and $b_S$ are the weight matrix and the bias matrix, respectively, and $\alpha_{i}$ is the attention weighting factor that measures the importance of sentence i.
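The attention step used at both the word level and the sentence level (single-layer perceptron, similarity with a context vector, softmax normalization, weighted sum) can be sketched in plain Python as follows. This is only an illustrative sketch: the function name `attention_pool` and the toy values for the hidden vectors, weight matrix, bias, and context vector are assumptions, whereas in the model these parameters are learned and the hidden vectors come from the Bi-GRU.

```python
import math

def attention_pool(hidden, W, b, context):
    """Return (attention weights, weighted sum) for a list of hidden vectors."""
    scores = []
    for g in hidden:
        # u = tanh(W g + b): single-layer perceptron applied to one hidden vector
        u = [math.tanh(sum(W[r][c] * g[c] for c in range(len(g))) + b[r])
             for r in range(len(W))]
        # similarity between u and the context vector
        scores.append(sum(ui * ci for ui, ci in zip(u, context)))
    # softmax over the scores (shifted by the maximum for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of the hidden vectors
    dim = len(hidden[0])
    pooled = [sum(w * g[c] for w, g in zip(weights, hidden)) for c in range(dim)]
    return weights, pooled

# Toy values (assumed): three 2-dimensional hidden vectors, identity weight matrix.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
context = [1.0, 1.0]
weights, pooled = attention_pool(hidden, W, b, context)
```

With these toy values the third vector is most similar to the context vector and therefore receives the largest attention weight.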

[0181] (3) Prediction result fusion output layer

[0182] Since the three input items have different decision weights for predicting the final review information, the model structure design does not directly fuse the three types of features. Instead, learnable weighting coefficients are added before fusion. Specifically, the three extracted features are processed independently using regularization, residual, and dropout. Multiple softmax functions are used in conjunction with a Dense layer to output the features separately. The outputs are then merged after Label Mask processing and using the cross-entropy loss function.

[0183] The model's output (i.e., the learning objective) is the label vector (document vector) built from the detailed comments and the follow-up processors. For example, if detailed comment item A is at position 0 in the label vector and follow-up processor B is at position 3, then a document whose comment is A and whose follow-up processor is B has the label vector 1001000...
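The label vector in this example can be built as in the following sketch. The label names, the position table, and the vector length of 7 are hypothetical and chosen only to reproduce the leading digits 1001000 given above.

```python
def build_label_vector(active_labels, label_positions, size):
    """Set a 1 at the position of every active label; all other positions stay 0."""
    vector = [0] * size
    for label in active_labels:
        vector[label_positions[label]] = 1
    return vector

# Hypothetical position table: comment A at position 0, processor B at position 3.
label_positions = {"comment A": 0, "processor B": 3}
label_vector = build_label_vector(["comment A", "processor B"], label_positions, size=7)
print("".join(str(bit) for bit in label_vector))  # 1001000
```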

[0184] (V) Calculation of the most recent similar official document

[0185] A key characteristic of official document application scenarios is the frequent changes in personnel, organization, and responsibilities, which often lead to immediate shifts in approval methods. Conventional algorithms typically require a certain amount of data on the new approval methods to accumulate and the model to be updated and trained before the prediction results can change, resulting in a slow response to data changes. The approach adopted in this application is not to directly predict approval information. Instead, it utilizes the aforementioned neural network to output a document vector matching the current approval task objective based on the document input information. Then, it calculates the similarity information between the document to be tested and historical documents, finding the most recent and similar historical document. The approval result of this historical document is used as the basis for the output (either directly outputting the historical document's approval result or adaptively adjusting it to generate the current predicted approval result). This allows the prediction to directly find the approval method of the most recent similar document, thus enabling rapid adaptation to changes in approval methods.

[0186] When performing similarity calculations, since the number of words in each document may vary greatly, texts with similar content may sometimes have a large linear distance when calculating topic relevance. In this case, using the cosine similarity method is more suitable for the current scenario.

[0187] Cosine similarity uses the cosine of the angle between two vectors in a vector space to measure the difference between two individuals. The closer the cosine value is to 1, the closer the angle is to 0 degrees, and the more similar the two vectors are.

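[0188] The cosine of the angle between two vectors can be derived from the Euclidean dot product formula:

[0189] $a \cdot b = \lVert a \rVert \, \lVert b \rVert \cos\theta$

[0190] Given two attribute vectors A and B, the cosine similarity $\cos\theta$ is obtained from their dot product and vector lengths, as shown below:

[0191] $\cos\theta = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \dfrac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$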

[0192] In the actual scenario of this embodiment, the neural network introduced in the previous step first outputs the document vector (Word Embedding) that matches the current approval task target. In the above formula, Ai and Bi represent the components of the document vector output by the neural network and of a document vector in the historical document database, respectively. The similarity between the two is obtained from the cosine similarity formula. After the similarity has been calculated against each historical document, the historical document with the highest similarity is output.
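The matching step can be sketched as follows. The document identifiers, the vectors, and the function names `cosine_similarity` and `most_similar` are hypothetical; the sketch selects purely by highest cosine similarity and does not model how recency is taken into account, which the description does not specify in detail.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b (0.0 if either vector is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, history):
    """history maps a document id to its stored vector; returns (id, similarity)."""
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in history.items())
    return max(scored, key=lambda pair: pair[1])

# Hypothetical historical document vectors.
history = {
    "doc-001": [1.0, 0.0, 0.0],
    "doc-002": [0.0, 2.0, 2.0],
    "doc-003": [3.0, 3.0, 0.0],
}
best_id, best_score = most_similar([0.0, 1.0, 1.0], history)
```

In this toy example "doc-002" is selected even though its vector is twice as long as the query vector, which reflects why cosine similarity is preferred over a linear distance when document lengths differ greatly.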

[0193] It should be noted that the embodiments of this application have the following key points:

[0194] First, during the prediction process, a topic model algorithm was used to perform text dimensionality reduction, which reduced model complexity and removed some noise, thereby improving prediction efficiency and accuracy.

[0195] Second, the existing structured business knowledge data in the system is integrated into tasks such as detailed opinion tag extraction, synonym recognition, and compound word recognition, so that the algorithms of related tasks have the ability to judge background knowledge and effectively improve the accuracy of prediction.

[0196] Third, for office business scenarios, the input information is divided into three categories: structured attributes, titles, and body text. In the design of the neural network model structure, different network structures are applied according to the characteristics of each type of data, and learnable coefficients are added to balance the weights of the various attributes, so that the calculated semantic vectors are better suited to the intelligent approval task and effectively improve the accuracy of prediction.

[0197] Fourth, considering the sudden changes in approval methods brought about by changes in personnel, organization, and responsibilities in the office environment, the algorithm design does not directly predict the approval information. Instead, it first learns and outputs the vector representation of the document based on the business objectives through a deep neural network model. Then, it calculates the similarity between the document and historical documents based on the vector representation, and selects the approval opinions and candidates of the most recent and most similar document as the basis for the approval opinions and candidates of the current document. This allows it to quickly adapt to changes in the approval method.

[0198] Compared with the prior art, the embodiments of this application have the following beneficial effects:

[0199] 1. Due to the use of topic modeling for document dimensionality reduction and the integration of structured data based on existing business background knowledge into the algorithm, it is possible to better learn data features even with a small amount of data.

[0200] 2. By employing a specially designed deep neural network, the weights of input document titles, body texts, and other information can be better balanced, and the output document vectors with high business relevance can be calculated.

[0201] 3. Regarding the output results, instead of directly predicting the output review information, the review comments of the most recent and similar official document are used as the basis for the output. On the one hand, this avoids the problem of incoherent or ambiguous sentences in the predicted output review comments; on the other hand, because it follows the principle of the most recent and similar document, it can quickly respond to changes in the document approval method caused by changes in personnel, organization, responsibilities, and other business aspects.

[0202] The document approval information generation apparatus provided in the embodiments of this application is described below. The apparatus described below and the document approval information generation method described above may be read in correspondence with each other.

[0203] Referring to Figure 4, this application provides an apparatus for generating official document review information, comprising:

[0204] The topic recognition module 1 is used to input the document body of the official document to be tested into a pre-trained topic model to obtain the text topic information output by the topic model;

[0205] Vector prediction module 2 is used to simplify the document body of the document to be tested according to the text topic information, and then input the simplified document to be tested into a pre-trained document prediction model to obtain the document vector output by the document prediction model.

[0206] Document matching module 3 is used to match, from the pre-stored historical document database, the target historical document corresponding to the historical document vector that has the highest similarity to the document vector to be tested;

[0207] Approval generation module 4 is used to generate the approval information of the document to be tested based on the approval information of the target historical documents;

[0208] The text topic information includes several topic words and the topic weight corresponding to each topic word; the topic model is trained based on document segmentation samples and the text topic information corresponding to the document segmentation samples; the official document prediction model is trained based on official document text samples and the official document vectors corresponding to the official document text samples.

[0209] In one embodiment, the document prediction model includes a text input layer, a feature extraction layer, and a prediction result fusion output layer;

[0210] The step of inputting the simplified document text to be tested into a pre-trained document prediction model to obtain the document vector output by the document prediction model includes:

[0211] The simplified document text to be tested is input into the pre-trained document prediction model, and the structured attributes, document title and document body of the document text to be tested are obtained through the text input layer.

[0212] The feature extraction layer extracts features from the structured attributes, the document title, and the document body to obtain attribute feature vectors, title feature vectors, and body feature vectors, respectively.

[0213] The prediction result fusion output layer fuses the attribute feature vector, the title feature vector, and the body text feature vector according to preset decision weights to obtain the document vector to be tested and output it.

[0214] In one embodiment, the step of extracting features from the structured attributes, the document title, and the document body through the feature extraction layer to obtain attribute feature vectors, title feature vectors, and body feature vectors respectively includes:

[0215] The feature extraction layer uses a one-hot encoder to extract features from the structured attributes to obtain the attribute feature vector;

[0216] The title feature vector is obtained by using the BILSTM neural network to extract features from the document title through the feature extraction layer.

[0217] The feature extraction layer uses a HAN neural network to extract features from the document text to obtain the text feature vector.

[0218] In one embodiment, the text simplification of the document body of the document to be tested based on the text topic information includes:

[0219] Based on the text topic information, keywords with topic weights less than a preset threshold in the main body of the document text to be tested are removed to obtain the simplified document text to be tested.

[0220] In one embodiment, the document matching module 3 is specifically used for:

[0221] Calculate the cosine similarity between the vector of the document to be tested and each historical document vector in the pre-stored historical document database, and then match the historical document corresponding to the highest cosine similarity as the target historical document.

[0222] In one embodiment, the device for generating official document review information further includes:

[0223] The review information optimization module is used to optimize and rewrite the review information of the document to be tested based on a pre-built feature database; wherein, the feature database includes one or more of the following: synonym relationship data, department mapping relationship data, personnel responsibility data, and subject word mapping data.

[0224] In one embodiment, the device for generating official document review information further includes:

[0225] The historical data processing module is used to preprocess pre-stored historical document data; wherein, the historical document data includes the document word segmentation sample and the document text sample;

[0226] The data preprocessing includes:

[0227] Interference data in the historical official document data is identified and filtered out to obtain the document text sample;

[0228] Extract business-related terms from the historical official document data and construct a business domain word segmentation dictionary;

[0229] The document text sample is segmented based on the word segmentation dictionary of the business domain to obtain the document word segmentation sample;

[0230] Business support information is generated by combining personnel structure information and information on the stage of the official document.

[0231] It is understood that the above-described device embodiments correspond to the method embodiments of this application. The document approval information generation device provided in the embodiments of this application can implement the document approval information generation method provided in any one of the method embodiments of this application.

[0232] Figure 5 is a schematic diagram of the physical structure of an electronic device. As shown in Figure 5, the electronic device may include a processor 510, a communication interface 520, a memory 530, and a communication bus 540, wherein the processor 510, the communication interface 520, and the memory 530 communicate with each other via the communication bus 540. The processor 510 can call a computer program in the memory 530 to execute steps of a method for generating official document review information, such as: inputting the document body of the document to be tested into a pre-trained topic model to obtain text topic information output by the topic model; simplifying the document body of the document to be tested according to the text topic information, and inputting the simplified document to be tested into a pre-trained document prediction model to obtain a document vector to be tested output by the document prediction model; matching a target historical document from a pre-stored historical document database that corresponds to the historical document vector with the highest similarity to the document vector to be tested; and generating review information for the document to be tested based on the review information of the target historical document.

[0233] Furthermore, the logical instructions in the aforementioned memory 530 can be implemented as software functional units and, when sold or used as independent products, can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. This computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage medium includes various media capable of storing program code, such as USB flash drives, portable hard drives, read-only memory (ROM), random access memory (RAM), magnetic disks, or optical disks.

[0234] On the other hand, this application also provides a computer program product, which includes a computer program that can be stored on a non-transitory computer-readable storage medium. When the computer program is executed by a processor, the computer can perform the steps of the document review information generation method provided in the above embodiments, such as: inputting the document body of the document to be tested into a pre-trained topic model to obtain text topic information output by the topic model; simplifying the document body of the document to be tested according to the text topic information, and inputting the simplified document to be tested into a pre-trained document prediction model to obtain the document vector to be tested output by the document prediction model; matching the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested from a pre-stored historical document database; and generating the review information of the document to be tested according to the review information of the target historical document.

[0235] On the other hand, embodiments of this application also provide a processor-readable storage medium storing a computer program for causing a processor to execute the steps of the methods provided in the above embodiments, such as: inputting the document body of the document to be tested into a pre-trained topic model to obtain text topic information output by the topic model; simplifying the document body of the document to be tested according to the text topic information, and inputting the simplified document to be tested into a pre-trained document prediction model to obtain a document vector to be tested output by the document prediction model; matching a target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested from a pre-stored historical document database; and generating the review information of the document to be tested based on the review information of the target historical document.

[0236] The processor-readable storage medium can be any available medium or data storage device that the processor can access, including but not limited to magnetic memory (e.g., floppy disk, hard disk, magnetic tape, magneto-optical disk (MO)), optical memory (e.g., CD, DVD, BD, HVD), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND FLASH), solid-state drive (SSD)).

[0237] The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate. The components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected to achieve the purpose of this embodiment according to actual needs. Those skilled in the art can understand and implement this without any creative effort.

[0238] Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus necessary general-purpose hardware platforms, and of course, it can also be implemented by hardware. Based on this understanding, the above technical solutions, in essence or the part that contributes to the prior art, can be embodied in the form of a software product. This computer software product can be stored in a computer-readable storage medium, such as ROM / RAM, magnetic disk, optical disk, etc., and includes several instructions to cause a computer device (which may be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments or some parts of the embodiments.

[0239] Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of this application, and are not intended to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications can still be made to the technical solutions described in the foregoing embodiments, or equivalent substitutions can be made to some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of this application.

Claims

1. A method for generating official document review information, characterized in that it comprises:
inputting the document body of an official document to be tested into a pre-trained topic model to obtain text topic information output by the topic model;
simplifying the document body of the document to be tested according to the text topic information, and inputting the simplified document to be tested into a pre-trained document prediction model to obtain a document vector to be tested output by the document prediction model, wherein the document prediction model includes a text input layer, a feature extraction layer, and a prediction result fusion output layer;
matching, from a pre-stored historical document database, a target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested; and
generating review information for the document to be tested based on the review information of the target historical document;
wherein the text topic information includes several topic words and the topic weight corresponding to each topic word; the topic model is trained based on document word segmentation samples and the text topic information corresponding to the document word segmentation samples; and the document prediction model is trained based on official document text samples and the official document vectors corresponding to the official document text samples;
wherein the step of inputting the simplified document to be tested into the pre-trained document prediction model to obtain the document vector to be tested output by the document prediction model comprises:
inputting the simplified document to be tested into the pre-trained document prediction model, and obtaining the structured attributes, document title, and document body of the document to be tested through the text input layer;
extracting features from the structured attributes using a one-hot encoder through the feature extraction layer to obtain an attribute feature vector;
extracting features from the document title using a BiLSTM neural network through the feature extraction layer to obtain a title feature vector;
extracting features from the document body using a HAN neural network through the feature extraction layer to obtain a body feature vector; and
fusing, through the prediction result fusion output layer, the attribute feature vector, the title feature vector, and the body feature vector according to preset decision weights to obtain and output the document vector to be tested.
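As an informal illustration of the fusion step in claim 1, the following is a minimal sketch only. The claim does not say how the preset decision weights are applied, so the sketch assumes a weighted concatenation (each feature vector is scaled by its weight and the results are joined). The names `one_hot` and `fuse_features`, the category list, and the weight values are hypothetical, and the title and body vectors are stand-ins for the outputs of the BiLSTM and HAN networks, which are not implemented here.

```python
def one_hot(value, categories):
    """Encode a single structured attribute as a one-hot vector."""
    return [1.0 if value == c else 0.0 for c in categories]


def fuse_features(attr_vec, title_vec, body_vec, weights):
    """Fuse three feature vectors into one document vector.

    Assumption: fusion is a weighted concatenation. `weights` is a
    (w_attr, w_title, w_body) tuple of preset decision weights.
    """
    w_attr, w_title, w_body = weights
    return (
        [w_attr * x for x in attr_vec]
        + [w_title * x for x in title_vec]
        + [w_body * x for x in body_vec]
    )


# Hypothetical inputs: a document-type attribute plus placeholder
# title/body vectors standing in for BiLSTM and HAN outputs.
attr_vec = one_hot("notice", ["request", "notice", "report"])
title_vec = [0.5, -0.5]
body_vec = [1.0, 2.0]
doc_vec = fuse_features(attr_vec, title_vec, body_vec, (0.2, 0.3, 0.5))
```

A weighted concatenation is used here because the three feature vectors need not share a dimension; a weighted sum would be an equally plausible reading when they do.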

2. The method of claim 1, wherein the step of simplifying the document body of the document to be tested according to the text topic information comprises: removing, based on the text topic information, keywords whose topic weights are less than a preset threshold from the document body of the document to be tested, to obtain the simplified document to be tested.
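The simplification in claim 2 can be pictured with the short sketch below. It assumes the document body has already been segmented into tokens and that the topic model output is available as a word-to-weight mapping; the function name `simplify_body`, the sample tokens, and the threshold value are hypothetical. The sketch also assumes that a token with no topic weight is treated as having weight 0.0 and is therefore removed, which the claim does not state.

```python
def simplify_body(tokens, topic_weights, threshold):
    """Keep only tokens whose topic weight reaches the preset threshold.

    Assumption: tokens missing from `topic_weights` count as weight 0.0.
    """
    return [t for t in tokens if topic_weights.get(t, 0.0) >= threshold]


# Hypothetical segmented body and topic model output.
tokens = ["budget", "the", "approval", "meeting"]
topic_weights = {"budget": 0.4, "approval": 0.3, "meeting": 0.05}
simplified = simplify_body(tokens, topic_weights, threshold=0.1)
```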

3. The method of claim 1, wherein the step of matching, from the pre-stored historical document database, the target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested comprises: calculating the cosine similarity between the document vector to be tested and each historical document vector in the pre-stored historical document database, and taking the historical document corresponding to the highest cosine similarity as the target historical document.
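The matching in claim 3 amounts to a nearest-neighbor lookup under cosine similarity. The sketch below is illustrative only: the historical document database is modeled as an in-memory dictionary from a document identifier to its vector, and the names `cosine_similarity`, `match_target_document`, and the sample vectors are hypothetical. Returning 0.0 for a zero-length vector is also an assumption made to avoid division by zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def match_target_document(query_vec, history):
    """Return the id of the historical document most similar to query_vec.

    `history` maps document id -> historical document vector and must
    not be empty.
    """
    return max(history, key=lambda doc_id: cosine_similarity(query_vec, history[doc_id]))


# Hypothetical historical document vectors.
history = {"A": [1.0, 0.0], "B": [0.6, 0.8], "C": [0.0, 1.0]}
target_id = match_target_document([0.5, 0.9], history)
```

The review information stored with the matched document would then serve as the basis for the review information of the document to be tested.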

4. The method of claim 1, further comprising: optimizing and rewriting the review information of the document to be tested based on a pre-constructed feature database, wherein the feature database includes one or more of the following: synonym relationship data, department mapping relationship data, personnel responsibility data, and subject word mapping data.
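One simple way to picture the rewriting in claim 4 is a term substitution driven by the feature database. The sketch below covers only the department mapping case and assumes the mapping is a plain dictionary from an outdated department name to its current name; the function name `rewrite_review` and the sample names are hypothetical, and the claim does not specify that rewriting works this way.

```python
import re


def rewrite_review(text, mapping):
    """Replace every mapped term in `text` in a single pass.

    Longer terms are tried first so that a short term never
    overwrites part of a longer one.
    """
    if not mapping:
        return text
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


# Hypothetical department mapping relationship data.
department_map = {"Network Dept": "Network Operations Center"}
rewritten = rewrite_review("Forward to Network Dept for handling.", department_map)
```

Doing the substitution in one pass keeps a replacement from being rewritten again by another mapping entry.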

5. The method of claim 1, wherein, before inputting the document body of the official document to be tested into the pre-trained topic model, the method further comprises: performing data preprocessing on pre-stored historical official document data, wherein the historical official document data includes the document word segmentation samples and the official document text samples;
the data preprocessing comprises:
identifying and filtering out interference data in the historical official document data to obtain the official document text samples;
extracting business-related terms from the historical official document data and constructing a business domain word segmentation dictionary;
segmenting the official document text samples based on the business domain word segmentation dictionary to obtain the document word segmentation samples; and
generating business support information by combining personnel structure information and information on the processing stage of the official document.

6. An apparatus for generating official document review information, characterized in that it comprises:
a topic recognition module, configured to input the document body of an official document to be tested into a pre-trained topic model to obtain text topic information output by the topic model;
a vector prediction module, configured to simplify the document body of the document to be tested according to the text topic information, and to input the simplified document to be tested into a pre-trained document prediction model to obtain a document vector to be tested output by the document prediction model, wherein the document prediction model includes a text input layer, a feature extraction layer, and a prediction result fusion output layer;
a document matching module, configured to match, from a pre-stored historical document database, a target historical document corresponding to the historical document vector with the highest similarity to the document vector to be tested; and
an approval generation module, configured to generate review information for the document to be tested based on the review information of the target historical document;
wherein the text topic information includes several topic words and the topic weight corresponding to each topic word; the topic model is trained based on document word segmentation samples and the text topic information corresponding to the document word segmentation samples; and the document prediction model is trained based on official document text samples and the official document vectors corresponding to the official document text samples;
wherein the vector prediction module is further configured to: input the simplified document to be tested into the pre-trained document prediction model; obtain the structured attributes, document title, and document body of the document to be tested through the text input layer; extract features from the structured attributes using a one-hot encoder through the feature extraction layer to obtain an attribute feature vector; extract features from the document title using a BiLSTM neural network through the feature extraction layer to obtain a title feature vector; extract features from the document body using a HAN neural network through the feature extraction layer to obtain a body feature vector; and fuse, through the prediction result fusion output layer, the attribute feature vector, the title feature vector, and the body feature vector according to preset decision weights to obtain and output the document vector to be tested.

7. An electronic device comprising a processor and a memory having a computer program stored therein, characterized in that, when the processor executes the computer program, it implements the steps of the method for generating official document review information as described in any one of claims 1 to 5.

8. A computer program product comprising a computer program, characterized in that, when the computer program is executed by a processor, it implements the steps of the method for generating official document review information as described in any one of claims 1 to 5.