[0025] As shown in Figures 1-4, this embodiment adopts the following technical solution:
[0026] A method for calculating the degree of matching between requirements and output results based on text semantic analysis, comprising the following steps:
[0027] Step 1. Data set annotation: Using the text of the project requirement descriptions, achievement descriptions and project titles, compare and summarize the correlation and degree of matching between each pair of projects and annotate each pair with one of four categories, so as to construct a labeled data set for modeling the project matching calculation;
[0028] Step 2. Technical document preprocessing: Construct the input text for the BERT model. Because the BERT input sequence length and the available computing resources are limited, the ROUGE-L algorithm is used, with the project name as the reference, to extract core information sentence by sentence from the project requirement and achievement description texts. The input text of each project consists of two parts, A|B, where A is the project name and B is the core information of the project;
[0029] Step 3. Single-parameter model training and prediction: Partially improve the BERT model based on the ideas of knowledge distillation, cross-validation and ensemble learning, then perform modeling, tuning and classification label prediction on the labeled data, completing the task of going from the original data to the relevance prediction of a single model;
[0030] Step 4. Integration of multi-parameter model predictions: Train multiple models with different combinations of the "temperature" adjustment parameter T and the number of cross-validation folds K used in the improved model, select the 5 best-performing models, and integrate their predicted probabilities to obtain the final prediction result.
[0031] Specifically, with reference to Figure 1, in step 1 the text of the project requirement descriptions, achievement descriptions and project titles is used to compare and summarize the correlation and degree of matching between each pair of projects, and each pair is annotated with one of four categories to construct a labeled data set for modeling the project matching calculation;
[0032] The technical documents of existing project achievements in the power grid field and the annually issued declaration guidelines for science and technology projects are used as the initial data source. According to the sub-field of each project and the types of key technology involved, the titles and core content of the two kinds of text data are manually screened and summarized, the matching relationship between achievements and requirements is determined, and each pair is annotated with one of four categories (no correlation, weak correlation, relatively strong correlation and strong correlation) to construct a labeled data set of two-way matching degree.
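As an illustration of what one record of such a labeled data set might look like, the following is a minimal sketch. The field names, the integer codes and the name of the third category are assumptions made for this example and are not specified in the text.

```python
from dataclasses import dataclass
from enum import IntEnum


class MatchLabel(IntEnum):
    """Four matching-degree categories (integer codes are an assumption)."""
    NO_CORRELATION = 0
    WEAK_CORRELATION = 1
    RELATIVELY_STRONG_CORRELATION = 2  # assumed name for the third category
    STRONG_CORRELATION = 3


@dataclass(frozen=True)
class LabeledPair:
    """One manually annotated achievement/requirement pair (hypothetical schema)."""
    achievement_title: str
    achievement_description: str
    requirement_title: str
    requirement_description: str
    label: MatchLabel


def label_distribution(pairs):
    """Count the pairs in each category, e.g. to check class balance."""
    counts = {label: 0 for label in MatchLabel}
    for pair in pairs:
        counts[pair.label] += 1
    return counts
```

Counting the labels per category is a cheap check before training, since a strongly skewed distribution would affect a four-way classifier.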
[0033] Specifically, with reference to Figure 2, in step 2 the input text of the BERT model is constructed: with the project name as the reference, the ROUGE-L algorithm is used to extract the core information of the project requirement and achievement description texts sentence by sentence. The input text of each project consists of two parts, A|B, where A is the project name and B is the core information of the project (obtained with the ROUGE-L algorithm);
[0034] Starting from the original document data, the ROUGE-L algorithm extracts the core information of the document content sentence by sentence, using the document title or project name as the reference, in order to expand the semantic content and information coverage of the training set. On this basis, a dual-input data set format (Sentence-1 and Sentence-2) is defined that contains four parts: "achievement title", "core information of achievement", "requirement title" and "core information of requirement". Each Sentence consists of two parts, A|B, where A is the project name and B is the core information of the project achievement description or requirement description (obtained with the ROUGE-L algorithm);
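The assembly of an A|B input can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the invention: the sentence-splitting rule, the `top_k` cutoff and the function names are hypothetical, and the similarity scorer is passed in as a parameter so that any sentence-level score can be plugged in. The stand-in scorer `shared_word_count` exists only to make the example runnable.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after common Chinese and English sentence terminators (assumed rule)."""
    parts = re.split(r"(?<=[。！？.!?])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def build_input(title: str, description: str,
                score: Callable[[str, str], float], top_k: int = 3) -> str:
    """Return 'A|B': A is the title, B the top_k sentences most similar to it."""
    sentences = split_sentences(description)
    # Rank sentence indices by similarity to the title, best first.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(title, sentences[i]), reverse=True)
    # Restore document order so that part B stays readable.
    keep = sorted(ranked[:top_k])
    core = " ".join(sentences[i] for i in keep)
    return f"{title}|{core}"


def shared_word_count(title: str, sentence: str) -> float:
    """Stand-in scorer for illustration only; not the scorer used in the text."""
    return float(len(set(title.lower().split()) & set(sentence.lower().split())))


example = build_input(
    "grid fault detection",
    "The weather was fine. We study fault detection for the grid. "
    "Funding was approved. Grid fault detection accuracy improved.",
    score=shared_word_count,
    top_k=2,
)
```

Keeping only the highest-scoring sentences is what bounds the length of part B, which is the stated reason for the extraction step.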
[0035] ROUGE-L algorithm:
[0036] ROUGE is a family of methods for automatic summary evaluation that assess summary quality from the co-occurrence of n-grams between a candidate summary and a reference summary. ROUGE-L is one member of this family; it is calculated from the precision and recall of the longest common subsequence (LCS). It does not require consecutive matches and reflects word-order information.
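[0037] $$R_{lcs} = \frac{LCS(X,Y)}{m}$$
[0038] $$P_{lcs} = \frac{LCS(X,Y)}{n}$$
[0039] $$F_{lcs} = \frac{(1+\beta^{2})\,R_{lcs}\,P_{lcs}}{R_{lcs}+\beta^{2}\,P_{lcs}}$$ (β is the weighting parameter of the standard ROUGE-L definition)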
[0040] where:
[0041] X and Y denote the reference summary and the candidate summary, respectively; in the present invention they denote the document title and a candidate sentence;
[0042] m and n denote the lengths of the reference summary and the candidate summary, respectively;
[0043] LCS(X,Y) denotes the length of the longest common subsequence of X and Y;
[0044] R_lcs and P_lcs denote recall and precision, respectively;
[0045] F_lcs is the ROUGE-L score.
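The quantities defined above can be computed as in the following minimal sketch, which follows the standard ROUGE-L definition. The function names are hypothetical, and the default `beta = 1.0` (equal weight for recall and precision) is an assumption, since the text does not state a value. Sequences may be lists of words or, for Chinese text, plain strings compared character by character.

```python
from typing import Sequence, Tuple


def lcs_length(x: Sequence, y: Sequence) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference: Sequence, candidate: Sequence,
            beta: float = 1.0) -> Tuple[float, float, float]:
    """Return (R_lcs, P_lcs, F_lcs) for a reference X and a candidate Y."""
    m, n = len(reference), len(candidate)
    if m == 0 or n == 0:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    recall = lcs / m       # R_lcs = LCS(X, Y) / m
    precision = lcs / n    # P_lcs = LCS(X, Y) / n
    f_score = ((1 + beta ** 2) * recall * precision
               / (recall + beta ** 2 * precision))
    return recall, precision, f_score
```

In the setting of this document, `reference` would be the document title and `candidate` one sentence of the description, so sentences can be ranked by their F_lcs value.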
[0046] Specifically, with reference to Figures 3 and 4, in step 3 the BERT model is partially improved based on the ideas of knowledge distillation, cross-validation and ensemble learning, and modeling, tuning and classification label prediction are performed on the labeled data, completing the task of going from the original data to the relevance prediction of a single model.
[0047] The BERT model is partially improved based on the ideas of knowledge distillation, cross-validation and ensemble learning. A GRU network layer is also added at the end of the BERT model, taking the K [CLS] vectors output by cross-validation as input in order to capture the association information between Sentence-1 and Sentence-2. First, following the way knowledge distillation preserves the similarity between categories, a temperature parameter T is added to the Softmax layer. This mitigates the problem that the traditional Softmax function carries little information between categories, so that the predicted probability of a sample belonging to each category varies gradually with the degree of project matching. Then, following the ideas of cross-validation and ensemble learning, the task of training a single model is changed to training 5 cross-validation sub-models, which reduces overfitting of the model to a single test split. In the prediction stage, the 5 sub-models each output category probabilities through the improved Softmax function described above, these probabilities are integrated, and the final prediction result is output;
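The cross-validation part of this scheme can be sketched as follows. The example only shows how the labeled samples might be partitioned into K folds so that each sub-model is trained on K-1 folds and validated on the remaining one; training the BERT sub-models and the GRU layer is not shown. The function name, the shuffling and the fixed seed are assumptions made for the illustration.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs covering all samples."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducible folds
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        splits.append((train, validation))
    return splits
```

Each sample appears in exactly one validation fold, so no single held-out split dominates model selection, which matches the stated goal of reducing overfitting to one test split.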
[0048] Improvement to the Softmax layer:
[0049] $$q_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}$$
[0050] where:
[0051] z_i denotes the predicted value for category i that is input to the original Softmax function;
[0052] q_i denotes the predicted probability for category i that is output by the improved Softmax function;
[0053] T denotes the "temperature" adjustment parameter introduced for the category output probabilities.
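A minimal sketch of a Softmax with a temperature parameter, as used in knowledge distillation, is given below. The function name is hypothetical, and subtracting the maximum before exponentiation is a standard numerical-stability step that is not mentioned in the text.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float],
                             temperature: float = 1.0) -> List[float]:
    """Compute q_i = exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With T = 1 this reduces to the ordinary Softmax. A larger T flattens the distribution, so neighbouring categories keep more probability mass, which is the gradual effect across matching degrees that the description aims for.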
[0054] Specifically, in step 4, multiple models are trained with different values of the "temperature" adjustment parameter T and the number of cross-validation folds K used in the improved model, the five best-performing models are selected, and their predicted probabilities are integrated to obtain the final prediction result.
[0055] Multiple models are trained with the different "temperature" adjustment parameters T and numbers of cross-validation folds K described above, so that the project matching degree is predicted at different granularities. These models are then screened, and the 5 models with the best prediction performance are selected for probabilistic integration: the predicted probabilities output by the models are averaged with equal weights, and the average probabilities determine which of the four categories (no correlation, weak correlation, relatively strong correlation and strong correlation) the sample belongs to. This completes the calculation of the degree of matching between project requirements and achievements based only on technical documents.
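The selection and probabilistic integration described in this step can be sketched as follows. The validation scores, the (T, K) values and the function names are hypothetical. The text does not say which performance measure is used for screening, so the sketch simply assumes one score per model where higher is better.

```python
from typing import Dict, List, Tuple

# Category names; the name of the third category is an assumption.
LABELS = ["no correlation", "weak correlation",
          "relatively strong correlation", "strong correlation"]


def select_best(scores: Dict[Tuple[float, int], float],
                top_n: int = 5) -> List[Tuple[float, int]]:
    """Keep the top_n (T, K) settings ranked by validation score."""
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def ensemble_predict(prob_rows: List[List[float]]) -> Tuple[int, List[float]]:
    """Equal-weight average of per-model probabilities for one sample.

    Returns the index of the winning category and the averaged probabilities.
    """
    if not prob_rows:
        raise ValueError("at least one model output is required")
    n_classes = len(prob_rows[0])
    averaged = [sum(row[c] for row in prob_rows) / len(prob_rows)
                for c in range(n_classes)]
    best = max(range(n_classes), key=averaged.__getitem__)
    return best, averaged
```

For one sample, `ensemble_predict` would be given the probability rows produced by the models returned from `select_best`, and `LABELS[best]` then names the predicted category.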
[0056] The beneficial effects of the present invention are:
[0057] 1. It opens a new research perspective, namely calculating the degree of correlation and matching between projects based only on the text data of the project achievement technical documents and the project declaration guidelines;
[0058] 2. It applies deep learning and NLP technology, for the first time, to the calculation of project relevance in enterprise project management;
[0059] 3. It makes meaningful explorations of model structure modification and input data construction;
[0060] 4. Where in-depth project investigation and bidding research are not possible, it calculates the degree of correlation and matching between projects relying only on the existing project achievement technical documents and the text data of the project declaration guidelines, thereby assisting large enterprises in screening well-matched, high-quality projects during project bidding.
[0061] The above shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the embodiments and the description in the specification only illustrate the principles of the present invention. Without departing from the spirit and scope of the present invention, various changes and improvements may be made, and these fall within the scope of the claimed invention. The scope of the claimed invention is defined by the appended claims and their equivalents.