Method for analyzing matching degree between demand and output result based on text semantics

A technology of semantic analysis and matching degree, applied in the fields of semantic tool creation, text database querying, unstructured text data retrieval, etc., to reduce difficulty and to reduce time and resource investment

Pending Publication Date: 2020-06-19
普华讯光(北京)科技有限公司
0 Cites 5 Cited by

AI-Extracted Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method based on the matching degree between text semantic a...

Abstract

The invention discloses a method for analyzing the matching degree between a demand and an output result based on text semantics. The method comprises the following steps: step 1, labeling a data set; step 2, preprocessing technical documents; step 3, training and predicting with a single-parameter model; and step 4, integrating the prediction results of multi-parameter models. The beneficial effects are that the method is simple, and that deep learning and NLP technology are applied for the first time to project association degree calculation in enterprise project management. The association matching degree between two projects is calculated from the project requirement and result descriptions, which effectively reduces the difficulty of locating associated projects and helps the demand side quickly and efficiently locate high-quality projects that meet its needs, greatly reducing the time and resources invested in screening and matching achievements. Because the association matching degree is calculated from the text of existing project achievement technical documents and project declaration guidelines, large enterprises can be assisted in screening high-quality, highly matched projects during project bidding and tendering.

Application Domain

Neural architectures; Text database querying

Technology Topic

Enterprise project management; Data science

Example Embodiment

[0025] As shown in Figures 1-4, this embodiment adopts the following technical solution: a method for analyzing the matching degree between requirements and output results based on text semantic analysis.
[0026] The method comprises the following steps:
[0027] Step 1. Data set annotation: using the text of the project requirement description, the achievement description and the project title, compare and summarize the correlation and matching degree between two projects and annotate each pair with one of four categories, thereby constructing a labeled data set for project matching modeling;
[0028] Step 2. Technical document preprocessing: construct the input text for the BERT model. Because of BERT's limited input sequence length and limited computing resources, the ROUGE-L algorithm is used, with the project name as reference, to extract core information from the project requirement and achievement description texts sentence by sentence. The input text of each project consists of two parts, A|B, where A is the project name and B is the core information of the project;
[0029] Step 3. Single-parameter model training and prediction: partially modify the BERT model based on the ideas of knowledge distillation, cross-validation and ensemble learning; build, tune and run the model on the labeled data to predict classification labels, completing the task from raw data to the relevance prediction output of a single model;
[0030] Step 4. Multi-parameter model prediction result integration: train multiple models with different value combinations of the "temperature" parameter T and the cross-validation fold number K of the modified model, select the 5 best-performing models, and integrate their output probabilities to obtain the final prediction result.
[0031] Specifically, as shown in Figure 1, step 1 uses the text of the project requirement description, the achievement description and the project title to compare and summarize the correlation and matching degree between two projects, and applies four-category labeling to construct a labeled data set for project matching modeling;
[0032] The technical documents of existing project achievements in the power grid field and the science and technology project declaration guidelines issued each year serve as the initial data source. According to each project's sub-field and the key technology types involved, the titles and core content of the two kinds of text are manually screened and summarized, the matching relationship between achievements and requirements is determined, and each pair receives one of four labels (no correlation, weak correlation, strong correlation and strong correlation) to construct a two-way matching degree labeled data set.
[0033] Specifically, as shown in Figure 2, step 2 constructs the BERT input text: with the project name as reference, the ROUGE-L algorithm extracts the core information of the project's requirement and achievement description texts sentence by sentence. The input text of each project consists of two parts, A|B, where A is the project name and B is the core information of the project (derived by the ROUGE-L algorithm);
[0034] Starting from the original document data, the ROUGE-L algorithm extracts the core information of the document content sentence by sentence, using the document title or project name as reference, in order to extend the semantics and information coverage of the training set. On this basis, a double-input data set format (Sentence-1 and Sentence-2) is defined that covers four parts: "Result Title", "Core Information of Achievement", "Demand Title" and "Core Information of Demand". Each Sentence consists of two parts, A|B, where A is the project name and B is the core information of the project achievement description or requirement description (derived by the ROUGE-L algorithm);
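The double-input format of [0034] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names (`build_sentence`, `build_example`, `PairExample`), the literal `|` separator and the 0..3 integer encoding of the four categories are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairExample:
    """Double-input example: Sentence-1 (achievement) and Sentence-2 (requirement)."""
    sentence_1: str
    sentence_2: str
    label: int  # assumed ordinal encoding 0..3 of the four annotation categories


def build_sentence(name: str, core_sentences: list[str], sep: str = "|") -> str:
    """Join the project name (part A) and its extracted core sentences (part B) as 'A|B'."""
    return name.strip() + sep + " ".join(s.strip() for s in core_sentences)


def build_example(result_title: str, result_core: list[str],
                  demand_title: str, demand_core: list[str], label: int) -> PairExample:
    """Assemble one labeled (achievement, requirement) pair in the double-input format."""
    if not 0 <= label <= 3:
        raise ValueError("label must be one of the four categories 0..3")
    return PairExample(
        sentence_1=build_sentence(result_title, result_core),
        sentence_2=build_sentence(demand_title, demand_core),
        label=label,
    )
```

If the pair is then passed to a BERT tokenizer as a text pair, the model sees it in the usual `[CLS] Sentence-1 [SEP] Sentence-2 [SEP]` layout; how the A|B boundary is marked inside each sentence is an assumption here.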
[0035] ROUGE-L algorithm:
[0036] ROUGE is a family of metrics for automatic summary evaluation that scores summary quality by the co-occurrence of n-grams between a candidate summary and a reference summary. ROUGE-L is the ROUGE variant computed from the recall and precision of the longest common subsequence (LCS). Because a subsequence does not require consecutive matches, ROUGE-L still reflects word-order agreement.
[0037] $R_{lcs} = \dfrac{LCS(X,Y)}{m}$
[0038] $P_{lcs} = \dfrac{LCS(X,Y)}{n}$
[0039] $F_{lcs} = \dfrac{(1+\beta^{2})\,R_{lcs}\,P_{lcs}}{R_{lcs} + \beta^{2}\,P_{lcs}}$
[0040] where:
[0041] X and Y are the reference summary and the candidate summary, respectively; in the present invention they are the document title and a candidate sentence;
[0042] m and n are the lengths of X (reference) and Y (candidate), respectively;
[0043] LCS(X,Y) is the length of the longest common subsequence of X and Y;
[0044] R_lcs and P_lcs are the recall and precision, respectively;
[0045] F_lcs is the ROUGE-L score, in which β sets the relative weight of recall versus precision (β = 1 weights them equally).
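A minimal pure-Python sketch of the ROUGE-L score above and of the sentence-level core-information extraction in step 2. It assumes character-level tokens by default (a common choice for Chinese text, but an assumption), β = 1, and a hypothetical `top_k` cut-off; the source does not say how many sentences are kept or how β is set.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of token sequences x and y (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate, beta=1.0):
    """ROUGE-L F-score F_lcs between reference X (m tokens) and candidate Y (n tokens)."""
    m, n = len(reference), len(candidate)
    if m == 0 or n == 0:
        return 0.0
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0
    r = lcs / m  # R_lcs
    p = lcs / n  # P_lcs
    return (1 + beta ** 2) * r * p / (r + beta ** 2 * p)


def extract_core(title, sentences, top_k=3, tokenize=list):
    """Keep the top_k sentences scoring highest against the title, in original order.

    tokenize=list splits into characters (assumed); pass str.split for space-delimited text.
    """
    ref = tokenize(title)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: rouge_l(ref, tokenize(sentences[i])),
                    reverse=True)
    keep = sorted(ranked[:top_k])
    return [sentences[i] for i in keep]
```

Restoring the original sentence order after ranking keeps the extracted part B readable as running text.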
[0046] Specifically, as shown in Figures 3 and 4, in step 3 the BERT model is partially modified based on the ideas of knowledge distillation, cross-validation and ensemble learning, and the model is built, tuned and used to predict classification labels on the labeled data, completing the task from raw data to the relevance prediction output of a single model.
[0047] The BERT model is partially modified based on the ideas of knowledge distillation, cross-validation and ensemble learning. A GRU layer is appended to the end of the BERT model and takes as input the K [CLS] vectors produced under cross-validation, in order to capture the association between Sentence-1 and Sentence-2. First, following the method used in knowledge distillation to preserve the similarity between categories, a temperature parameter T is added to the Softmax layer. This alleviates the problem that the traditional Softmax function carries little information about the relations between categories, so that the predicted probabilities of a sample belonging to each category change gradually with the degree of project matching. Then, following the ideas of cross-validation and ensemble learning, training a single model is replaced by training 5 cross-validation sub-models, which reduces overfitting to any single test set. In the prediction stage, the 5 sub-models each output the probability of every category through the improved Softmax function, these probabilities are integrated, and the final prediction result is output;
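The cross-validation part of [0047] trains one sub-model per fold. Below is a minimal, hypothetical index splitter for K folds; model training itself is out of scope, and the contiguous (unshuffled) folds are a simplification.

```python
def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds; yield (train_idx, val_idx) per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("need 2 <= k <= n_samples")
    # The first n_samples % k folds get one extra sample so all samples are used.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size
```

In practice the samples are usually shuffled first (for example with `random.shuffle`) or stratified by label, so that every fold sees all four categories; each sub-model would then be trained on `train` and validated on `val`.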
[0048] Improved Softmax layer:
[0049] $q_i = \dfrac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}$
[0050] where:
[0051] z_i is the value (logit) of category i input to the original Softmax function;
[0052] q_i is the predicted probability of category i output by the improved Softmax function;
[0053] T is the "temperature" parameter introduced to adjust the category output probabilities.
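The improved Softmax above is the temperature-scaled softmax known from knowledge distillation. A small stdlib sketch follows; the function name is an assumption.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """q_i = exp(z_i / T) / sum_j exp(z_j / T); larger T gives a softer distribution."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    scaled = [z / T for z in logits]
    peak = max(scaled)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With T = 1 this reduces to the ordinary Softmax; larger T flattens the distribution toward uniform, while T approaching 0 sharpens it toward a one-hot vector. The most probable class is the same for every T > 0; only the spread of the probabilities changes.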
[0054] In step 4, multiple models are trained for different values of the "temperature" parameter T and the cross-validation fold number K of the modified model, the five best-performing models are selected, and their output probabilities are integrated to obtain the final prediction result.
[0055] Multiple models are trained with the different temperature parameters T and cross-validation fold numbers K described above, so that the project matching degree is predicted at different granularities. These models are then screened, and the 5 models with the best prediction performance are combined by probabilistic integration, i.e., the predicted probabilities output by the models are averaged with equal weights. Based on the averaged probabilities, the sample is assigned to one of the four categories (no correlation, weak correlation, strong correlation and strong correlation), which completes the calculation of the matching degree between project requirements and results based solely on technical documents.
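The selection and equal-weight averaging of step 4 can be sketched as below. The `(T, K)` configuration keys, the validation scores and the helper names are hypothetical; the source only says that models are selected by their performance, without naming the metric.

```python
def select_top_models(val_scores, n=5):
    """Return the n configuration keys, e.g. (T, K) tuples, with the highest validation score."""
    return sorted(val_scores, key=val_scores.get, reverse=True)[:n]


def ensemble_predict(model_probs):
    """Equal-weight average of per-model class probabilities, then pick the most probable class.

    model_probs: one probability list per selected model, all of the same length.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    best = max(range(n_classes), key=avg.__getitem__)
    return best, avg
```

Usage, with made-up scores: `select_top_models({(1.0, 5): 0.80, (2.0, 10): 0.85, (4.0, 5): 0.78}, n=2)` picks `(2.0, 10)` and `(1.0, 5)`, whose four-class probabilities for a sample would then go to `ensemble_predict`.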
[0056] The beneficial effects of the present invention:
[0057] 1. It opens a new research perspective: the association matching degree between projects is calculated solely from the text of project achievement technical documents and project declaration guidelines;
[0058] 2. Deep learning and NLP technology are applied for the first time to project relevance calculation in enterprise project management;
[0059] 3. Meaningful explorations have been made in modifying the model structure and constructing the input data structure;
[0060] Where in-depth project investigation is impossible during bidding, the method calculates the correlation and matching degree between projects relying only on existing project achievement technical documents and the text of project declaration guidelines, and thereby assists large enterprises in screening high-quality, highly matched projects during project bidding.
[0061] The above shows and describes the basic principles, main features and advantages of the present invention. Those skilled in the art should understand that the present invention is not limited by the above embodiments; the embodiments and the description in the specification merely illustrate the principles of the present invention. Without departing from the spirit and scope of the present invention, various changes and improvements may be made, and these changes and improvements fall within the scope of the claimed invention. The scope of protection of the present invention is defined by the appended claims and their equivalents.

Similar technology patents

LED display screen control system and asynchronous control card

Active CN104050920A: Easy to maintain and expand; Reduce difficulty
Owner: XIAN NOVASTAR TECH

Classification and recommendation of technical efficacy words

  • Reduce difficulty