Text similarity measurement method and device based on multi-model fusion

A text similarity measurement technology, applied in the field of text similarity measurement methods and devices based on multi-model fusion. It addresses the problems that existing approaches fail to consider document semantics, cannot infer the true meaning of documents, and therefore produce inaccurate similarity judgments. Its effects are a continuously improving learning ability, a higher recall rate and accuracy rate, and the avoidance of manual feature extraction.

Active Publication Date: 2021-05-11
QUANZHOU POWER SUPPLY COMPANY OF STATE GRID FUJIAN ELECTRIC POWER +2

AI Technical Summary

Problems solved by technology

At the same time, some improvements have been made to the final similarity score, but these improvements bring a significant increase in time complexity.
This scheme is relatively crude and has an obvious shortcoming: it does not consider the semantics of documents, ignores the contextual and positional relationships of words, and judges similarity only at the level of string comparison.
[0006] As mentioned above, the existing technology is a series of solutions extended from the technical route based on keyword matching or vector space. These solutions ignore the contextual and positional relationships of words. Chinese is rich and nuanced, and the true meaning of a document often cannot be inferred from its literal wording alone, which in turn affects the accuracy of similarity judgments.
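The limitation described above can be illustrated with a minimal sketch. It assumes a plain bag-of-words cosine similarity as a stand-in for the vector space route; the function name `bow_cosine` and the two English example sentences are hypothetical and are not taken from the patent.

```python
import math
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words count vectors (word order is ignored)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example: same words, opposite meaning.
s1 = "the dog bites the man"
s2 = "the man bites the dog"
score = bow_cosine(s1, s2)
print(f"{score:.3f}")
```

Because both sentences contain exactly the same words, the two count vectors are identical and the score is at its maximum, even though the sentences describe opposite events. This is the kind of error that a purely string-level comparison cannot avoid.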



Embodiment Construction

[0039] As shown in Figure 1, the text similarity measurement method based on multi-model fusion includes the following steps:

[0040] A. Prepare a number of manually labeled sentence pairs as a data set, and divide the data set into a training set and a test set. The data set contains no fewer than 1000 sentence pairs, stored in tsv format with the column headers sentence1, sentence2, lable. The ratio of the training set to the test set is 7:3, and the test set is further divided into an adjustment test set and a verification test set at a ratio of 2:1;

[0041] B. Select four different deep learning training models (Bert, Paddle, Xlnet and Tree-LSTM), and set a group of initial hyperparameters for each training model to form a hyperparameter combination;

[0042] C. For each training model from step B, perform C rounds of sampling with replacement from the training set, for 4C rounds of sampling in total, and input the C times...
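The data preparation in step A and the sampling in step C can be sketched as follows. This is only an illustration under stated assumptions: the text does not say that the split is random or how large each sample is, so the shuffle, the sample size (equal to the training set size) and the helper names `split_dataset` and `bootstrap_samples` are all hypothetical. The toy tuples stand in for rows of the tsv file.

```python
import random

# The four models named in step B; used here only as dictionary keys.
MODEL_NAMES = ["Bert", "Paddle", "Xlnet", "Tree-LSTM"]


def split_dataset(pairs, seed=0):
    """Split into training, adjustment test and verification test sets (7:3, then 2:1)."""
    if len(pairs) < 1000:
        raise ValueError("step A calls for at least 1000 sentence pairs")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # assumption: the split is random
    n_train = len(shuffled) * 7 // 10
    train, test = shuffled[:n_train], shuffled[n_train:]
    n_adjust = len(test) * 2 // 3
    return train, test[:n_adjust], test[n_adjust:]


def bootstrap_samples(train, c, seed=0):
    """Draw C samples with replacement for each model, 4C samples in total."""
    rng = random.Random(seed)
    return {
        # assumption: each sample has the same size as the training set
        name: [rng.choices(train, k=len(train)) for _ in range(c)]
        for name in MODEL_NAMES
    }


# Toy data: (sentence1, sentence2, label) tuples.
pairs = [(f"sentence a{i}", f"sentence b{i}", i % 2) for i in range(1200)]
train, adjust, verify = split_dataset(pairs)
samples = bootstrap_samples(train, c=3)
total_samples = sum(len(v) for v in samples.values())
print(len(train), len(adjust), len(verify), total_samples)
```

With 1200 toy pairs the split yields 840 training pairs, 240 adjustment pairs and 120 verification pairs, and C = 3 gives 12 samples across the four models. Per the abstract, each sample is then used to obtain one sub-model; the training itself is not shown here.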


Abstract

The invention provides a text similarity measurement method based on multi-model fusion. The method comprises the following steps: preparing a training set and a test set; selecting four deep learning training models: Bert, Paddle, Xlnet and Tree-LSTM; for each training model, obtaining C sub-models; for each sub-model, calculating a similarity score and a loss function for the input data; evaluating the sub-models; selecting and fixing, for each sub-model, the hyperparameter combination with the best evaluation value; continuing to train each sub-model until its loss function converges, and saving the 4C sub-models at that point; fusing the 4C sub-models with a Boosting scheme, which performs a weighted addition of the sub-models' similarity scores, to obtain a similarity measurement model; and testing and adjusting the similarity measurement model using the test set data. The method effectively improves the accuracy of similarity measurement, improves the recall rate and accuracy of similarity judgment, and improves the generalization ability of the model.
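The weighted addition of sub-model scores can be sketched as below. The abstract does not state how the Boosting weights are computed, so this sketch assumes AdaBoost-style weights derived from each sub-model's error rate and normalized to sum to 1. The function names, error rates and scores are hypothetical illustration values, not results from the patent.

```python
import math


def boosting_weights(error_rates):
    """AdaBoost-style weights (an assumption): lower error gives a larger weight."""
    eps = 1e-9
    alphas = []
    for e in error_rates:
        e = min(max(e, eps), 1 - eps)  # keep the logarithm finite
        alphas.append(max(0.0, 0.5 * math.log((1 - e) / e)))
    total = sum(alphas)
    if total == 0:
        # every sub-model is no better than chance: fall back to uniform weights
        return [1.0 / len(alphas)] * len(alphas)
    return [a / total for a in alphas]


def fuse_scores(scores, weights):
    """Weighted addition of the sub-models' similarity scores."""
    return sum(w * s for w, s in zip(weights, scores))


# Hypothetical values for three sub-models scoring one sentence pair.
error_rates = [0.1, 0.2, 0.4]
scores = [0.9, 0.8, 0.3]
weights = boosting_weights(error_rates)
fused = fuse_scores(scores, weights)
print([round(w, 3) for w in weights], round(fused, 3))
```

Normalizing the weights keeps the fused score within the range of the individual scores, so it can be compared against the same threshold as a single sub-model's score.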

Description

Technical field

[0001] The invention relates to a method and device for measuring text similarity based on multi-model fusion.

Background

[0002] Text similarity measurement refers to measuring the degree of similarity between two texts, and it has a wide range of applications. For example, in information retrieval, similarity can be used to identify similar words and improve the recall rate. In automatic question answering, similarity can be used to calculate the degree of matching between the user's natural-language question and the questions in the corpus, and the answer corresponding to the best-matching question is returned as the response. In machine translation, bilingual translation is completed by analyzing the similarity of sentences, so whether similarity can be accurately defined and calculated affects the final translation quality. At this time, the us...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/279, G06F40/30, G06F40/58, G06N3/04
CPC: G06F40/279, G06F40/30, G06F40/58, G06N3/049, G06N3/044, Y02D10/00
Inventors: 谢勇添, 颜泗海, 林明福, 林宪, 谢宇宸, 张宏坡, 陈圣毅
Owner: QUANZHOU POWER SUPPLY COMPANY OF STATE GRID FUJIAN ELECTRIC POWER