Method for calculating similarity of short texts by using deep convolution neural network

A similarity calculation technique based on neural networks, applied to computing the similarity between short texts. It addresses the problem that keywords and concepts cannot stand in for the whole text, so a similarity computed from them alone rests on an incomplete basis and cannot fully represent the similarity between two passages of text. The claimed effects are simple calculation and improved accuracy.

Active Publication Date: 2017-05-31
XI AN JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

[0004] The above-mentioned patent calculates the similarity between texts by separately computing the semantic similarity of the keyword part and that of the concept part. However, keywords and concepts cannot replace the entire text. The basis of its text similarity calculation is therefore incomplete and cannot fully represent the similarity between two passages of text.


Examples


Embodiment Construction

[0034] The present invention will be further explained below in conjunction with specific embodiments and accompanying drawings.

[0035] The present invention comprises the following steps:

[0036] (1) Express the short texts as matrices. First, select the words that appear in all relevant knowledge-field pages on Wikipedia as the vocabulary. Then train on this vocabulary with the open-source word2vec code released by Google, so that each word is expressed as a vector. Finally, replace each word in a text, in order, with its corresponding word vector from the vocabulary. Each word vector occupies one row, yielding an ordered sequence of vectors that can be regarded as a matrix whose number of rows equals the number of words.
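Step (1) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the dictionary `TOY_VECTORS` is a hypothetical stand-in for a trained word2vec vocabulary with made-up values, the helper name `text_to_matrix` is invented here, and skipping out-of-vocabulary words is an assumption the text does not address.

```python
# Hypothetical toy vocabulary standing in for trained word2vec vectors.
# The numbers are made up purely for illustration.
TOY_VECTORS = {
    "how": [0.1, 0.3, -0.2],
    "to": [0.0, 0.1, 0.4],
    "sort": [0.7, -0.1, 0.2],
    "a": [0.05, 0.0, 0.1],
    "list": [0.6, 0.2, 0.1],
}


def text_to_matrix(text, vectors):
    """Replace each word with its vector, one row per word, in word order.

    Assumption: words missing from the vocabulary are skipped.
    """
    words = text.lower().split()
    return [list(vectors[w]) for w in words if w in vectors]


matrix = text_to_matrix("How to sort a list", TOY_VECTORS)
# matrix has one row per word, and each row has the word-vector dimension.
```

The number of rows varies with the length of the text, which is why a later step has to bring the derived similarity matrices to a common size.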

[0037] (2) Combine the short texts in pairs, and generate a similarity matrix from the two matrices in each pair: first, for two passages of text, take the two matrices corresponding to them in ste...
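The paragraph above is cut off, but the abstract states that the similarity matrix is built from the cosine similarities of the word vectors. A minimal sketch under that reading follows. The layout (rows indexed by words of the first text, columns by words of the second), the function names, and the handling of zero vectors are assumptions, not details confirmed by the text.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(m1, m2):
    """Entry [i][j] is the cosine similarity of word i in m1 and word j in m2."""
    return [[cosine(u, v) for v in m2] for u in m1]


# Two tiny text matrices with made-up 2-dimensional word vectors.
text_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 words
text_b = [[1.0, 0.0], [1.0, 1.0]]              # 2 words
sim = similarity_matrix(text_a, text_b)         # 3 rows, 2 columns
```

For texts with m and n words the result is an m x n matrix, so different pairs generally produce matrices of different shapes.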


Abstract

The invention discloses a method for calculating the similarity of short texts using a deep convolutional neural network. It aims to compute the similarity from every word in the short texts and to obtain a relatively accurate similarity value. The method comprises the following steps: 1) express the short texts as matrices by replacing each word in a text, in order, with its corresponding word vector, so that the resulting ordered vector sequence forms one matrix; 2) generate a similarity matrix from the two matrices representing the target short texts by arranging the cosine similarities of their word vectors into a matrix; 3) pad the rows and columns of the similarity matrices to the same dimensions; and 4) reduce each similarity matrix to a single value that serves as the similarity: a deep convolutional neural network is trained to reduce the dimensionality of the equally sized similarity matrices, and a multilayer perceptron then computes the similarity value.
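Steps 3) and 4) can be illustrated with the sketch below. It is heavily simplified and rests on several assumptions: zero padding at the bottom and right, a single convolution filter with ReLU instead of a deep stack, 2 x 2 max pooling, a one-hidden-layer perceptron with a sigmoid output, and random untrained weights. None of these specifics come from the text, and all function names are hypothetical.

```python
import math
import random


def pad_matrix(sim, rows, cols, fill=0.0):
    """Pad (or crop) a matrix to rows x cols, filling at the bottom and right."""
    out = [[fill] * cols for _ in range(rows)]
    for i, row in enumerate(sim[:rows]):
        for j, value in enumerate(row[:cols]):
            out[i][j] = value
    return out


def conv2d_relu(x, kernel, bias=0.0):
    """Valid 2-D cross-correlation with one filter, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            s = bias + sum(x[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def max_pool(x, size=2):
    """Non-overlapping max pooling; incomplete edge windows are dropped."""
    return [[max(x[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(x[0]) - size + 1, size)]
            for i in range(0, len(x) - size + 1, size)]


def mlp_score(features, w_hidden, w_out):
    """One tanh hidden layer, then a sigmoid so the score lies in (0, 1)."""
    hidden = [math.tanh(sum(f * w for f, w in zip(features, ws)))
              for ws in w_hidden]
    z = sum(h * w for h, w in zip(hidden, w_out))
    return 1.0 / (1.0 + math.exp(-z))


rng = random.Random(0)  # random, untrained weights: illustration only
SIZE = 5                # assumed common dimension for all similarity matrices
kernel = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]

sim = [[1.0, 0.7], [0.0, 0.7], [0.7, 1.0]]     # a 3 x 2 similarity matrix
padded = pad_matrix(sim, SIZE, SIZE)            # 5 x 5
pooled = max_pool(conv2d_relu(padded, kernel))  # 4 x 4 map pooled to 2 x 2
features = [v for row in pooled for v in row]   # flattened to 4 values

w_hidden = [[rng.uniform(-1, 1) for _ in features] for _ in range(3)]
w_out = [rng.uniform(-1, 1) for _ in range(3)]
score = mlp_score(features, w_hidden, w_out)    # single similarity value
```

With untrained weights the score is meaningless. In the method described, the network parameters would be learned from training data, so that the final value reflects the similarity of the two texts.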

Description

Technical field

[0001] The invention relates to a method for calculating the similarity between texts, in particular to a method for calculating the similarity between short texts with a deep convolutional neural network.

Background

[0002] With the development of community question-and-answer websites, large numbers of questions and answers of different types are mixed together, making it difficult for users to find useful or interesting content. One way to address this problem is to classify the questions and answers of the community Q&A system, so that users can search and browse directly within the topics that interest them. Classifying these questions and answers manually requires people with strong professional knowledge of the field and consumes considerable time and energy. Moreover, with the widespread application of community question-answering systems, the speed at which questions and answers appear gradually increases, ...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/27
CPC: G06F40/30
Inventors: 魏笔凡, 郭朝彤, 刘均, 郑庆华, 吴蓓, 郑元浩, 石磊, 吴科炜
Owner: XI AN JIAOTONG UNIV