
Short text similarity calculation method based on multi-dimensional convolution feature

A short text similarity calculation technology, applied in the computer field. It addresses the problems of existing methods, which destroy the information in word vectors, cannot mine the implicit information in short texts, and lose semantic features of short texts, and it thereby achieves a more comprehensive similarity measurement.

Active Publication Date: 2019-02-01
WUHAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

Traditional processing methods, such as those based on word frequency models, cannot mine the hidden information in short texts, and methods based on topic models cannot accurately model semantic similarity in text matching.
In methods based on convolutional neural networks, the model input is a text matrix converted from the short text, so the rows and columns of the input matrix are determined by the length of the short text and the dimension of the word vectors. The traditional way of choosing the convolution kernel not only destroys the information in the word vectors but also fails to extract the context of each word. In addition, the max pooling operation in the pooling layer retains only the strongest feature value and ignores the other important features that appear, which adversely affects the feature similarity calculation. Finally, traditional convolution-pooling feature extraction works at a single granularity, and the extracted feature vector is not sufficient to represent the semantics of the short text. As a result, part of the short text's semantic features is lost during similarity calculation, which reduces its accuracy.
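The information loss attributed to max pooling above can be illustrated with a small sketch. The two feature maps are made-up values, not data from the patent; they only show that global max pooling maps different activation patterns to the same output.

```python
def max_pool(feature_map):
    """Global max pooling: keep only the single strongest activation."""
    return max(feature_map)

# Two hypothetical feature maps produced by the same convolution kernel.
sparse_map = [0.9, 0.1, 0.0, 0.1]  # one strong feature
dense_map = [0.9, 0.8, 0.7, 0.1]   # several strong features

pooled_sparse = max_pool(sparse_map)
pooled_dense = max_pool(dense_map)

# Both maps collapse to the same value, so the pooled output can no longer
# distinguish a text with one salient feature from one with several.
same_after_pooling = pooled_sparse == pooled_dense
print(pooled_sparse, pooled_dense, same_after_pooling)
```

After pooling, the second and third strongest activations of `dense_map` are gone, which is the loss the passage describes.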



Examples


Embodiment Construction

[0050] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0051] Ordinary convolutional neural networks have the following problems in short text processing. First, ordinary convolution kernels cannot directly extract features from short text data. Second, max pooling loses some important features as well as the positional information between words. Finally, traditional convolution-pooling feature extraction works at a single granularity, and the extracted feature vectors are not sufficient to represent short text semantics. Therefore, the present invention proposes a similarity calculation method based on multi-dimensional convolution features and constructs a multi-granularity convolutional neural network model, which uses convolution kernels of different granularities to extract features from short text data, and uses two pooling methods, K-Block-Max pooling (K-Block-Max Pooling) ...
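The following is a minimal sketch of the ideas in this paragraph, not the patent's implementation. It assumes that "granularity" means the number of consecutive words covered by a kernel, that each kernel spans the full word vector dimension so that no word vector is split, and that K-Block-Max pooling divides a feature map into K contiguous blocks and keeps the maximum of each. The visible text does not define K-Block-Max pooling, so that reading, the function names, and the toy values are all assumptions.

```python
def convolve(text_matrix, kernel):
    """Slide a full-width kernel over the rows (words) of a text matrix.

    text_matrix: n word vectors of dimension d.
    kernel: h rows of dimension d, covering h whole word vectors per window.
    """
    h, n = len(kernel), len(text_matrix)
    feature_map = []
    for i in range(n - h + 1):
        total = 0.0
        for r in range(h):
            for x, w in zip(text_matrix[i + r], kernel[r]):
                total += x * w
        feature_map.append(total)
    return feature_map


def k_block_max_pool(feature_map, k):
    """Assumed definition: split into k contiguous blocks, keep each block's max."""
    n = len(feature_map)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(feature_map)")
    bounds = [(j * n) // k for j in range(k + 1)]
    return [max(feature_map[bounds[j]:bounds[j + 1]]) for j in range(k)]


def average_pool(feature_map):
    """Average pooling: mean of all activations."""
    return sum(feature_map) / len(feature_map)


# Toy text matrix: 4 words, 2-dimensional word vectors (hypothetical values).
text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]

# One all-ones kernel per granularity: windows of 1, 2 and 3 words.
features = {}
for h in (1, 2, 3):
    kernel = [[1.0, 1.0] for _ in range(h)]
    fmap = convolve(text, kernel)
    features[h] = k_block_max_pool(fmap, 2) + [average_pool(fmap)]

print(features)
```

Unlike global max pooling, the block-wise maxima keep one value per region of the sentence, so a coarse notion of word position survives pooling.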


Abstract

The invention discloses a short text similarity calculation method based on multi-dimensional convolution features, which comprises the following steps. A multi-granularity convolutional neural network model is constructed using training data. Two training samples are fed into the input layer of the model to obtain their word vector matrices. Multi-granularity convolution is carried out in the convolution layer to extract the respective feature vectors. K-Block-Max pooling and average pooling are applied in the pooling layer to extract second-level feature vectors. In the similarity calculation layer, the similarity vector of the two training samples is obtained using a calculation method that fuses direction and distance. The similarity value of the two training samples is calculated in the fully connected layer and compared with the similarity value labeled in the training data in order to update the model. Two short texts whose similarity is to be calculated are then input into the trained multi-granularity convolutional neural network model, and the similarity value is output at the fully connected layer. The invention uses convolution kernels of different granularities to extract features from short text data, which improves accuracy.
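As a rough illustration of a similarity calculation that fuses direction and distance, the sketch below builds a two-element similarity vector from the pooled feature vectors of two texts. The abstract does not name the measures, so using cosine similarity for direction and Euclidean distance for distance is an assumption, as are the function names.

```python
import math


def cosine_similarity(u, v):
    """Direction: cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Distance: straight-line distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def similarity_vector(u, v):
    """Assumed fusion: concatenate the direction and distance measures."""
    return [cosine_similarity(u, v), euclidean_distance(u, v)]


# Hypothetical pooled feature vectors for two short texts.
print(similarity_vector([3.0, 4.0], [3.0, 4.0]))
print(similarity_vector([1.0, 0.0], [0.0, 1.0]))
```

In the described model, a vector like this would be passed to the fully connected layer, which learns how to weight the two measures against the labeled similarity values.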

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a short text similarity calculation method based on multi-dimensional convolution features.

Background technique

[0002] Feature extraction based on term frequency refers to the process of selecting the feature term set that best reflects the characteristics of short texts in the initial term set, calculated according to a given feature evaluation function. Term frequency-inverse document frequency (TF-IDF) and mutual information (MI) are two commonly used word frequency feature extraction methods. The concept of information entropy (IE), derived from statistical thermodynamics, is used to measure the degree of chaos of the system. It itself is not directly used for feature extraction of text, but it is often integrated into other short text word frequency feature extraction methods.

[0003] Topic model is a commonly used short text semantic feature extraction model. First,...
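For readers unfamiliar with the word frequency measures named in the background, here is a minimal sketch of TF-IDF and information entropy. It uses one common unsmoothed variant, tf = count / document length and idf = ln(N / df); the patent text does not say which variant it has in mind, and the toy documents are made up.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


docs = [["cat", "sat"], ["cat", "ran"]]
scores = tf_idf(docs)
print(scores)
print(entropy([0.5, 0.5]))
```

A term that appears in every document, such as "cat" here, gets a score of zero, which is why such measures favor terms that discriminate between texts.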


Application Information

IPC(8): G06F17/27, G06N3/04, G06N3/08
CPC: G06N3/084, G06F40/30, G06F40/289, G06N3/045
Inventor: 高曙, 龚磊, 袁蕾, 程刚
Owner WUHAN UNIV OF TECH