Semantic text similarity calculation method based on attention

A semantic text similarity calculation method applied in the field of natural language processing. It addresses problems of existing methods, such as missing information and ignoring the semantic information of words, and achieves the effect of enhancing important information.

AI Technical Summary

Problems solved by technology

[0004] (1) Methods based on literal matching. A typical example is the TF-IDF-based method, which models each text as a term-frequency vector and uses cosine similarity to measure the similarity between texts. The advantage of this approach is its fast calculation speed and small workload; the disadvantage is that it ignores the semantic information of words and requires a manually specified stop-word list.
[0005] (2) Probabilistic topic methods based on latent semantic analysis. A typical example is the model based on LDA (Latent Dirichlet Allocation). The main idea is to use the co-occurrence information of words to build a topic model of the text, mine the latent semantic information in the text, and thereby calculate the semantic similarity between texts. The advantage of this type of method is that it takes into account the deep semantic inf...
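The literal-matching approach in item (1) can be illustrated with a minimal pure-Python sketch. It is not taken from the patent: the function names, the toy English documents, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn tokenized documents into sparse TF-IDF vectors (term -> weight)."""
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, already-tokenized documents (hypothetical data).
docs = [
    ["the", "film", "was", "great"],
    ["the", "movie", "was", "great"],
    ["stock", "prices", "fell", "sharply"],
]
vecs = tf_idf_vectors(docs)
sim_paraphrase = cosine_similarity(vecs[0], vecs[1])
sim_unrelated = cosine_similarity(vecs[0], vecs[2])
```

In this toy data, "film" and "movie" share no literal overlap, so they contribute nothing to `sim_paraphrase` even though they mean the same thing. That is the limitation the paragraph describes: the score depends only on matching surface forms.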

Embodiment Construction

[0030] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0031] As shown in Figure 1, the semantic text similarity calculation method of the present invention comprises the following steps:

[0032] (1) Data preprocessing.

[0033] Each pair of texts (text a and text b) in the data set must be preprocessed before the semantic text similarity calculation is performed.

[0034] This embodiment uses the Jieba word segmentation tool to segment the text in the data set into words and removes stop words; each text is brought to a length of 50 words (with filling, i.e. padding, where needed); for each 50-word text, the word embeddings are initialized from pre-trained 300-dimensional word vectors, yielding a 50×300 word vector matrix.
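The preprocessing in this step might be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's actual code: the helper names, the `<pad>` token, truncation of texts longer than 50 words, and zero vectors for padding and out-of-vocabulary words are all hypothetical choices. The tokenizer is passed in as a function so the sketch runs without third-party packages; with Jieba installed, `jieba.lcut` could be supplied instead of `str.split`.

```python
MAX_LEN = 50      # fixed text length stated in the embodiment
EMBED_DIM = 300   # word-vector dimensionality stated in the embodiment
PAD = "<pad>"     # hypothetical padding token


def preprocess(text, tokenize, stop_words, max_len=MAX_LEN):
    """Segment a text, drop stop words, then pad or truncate to max_len tokens."""
    tokens = [t for t in tokenize(text) if t.strip() and t not in stop_words]
    tokens = tokens[:max_len]  # truncation is an assumption, not in the source
    return tokens + [PAD] * (max_len - len(tokens))


def embed(tokens, vectors, dim=EMBED_DIM):
    """Look up each token's vector; unknown tokens and padding map to zeros."""
    zero = [0.0] * dim
    return [list(vectors.get(t, zero)) for t in tokens]


# Toy stand-in for a pre-trained embedding table (hypothetical values).
toy_vectors = {"cats": [0.1] * EMBED_DIM, "sleep": [0.2] * EMBED_DIM}

tokens = preprocess("the cats sleep", str.split, stop_words={"the"})
matrix = embed(tokens, toy_vectors)  # 50 rows of 300 floats each
```

Each text in a pair would go through the same two calls, giving two 50×300 matrices as input to the network.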

[0035] (2) Build a convolutional neural network and train it.

[0036] As shown in Figure 2, the neural network is composed...

Abstract

The invention discloses an attention-based semantic text similarity calculation method. The method comprises the following steps: (1) preprocessing each text pair in a data set to obtain the corresponding text data samples; (2) dividing all the samples into a training set and a validation set, building a neural network, and training it on the samples in the training set to obtain a network model for semantic text similarity calculation; and (3) preprocessing the text pair to be evaluated and inputting it into the network model to obtain the semantic similarity result for that pair. The neural network designed by the invention extracts the semantic information of texts more effectively and captures finer-grained interaction information between the two texts. It uses several attention mechanisms to enhance the important information within that interaction information and thereby improve the accuracy of semantic text similarity calculation.
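The abstract does not say how the attention mechanisms are built, and the network's layers are not visible in this text. As a generic illustration only, and not the patented architecture, the sketch below shows one common form of attention over interaction information between two texts: a dot-product interaction matrix followed by softmax-weighted soft alignment. All function names and the toy vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def interaction_matrix(a, b):
    """Pairwise dot products between the word vectors of text a and text b."""
    return [[dot(u, v) for v in b] for u in a]


def attend(a, b):
    """For each word in a, return a softmax-weighted summary of the words in b."""
    dim = len(b[0])
    aligned = []
    for row in interaction_matrix(a, b):
        weights = softmax(row)  # high weight = strongly interacting word in b
        aligned.append(
            [sum(w * v[k] for w, v in zip(weights, b)) for k in range(dim)]
        )
    return aligned


# Toy 2-dimensional word vectors (hypothetical values).
text_a = [[1.0, 0.0], [0.0, 1.0]]
text_b = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
aligned_a = attend(text_a, text_b)
```

Here the first word of `text_a` interacts most strongly with the first word of `text_b`, so its aligned vector is dominated by that word. Weighting by interaction strength is one way to emphasize important information, in the sense the abstract describes.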

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to an attention-based semantic text similarity calculation method.

Background

[0002] Semantic text similarity calculation is an important research topic in the field of natural language processing. Semantic text similarity aims to calculate the degree of semantic equivalence between two sentences or texts. It can be applied to many tasks, such as machine translation, paraphrase problems, automatic question answering, text classification, and information retrieval.

[0003] At present, there have been many research results in the calculation of semantic text similarity, which can be summarized into the following three categories:

[0004] (1) The semantic text similarity calculation method based on literal matching, the typical semantic similarity calculation method based on TF-IDF, the semantic text similarity calculation method based on TF...

Application Information

IPC(8): G06F40/30, G06K9/62, G06N3/04, G06F40/194, G06F40/151
CPC: G06F40/30, G06F40/194, G06F40/151, G06N3/045, G06F18/214
Inventor: 张华熊, 张豪