Semantic similarity feature extraction method based on double selection gates

A semantic similarity feature extraction technology, applied in semantic analysis, natural language data processing, instruments, etc. It addresses the problems that users cannot quickly find keyword-related information, that search result quality is unsatisfactory, and that information is mismatched. It achieves the effects of mitigating vanishing and exploding gradients in the network, avoiding adverse influence on the semantic similarity judgment, and improving matching efficiency.

Pending Publication Date: 2020-02-07
GUILIN UNIV OF ELECTRONIC TECH
7 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

Judging the similarity of sentence pairs with traditional string methods helps filter out some irrelevant information when searching for related questions, but the quality of the search results is still unsatisfactory.
String-based similarity only computes the distance between words at the word level and carries no contextual semantic information. This leads to incorrect matches and ambiguity, so end users cannot quickly find information relevant to their keywords.


Examples

Embodiment 1

[0049] Referring to Figure 1, the present invention provides a method for extracting semantic similarity features based on double selection gates, comprising:

[0050] S100. Perform word segmentation on the sentences P and Q to be processed, and convert the segmented words into vector representations to obtain word vectors.

[0051] Word segmentation in step S100 is the process of dividing a sentence into a reasonable sequence of words that conforms to the contextual meaning. It is one of the key technologies and difficulties in natural language understanding and text information processing, and it is also an important processing step in the semantic similarity model. Word segmentation in Chinese is more complicated because there is no explicit delimiter between words, and word usage is flexible, varied, and semantically rich, which easily leads to ambiguity. According to research, the main di...
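Step S100 can be illustrated with a minimal sketch. The patent text shown here does not state which segmentation algorithm or embedding is used, so the following assumes forward maximum matching against a small hypothetical dictionary (`vocab`) and a randomly initialized embedding table (`table`); the names `segment` and `vectorize` are illustrative only.

```python
import numpy as np

def segment(sentence, vocab, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary word."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or piece in vocab:
                words.append(piece)
                i += size
                break
    return words

def vectorize(words, vocab, table):
    """Look up one embedding row per word; unknown words share row 0."""
    ids = [vocab.get(w, 0) for w in words]
    return table[ids]

# Hypothetical dictionary and embedding table (assumptions, not from the patent).
vocab = {"<unk>": 0, "语义": 1, "相似度": 2, "计算": 3}
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 8))

words = segment("语义相似度计算", vocab)
vectors = vectorize(words, vocab, table)  # one 8-dimensional vector per word
```

A dictionary-based segmenter of this kind cannot resolve the ambiguities the paragraph mentions; it only shows the shape of the data passed to the next step, namely one word vector per segmented word.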

Embodiment 2

[0074] On the basis of Embodiment 1, the first recurrent neural network is composed of one unidirectional LSTM layer and one bidirectional LSTM layer. Each layer includes a plurality of connected LSTM cell modules, which process the current input and the output of the previous time step through the input gate, forget gate, update gate, and output gate of the LSTM cell. The first layer of the first recurrent neural network includes a plurality of connected unidirectional LSTM cell modules and is used to obtain the state vector of each word. The second layer includes a plurality of connected bidirectional LSTM cell modules and is used to obtain the sentence context information vectors.
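The two-layer structure described in [0074] can be sketched in NumPy as follows. This is a minimal sketch using the standard LSTM cell equations, not the patent's exact parameterization; the patent's "update gate" is assumed here to correspond to the candidate cell update `g`, and the helper names (`lstm_step`, `run_lstm`, `run_bilstm`, `init_params`) and all sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step. Shapes: W (4H, D), U (4H, H), b (4H,)."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate update (assumed "update gate")
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f * c_prev + i * g       # new cell state
    h_t = o * np.tanh(c_t)         # new hidden state
    return h_t, c_t

def run_lstm(xs, W, U, b, H):
    """Run a unidirectional LSTM over the rows of xs; return all hidden states."""
    h, c = np.zeros(H), np.zeros(H)
    states = []
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, U, b)
        states.append(h)
    return np.stack(states)

def run_bilstm(xs, fwd, bwd, H):
    """Concatenate a left-to-right pass and a right-to-left pass per time step."""
    forward = run_lstm(xs, *fwd, H)
    backward = run_lstm(xs[::-1], *bwd, H)[::-1]
    return np.concatenate([forward, backward], axis=1)

def init_params(rng, D, H):
    """Small random weights for one LSTM direction (illustrative only)."""
    return (rng.normal(scale=0.1, size=(4 * H, D)),
            rng.normal(scale=0.1, size=(4 * H, H)),
            np.zeros(4 * H))

rng = np.random.default_rng(0)
D, H, T = 8, 4, 3                      # embedding size, hidden size, sentence length
word_vectors = rng.normal(size=(T, D))

# Layer 1: unidirectional LSTM, one state vector per word.
word_states = run_lstm(word_vectors, *init_params(rng, D, H), H)
# Layer 2: bidirectional LSTM over the layer-1 states, giving context vectors.
context = run_bilstm(word_states, init_params(rng, H, H), init_params(rng, H, H), H)
```

Here `word_states` has one `H`-dimensional row per word and `context` has one `2H`-dimensional row per word, because the forward and backward states are concatenated.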

[0075] In this method, the words and context information of the sentence are first modeled by the first recurrent neural network, and the state vector at the time step corresponding to each word of the sentence and the context...

Embodiment 3

[0085] On the basis of Embodiment 1 or 2, the double selection gate includes two selection gate structures with different structures and parameters. Using different selection gates helps filter out redundant information in the sentences and obtain the core information more accurately. The first-layer selection gate is calculated as follows:

[0086] $s = h_n$

[0087] $sGate_i = \sigma(W_s h_i + U_s s + b)$

[0088] $h'_i = h_i \odot sGate_i$

[0089] In the above formulas, the sentence vector is constructed from the sentence context hidden vectors: the hidden state $h_n$ of the sentence is taken as the sentence vector $s$. $sGate_i$ is the gate vector, $W_s$ and $U_s$ are weight matrices, $b$ is the bias vector, $\sigma$ is the sigmoid activation function, and $\odot$ is the element-wise product.
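The first-layer selection gate can be sketched in NumPy as below. This is a sketch under stated assumptions: the gated state is assumed to be the element-wise product of each hidden state with its gate vector, as the description of the element-wise product suggests, and the function name `first_selection_gate`, the sizes, and the random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def first_selection_gate(h, W_s, U_s, b):
    """Apply the first-layer selection gate to hidden states h of shape (n, d).

    Returns the gate vectors and the gated hidden states (assumed to be the
    element-wise product of each h_i with its gate).
    """
    s = h[-1]                                    # sentence vector s = h_n
    gate = sigmoid(h @ W_s.T + s @ U_s.T + b)    # row i is sigma(W_s h_i + U_s s + b)
    return gate, h * gate                        # element-wise product per position

rng = np.random.default_rng(0)
n, d = 5, 6                                      # sentence length, hidden size (illustrative)
h = rng.normal(size=(n, d))
W_s = rng.normal(scale=0.1, size=(d, d))
U_s = rng.normal(scale=0.1, size=(d, d))
b = np.zeros(d)

gate, selected = first_selection_gate(h, W_s, U_s, b)
```

Because every gate entry lies strictly between 0 and 1, each component of a hidden state is scaled down rather than removed outright, which is one way to read the statement that the gate filters out redundant information.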

[0090] The second layer selection gate calculates the context vector at time t, using the sentence vector at the previous time and the hidden layer state h′ of the select...


Abstract

The invention discloses a semantic similarity feature extraction method based on double selection gates, and relates to the field of natural language processing. The technical scheme is as follows: first, word segmentation and vectorized representation are performed on an input sentence pair to obtain word vectors; the resulting word vector sequences are input into a bidirectional long short-term memory network to obtain the contextual information vectors of the two sentences; second, the core feature vectors of the sentence pair are obtained through the double selection gates; the resulting sentence-pair vectors are then input into a multi-angle semantic feature matching network to obtain the feature matching vectors of the sentence pair; finally, the two semantic feature matching vectors are combined through a bidirectional long short-term memory aggregation layer, and the similarity of the sentence pair is predicted. The method effectively alleviates the low matching efficiency caused by information redundancy, and avoids the cost of manually extracting core information.

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a semantic similarity feature extraction method based on double selection gates.

Background

[0002] Today's world is flooded with massive amounts of information, most of which is stored as text. An important topic in artificial intelligence is to organize and "represent" this text so that computers can "understand" it as humans do. Because many words have multiple meanings, and the same concept can be expressed in different ways, there is considerable uncertainty. Traditional text similarity calculation based on string matching is widely used in search engines and question answering systems, but it has become difficult to meet users' needs. When users enter keywords to find matching information, the content returned by the search may correspond to content that d...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/194, G06F40/30, G06F40/289, G06K9/62, G06N3/04
CPC: G06N3/044, G06N3/045, G06F18/22
Inventor: 蔡晓东, 秦菲
Owner: GUILIN UNIV OF ELECTRONIC TECH