Text hash retrieval method based on deep learning

A deep-learning text technique applied to text hash retrieval. It addresses three problems: the semantic similarity of texts cannot be effectively guaranteed, the cost of semantic retrieval is high, and code retrieval efficiency is low. The stated effects are improved query accuracy, improved feature expression ability, and enhanced learning ability.

Active Publication Date: 2020-04-03
广西白鲸信息技术有限公司

AI Technical Summary

Problems solved by technology

[0002] As the scale and dimension of data increase, the cost of semantic retrieval increases sharply. As an important way to achieve efficient semantic retrieval, text hashing has received extensive attention; however, most text hashing algorith


Detailed Description of the Embodiments

[0027] The present invention is described in further detail below.

[0028] A text hash retrieval method based on deep learning, comprising the following steps:

[0029] ① Obtain the text library to be retrieved, consisting of S items of original vocabulary data; clean the original vocabulary data and segment it into words to obtain the preprocessed text library data.

[0030] ② Define the hash model to be trained as follows:

[0031] ②-1 Perform word embedding on the preprocessed text library data to obtain a word embedding matrix;

[0032] ②-2 Construct a bidirectional LSTM model, input the word embedding matrix into the bidirectional LSTM model, and obtain the semantic code corresponding to each item of original vocabulary data;

[0033] ②-3 Use the text convolutional neural network to extract the n-gram features of each semantic code;

[0034] ②-4 Use the attention mechanism to extract the attention features of each semantic code;

[0035] ②-5 ...
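
Steps ②-3 and ②-4 can be illustrated with a minimal pure-Python sketch. It is not the patent's implementation: the source does not state the attention formula or the convolution filters, so the dot-product scoring, the `query` vector, and the `kernel` below are assumptions chosen for illustration. The bidirectional LSTM is omitted, and small hand-written vectors stand in for the per-time-step semantic codes it would produce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(states, query):
    """Assumed dot-product attention: weight each time-step vector by
    softmax(dot(state, query)) and return the weighted sum and the weights."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return pooled, weights


def ngram_max_pool(states, kernel):
    """Slide one n-step filter (hypothetical `kernel`) over the sequence and
    keep the largest response, as a text CNN does with max-over-time pooling."""
    n = len(kernel)
    responses = []
    for start in range(len(states) - n + 1):
        window = states[start:start + n]
        responses.append(sum(
            k * h
            for k_vec, h_vec in zip(kernel, window)
            for k, h in zip(k_vec, h_vec)
        ))
    return max(responses)


# Toy stand-ins for the semantic codes of a three-token text.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(states, query=[0.0, 0.0])
bigram_feature = ngram_max_pool(states, kernel=[[1.0, 0.0], [0.0, 1.0]])
```

With an all-zero query every time step receives the same weight, so `pooled` is simply the mean of the three vectors; a learned query would instead emphasize the time steps most relevant to the hash code.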


Abstract

The invention discloses a text hash retrieval method based on deep learning. The method comprises the following steps: extracting, with a bidirectional LSTM model, the semantic code corresponding to each item of original vocabulary data in a word embedding matrix; connecting a text convolutional neural network in parallel after the bidirectional LSTM model and adding an attention mechanism; converting the output value of the second fully connected layer into a corresponding hash code with a sign function; reconstructing the category label from the hash codes; and finally, searching the text library hash codes for the vector data closest in Hamming distance to the hash code of the retrieval text, which completes the hash retrieval of the retrieval text data. The method has the following advantages: the hash model learns short texts well; the added attention mechanism further improves the expression ability of the features; and because the classification layer reconstructs the category labels from the hash codes, the hash model can use the label information at a finer granularity while learning binary codes, so retrieval precision is high.
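
The final two operations in the abstract, binarizing with a sign function and searching by Hamming distance, can be sketched as follows. This is a hedged illustration rather than the patented procedure: the function names are hypothetical, the codes are stored as 0/1 bits instead of the ±1 values a sign function yields, and treating an output of exactly 0 as a 1 bit is an assumption the source does not specify.

```python
def sign_hash(values):
    """Binarize real-valued layer outputs: 1 where v >= 0, else 0.
    (Assumed convention; equivalent to sign() with -1 stored as 0.)"""
    return tuple(1 if v >= 0 else 0 for v in values)


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("hash codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def nearest(query_code, library_codes):
    """Return (index, distance) of the library code closest to the query.
    Ties go to the earliest index. Linear scan, for illustration only."""
    best = min(
        range(len(library_codes)),
        key=lambda i: hamming(query_code, library_codes[i]),
    )
    return best, hamming(query_code, library_codes[best])


# Hypothetical outputs of the second fully connected layer for a query text.
query_code = sign_hash([0.3, -1.2, 0.0, 2.0])
library_codes = [(0, 0, 0, 0), (1, 0, 1, 0), (1, 1, 1, 1)]
match = nearest(query_code, library_codes)
```

A linear scan is enough to show the idea; at scale, hash codes are usually packed into machine words so the distance reduces to an XOR followed by a bit count.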

Description

Technical field

[0001] The invention relates to a text hash retrieval method, in particular to a text hash retrieval method based on deep learning.

Background

[0002] As the scale and dimension of data increase, the cost of semantic retrieval increases sharply. As an important way to achieve efficient semantic retrieval, text hashing has received extensive attention. However, most text hashing algorithms directly use machine learning mechanisms to map the explicit features or keyword features of the text to binary codes. These features cannot effectively guarantee the semantic similarity between texts, so the resulting codes have low retrieval efficiency.

Contents of the invention

[0003] The technical problem to be solved by the present invention is to provide a deep learning-based text hash retrieval method with high retrieval accuracy and high efficiency.

[0004] The technical solution adopted by the present invention to solve the above-mentioned technica...


Application Information

IPC(8): G06F16/31, G06F16/33, G06F16/35, G06N3/04, G06N3/08
CPC: G06F16/325, G06F16/3331, G06F16/35, G06N3/08, G06N3/044, G06N3/045
Inventor: 寿震宇, 钱江波, 辛宇, 谢锡炯, 陈海明
Owner: 广西白鲸信息技术有限公司