Encoder model training method and storage medium, similarity prediction method and system
A training method and encoder technology, applied in the field of text similarity, solving problems such as strong subjectivity, small detection coverage, and dependence on manual spot checks, and achieving the effects of precise calculation, improved accuracy, and increased inference bandwidth.
Examples
Embodiment 1
[0019] This embodiment provides a training method for a deep neural network encoder model, used to train a twin (Siamese) neural network encoder model. In a broad sense, a twin neural network can be composed of two sub-networks or of a single network; the key is that the twin branches share the same neural network parameters.
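The parameter sharing described above can be illustrated with a minimal sketch: when both inputs are encoded by the same object, the two "branches" of the twin network necessarily share all weights. The `TinyEncoder` class and its fixed weights are hypothetical stand-ins, not the patent's actual model.

```python
class TinyEncoder:
    """Toy encoder: a fixed linear projection standing in for the
    deep neural network encoder of this embodiment (illustrative only)."""

    def __init__(self, weights):
        self.weights = weights  # the shared parameters

    def encode(self, vector):
        # Project the input vector with each row of the weight matrix.
        return [sum(w * x for w, x in zip(row, vector)) for row in self.weights]

# Twin usage: one shared instance encodes BOTH text sequence vectors,
# so the two branches cannot diverge in their parameters.
shared = TinyEncoder([[1.0, 0.0], [0.5, 0.5]])
h_a = shared.encode([2.0, 4.0])
h_b = shared.encode([2.0, 4.0])
assert h_a == h_b  # identical parameters -> identical outputs for equal inputs
```

In frameworks such as PyTorch the same effect is obtained by passing both inputs through one module instance, rather than instantiating two copies of the network.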
[0020] As shown in Figures 1 and 2, the method includes the following steps:
[0021] S110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
[0022] In this step, the text sequence refers to text data that has been preprocessed so that it conforms to the input format expected by the embedding layer. In a specific embodiment, the preprocessing includes:
[0023] Performing data cleaning on the original text data; reading preset special symbols, stop words, and a user dictionary word list; removing the special symbols from the text data; and performing word segmentation on the text sequence ...
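The cleaning and segmentation steps above can be sketched as follows. The stop-word set, the special-symbol pattern, and whitespace splitting are all assumptions for illustration; the patent does not specify a concrete stop-word list or segmenter.

```python
import re

STOP_WORDS = {"the", "a", "of"}        # assumed preset stop-word list
SPECIAL = re.compile(r"[^\w\s]")       # assumed special-symbol pattern

def preprocess(text):
    """Clean the raw text, remove special symbols and stop words,
    and segment it into a token sequence for the embedding layer."""
    text = SPECIAL.sub(" ", text.lower())  # data cleaning + symbol removal
    tokens = text.split()                  # naive word segmentation
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The price of GPUs!"))
```

For languages without whitespace word boundaries (e.g. Chinese), the `split()` call would be replaced by a dictionary-based segmenter driven by the user dictionary mentioned above.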
Embodiment 2
[0056] Based on the same concept as Embodiment 1, this embodiment provides a text sequence similarity prediction method, which uses the neural network encoder model trained by the training method of Embodiment 1 to predict the similarity of two different text sequences.
[0057] As shown in Figures 3 and 4, the method includes:
[0058] S210, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;
[0059] Before this step is executed, the two kinds of text data whose similarity needs to be predicted can be determined and preprocessed (e.g., serialized) so that they become two text sequences compatible with the embedding layer, the neural network encoder model, and the pooling layer.
[0060] S220, inputting the two text sequence vectors into the trained neural network encoder model, so that the neural network encoder model...
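After the shared encoder produces per-token hidden states for each sequence, a pooling step and a vector similarity measure complete the prediction. The sketch below assumes mean pooling and cosine similarity, which are common choices; the patent's pooling layer and similarity calculation may differ, and the hidden-state values are made up for illustration.

```python
import math

def mean_pool(hidden_states):
    """Average the encoder's per-token hidden states into one sentence vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two pooled sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Assumed per-token hidden states emitted by the shared encoder.
states_a = [[1.0, 0.0], [0.0, 1.0]]
states_b = [[1.0, 0.0], [1.0, 0.0]]

sim = cosine_similarity(mean_pool(states_a), mean_pool(states_b))
```

The result `sim` lies in [-1, 1]; thresholding it yields a binary same/different decision if one is needed.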
Embodiment 3
[0071] Based on the same concept as Embodiments 1 and 2, this embodiment provides a text sequence similarity prediction system, which uses the neural network encoder model trained by the training method provided in Embodiment 1 to predict the similarity of two different text sequences.
[0072] As shown in Figure 5, the system includes: a word input module 310, a word embedding module 320, a neural network encoder model trained by the training method provided in Embodiment 1, a hidden state pooling module 330, and a vector similarity calculation module 340.
[0073] The word input module 310 is configured to receive two kinds of text data input from the outside, serialize them into two different text sequences, and output them to the word embedding module 320.
[0074] The word embedding module 320 is used to vectorize the two text sequences, specifically, to map the text sequences into a vector space so as to obtain text sequence vectors of th...
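The first two modules of the system can be sketched as a small pipeline. The class names mirror the modules 310 and 320 above, but the lookup-table embedding and the example vocabulary are purely illustrative assumptions, not the patent's actual implementation.

```python
class WordInputModule:
    """Serializes raw text into a token sequence (cf. module 310)."""

    def run(self, text):
        return text.lower().split()

class WordEmbeddingModule:
    """Maps each token into a vector space (cf. module 320);
    the tiny lookup table here is an assumption for illustration."""

    def __init__(self, table):
        self.table = table

    def run(self, tokens):
        # Unknown tokens map to the zero vector in this sketch.
        return [self.table.get(t, [0.0, 0.0]) for t in tokens]

table = {"hello": [1.0, 0.0], "world": [0.0, 1.0]}
pipeline = [WordInputModule(), WordEmbeddingModule(table)]

data = "Hello world"
for module in pipeline:
    data = module.run(data)
# data now holds the text sequence vectors handed to the encoder model
```

The remaining modules (encoder, hidden state pooling 330, vector similarity 340) would extend this pipeline in the same fashion, each consuming the previous module's output.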