
Encoder model training method and storage medium, similarity prediction method and system

A training method and encoder technology, applied in the field of text similarity, solving problems such as strong subjectivity, small detection coverage, and dependence on manual sampling inspection, and achieving the effects of precise calculation, improved accuracy, and increased inference bandwidth.

Active Publication Date: 2022-07-12
CHINA UNICOM (GUANGDONG) IND INTERNET CO LTD

AI Technical Summary

Problems solved by technology

[0003] The present invention aims to overcome at least one of the above-mentioned defects of the prior art, and provides an encoder model training method, a storage medium, a similarity prediction method, and a system, which solve the prior-art problems of relying on manual sampling inspection, small detection coverage, and strong subjectivity when determining text similarity.

Method used



Examples


Embodiment 1

[0019] This embodiment provides a training method for a deep neural network encoder model, which is used to train a Siamese (twin) neural network encoder model. In a broad sense, a Siamese neural network can be composed of two sub-networks or of a single network; the key is that the twin branches share the same neural network parameters.
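The weight-sharing property described above can be sketched as follows. This is a minimal illustration only: the encoder here is a single random linear layer with a tanh activation, whereas the patent's encoder is a deep neural network whose architecture is not fixed in this excerpt.

```python
import numpy as np

# Minimal sketch of a Siamese ("twin") encoder: the SAME parameter
# matrix W encodes both inputs, so the two branches share weights.
# The dimensions and the tanh activation are illustrative assumptions.

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))  # shared encoder parameters

def encode(x):
    # Both text-sequence vectors pass through this one function,
    # hence through the same parameters W.
    return np.tanh(x @ W)

x1 = rng.standard_normal(8)  # stand-in for text sequence vector 1
x2 = rng.standard_normal(8)  # stand-in for text sequence vector 2
h1, h2 = encode(x1), encode(x2)
# Updating W changes both branches at once, which is the defining
# property of a Siamese network.
```

Because both hidden states come from identical parameters, a single gradient update moves both branches together.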

[0020] As shown in Figures 1 and 2, the method includes the following steps:

[0021] S110, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;

[0022] In this step, a text sequence refers to text data that has been preprocessed so that it meets the input format of the embedding layer. In a specific embodiment, the preprocessing includes:

[0023] Perform data cleaning on the original text data; read the preset special symbols, stop words, and user dictionary word lists; remove the special symbols from the text data; and perform word segmentation on the text sequence ...
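The preprocessing described above can be sketched as follows. The symbol set, the stop-word list, and the whitespace tokenizer are placeholder assumptions; a real pipeline would load the preset lists mentioned in the patent and use a proper word-segmentation tool.

```python
import re

# Illustrative preprocessing: cleaning, special-symbol removal,
# segmentation, and stop-word filtering. All lists are assumptions.
SPECIAL_SYMBOLS = r"[#@*&%]"
STOP_WORDS = {"the", "a", "of"}

def preprocess(text):
    text = re.sub(SPECIAL_SYMBOLS, "", text)            # data cleaning
    tokens = text.lower().split()                       # segmentation (stand-in)
    return [t for t in tokens if t not in STOP_WORDS]   # stop-word removal

preprocess("The encoder# model of *similarity")
# → ['encoder', 'model', 'similarity']
```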

Embodiment 2

[0056] Based on the same concept as Embodiment 1, this embodiment provides a text sequence similarity prediction method, which mainly uses the neural network encoder model trained by the training method provided in Embodiment 1 to predict the similarity of two different text sequences.

[0057] As shown in Figures 3 and 4, the method includes:

[0058] S210, inputting two different text sequences into the embedding layer for vectorization to obtain two text sequence vectors;

[0059] Before this step is executed, the two pieces of text data whose similarity is to be predicted can be determined and preprocessed (e.g., serialized) so that they become two text sequences compatible with the embedding layer, the neural network encoder model, and the pooling layer.

[0060] S220, input the two text sequence vectors into the trained neural network encoder model, so that the neural network encoder model...
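A minimal sketch of the downstream steps of this prediction method, assuming mean pooling over the per-token hidden states and cosine similarity between the pooled vectors; the patent excerpt names a pooling layer and a similarity computation but does not fix either choice here.

```python
import numpy as np

def mean_pool(hidden_states):
    # Pool the per-token hidden states into one sequence vector
    # (mean pooling is an assumption).
    return hidden_states.mean(axis=0)

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

h1 = np.array([[1.0, 0.0], [1.0, 2.0]])  # token hidden states, sequence 1
h2 = np.array([[1.0, 1.0], [1.0, 1.0]])  # token hidden states, sequence 2
sim = cosine_similarity(mean_pool(h1), mean_pool(h2))
```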

Embodiment 3

[0071] Based on the same concept as Embodiments 1 and 2, this embodiment provides a text sequence similarity prediction system, which mainly uses the neural network encoder model trained by the training method provided in Embodiment 1 to predict the similarity of two different text sequences.

[0072] As shown in Figure 5, the system includes: a word input module 310, a word embedding module 320, a neural network encoder model trained by the training method provided in Embodiment 1, a hidden state pooling module 330, and a vector similarity calculation module 340.

[0073] The word input module 310 is configured to receive two pieces of text data input from the outside, serialize them to obtain two different text sequences, and output them to the word embedding module 320.

[0074] The word embedding module 320 is used to vectorize the two text sequences; specifically, it maps the text sequences into a vector space so as to obtain text sequence vectors of th...
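The modular structure of the system can be wired together as in the sketch below. The module names mirror the description above, but every implementation detail (the vocabulary, the lookup-table embedding, mean pooling, cosine similarity) is a placeholder assumption, not the patent's specification.

```python
import numpy as np

# Illustrative wiring of the system's modules 310-340.
rng = np.random.default_rng(1)
VOCAB = {"text": 0, "similarity": 1, "encoder": 2}   # toy vocabulary
EMB = rng.standard_normal((len(VOCAB), 4))           # toy embedding table

def word_input_module(text):            # 310: serialize raw text
    return [VOCAB[w] for w in text.split() if w in VOCAB]

def word_embedding_module(seq):         # 320: map tokens to vectors
    return EMB[seq]

def pooling_module(hidden):             # 330: pool hidden states
    return hidden.mean(axis=0)

def similarity_module(u, v):            # 340: cosine similarity
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

s1 = word_input_module("text similarity")
s2 = word_input_module("text encoder")
sim = similarity_module(pooling_module(word_embedding_module(s1)),
                        pooling_module(word_embedding_module(s2)))
```

The trained encoder model would sit between modules 320 and 330; it is omitted here to keep the sketch self-contained.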



Abstract

The present invention provides a training method and a storage medium for an encoder model, and a similarity prediction method and system, including: inputting two text sequences into an embedding layer to obtain two text sequence vectors; inputting the two text sequence vectors into a twin (Siamese) neural network encoder model so that it determines hidden states based on the same neural network parameters; constructing a self-supervised loss function according to the neural network parameters; inputting the hidden states into the pooling layer so that they are pooled, determining the similarity between the two text sequences according to the pooled text sequence vectors, and constructing a supervised loss function; determining the loss function according to the self-supervised and supervised loss functions so as to update the neural network parameters; and continuing to input new text sequences until the value of the loss function reaches its minimum. This method greatly improves the inference bandwidth of the model when calculating the similarity of text sequences, and the trained neural network encoder model can accurately calculate the similarity of two text sequences.
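The combined training signal described in the abstract can be sketched as a supervised term plus a self-supervised term on the shared parameters. The squared-error form, the L2 parameter term, and the weighting `alpha` are all assumptions; the abstract does not fix the exact formulas.

```python
import numpy as np

# Hedged sketch of the combined loss: a supervised term (error between
# predicted and labelled similarity) plus a self-supervised term built
# from the shared network parameters. Both forms are assumptions.
def total_loss(pred_sim, label, params, alpha=0.01):
    supervised = (pred_sim - label) ** 2
    self_supervised = alpha * float(np.sum(params ** 2))
    return supervised + self_supervised
```

Training would repeatedly feed new text-sequence pairs through the shared encoder and update the parameters to drive this combined loss toward its minimum.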

Description

Technical Field

[0001] The present invention relates to the field of text similarity, and more particularly, to an encoder model training method and storage medium, and a similarity prediction method and system.

Background Art

[0002] Text similarity refers to measuring how similar two texts are. Its application scenarios include text classification, clustering, text topic detection, topic tracking, machine translation, etc. More specifically, supervising call lines in voice communication scenarios also requires determining the similarity between texts, but the dialogue content obtained in such scenarios is noisy, contains accents, and lacks information integrity. Judging whether dialogue content is similar therefore relies on manual sampling inspection, which consumes a great deal of manpower and time. The problems with manual sampling inspection are that its coverage is small and that manual detection is highly subjectiv...

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F16/33; G06F40/289; G06F40/30; G06N3/04; G06N3/08; G06K9/62
CPC: G06F16/3343; G06F16/3344; G06F40/289; G06F40/30; G06N3/08; G06N3/047; G06N3/045; G06F18/22
Inventor: 肖清, 赵文博, 李剑锋, 许程冲, 周丽萍
Owner CHINA UNICOM (GUANGDONG) IND INTERNET CO LTD