A punctuation mark prediction method based on a self-attention mechanism

A punctuation prediction technology applied in neural network learning methods, natural language data processing, and special-purpose data processing applications. It addresses long-term dependencies, overfitting, excessive computation, and vanishing gradients, and it reduces training difficulty, alleviates vanishing gradients, and enhances feature transfer.

Active Publication Date: 2019-04-02
SUN YAT SEN UNIV
Cites: 6 | Cited by: 13

AI Technical Summary

Problems solved by technology

[0004] To overcome the technical problems of vanishing gradients, long-term dependencies, overfitting, and excessive computation that arise in the prior art when predicting punctuation marks in text, the present invention provides a method for predicting punctuation marks based on a self-attention mechanism.



Examples


Embodiment 1

[0043] As shown in Figure 1, a punctuation prediction method based on a self-attention mechanism includes the following steps:

[0044] S1: Perform speech recognition using automatic speech recognition (ASR) technology to obtain text without punctuation marks;

[0045] S2: Process the unpunctuated text to obtain a text sequence;

[0046] S3: Build a punctuation prediction model, feed the text sequence into the model, and complete punctuation prediction for the text sequence.
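The three steps above can be sketched as a small data-flow skeleton. This is a minimal sketch under stated assumptions, not the patented implementation: `run_pipeline`, `recognize`, and `predict` are hypothetical names, the ASR engine and the trained model are passed in as plain callables, and S2 is assumed to be simple whitespace splitting because the excerpt does not say what that processing involves.

```python
from typing import Callable, List, Tuple

# Labels follow [0047]: 0 = none, 1 = comma, 2 = full stop, 3 = question mark.
Labels = List[int]


def run_pipeline(
    audio: bytes,
    recognize: Callable[[bytes], str],        # hypothetical ASR callable (S1)
    predict: Callable[[List[str]], Labels],   # hypothetical trained model (S3)
) -> Tuple[List[str], Labels]:
    """Hypothetical S1 -> S2 -> S3 flow returning words and one label per word."""
    # S1: speech in, unpunctuated text out.
    raw_text = recognize(audio)
    # S2 (assumption): whitespace splitting stands in for the unspecified processing.
    words = raw_text.split()
    # S3: the model must emit exactly one label Y_t for each word X_t.
    labels = predict(words)
    if len(labels) != len(words):
        raise ValueError("the model must return exactly one label per word")
    return words, labels
```

For Chinese ASR output, which has no spaces between words, S2 would need a word segmenter rather than `str.split`.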

[0047] In a specific implementation, the input text sequence of the model is $X_1, X_2, \ldots, X_T$, representing $T$ words, and the output sequence $Y_1, Y_2, \ldots, Y_T$ assigns one label to each word: $Y_t = 0$ means no punctuation follows the word, $Y_t = 1$ means the word is followed by a comma, $Y_t = 2$ means the word is followed by a full stop, and $Y_t = 3$ means the word is followed by a question mark.
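The labeling scheme in [0047] can be illustrated by converting punctuated text into word/label pairs and back. A minimal sketch, assuming whitespace-separated words with ASCII punctuation attached to (or directly following) the preceding word; `encode`, `decode`, and the two lookup tables are illustrative names, not taken from the patent.

```python
from typing import List, Tuple

# Label scheme from [0047]; 0 means no punctuation after the word.
PUNCT_TO_LABEL = {",": 1, ".": 2, "?": 3}
LABEL_TO_PUNCT = {0: "", 1: ",", 2: ".", 3: "?"}


def encode(punctuated: str) -> Tuple[List[str], List[int]]:
    """Split punctuated text into words X_1..X_T and labels Y_1..Y_T."""
    words: List[str] = []
    labels: List[int] = []
    for token in punctuated.split():
        # Assumption: a mark is the last character of the word it follows.
        mark = token[-1] if token[-1] in PUNCT_TO_LABEL else ""
        word = token[:-1] if mark else token
        if word:
            words.append(word)
            labels.append(PUNCT_TO_LABEL.get(mark, 0))
        elif mark and labels:
            # A detached mark such as "yes , ok" labels the previous word.
            labels[-1] = PUNCT_TO_LABEL[mark]
    return words, labels


def decode(words: List[str], labels: List[int]) -> str:
    """Re-insert punctuation after each word according to its label."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    return " ".join(w + LABEL_TO_PUNCT[y] for w, y in zip(words, labels))
```

For example, `encode("hello, how are you? fine.")` yields the words `hello how are you fine` with labels `[1, 0, 0, 3, 2]`, and `decode` restores the original string. A model only has to learn the mapping from words to labels; the punctuated text is rebuilt deterministically.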

[0048] More specifically, in step S3, the punctuation p...



Abstract

The invention provides a punctuation mark prediction method based on a self-attention mechanism, comprising the following steps: performing speech recognition with automatic speech recognition technology to obtain text without punctuation marks; processing the unpunctuated text to obtain a text sequence; and constructing a punctuation mark prediction model and feeding the text sequence into the model to complete punctuation mark prediction for the text sequence. By constructing this punctuation prediction model, the method predicts punctuation for speech recognition output, effectively alleviates the vanishing-gradient problem, enhances feature transfer, and effectively captures long-term dependencies in the text. In addition, compared with previous models, it requires no additional parameters, reduces the amount of data transmitted, and lowers the difficulty of training the parameters.
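The abstract does not spell out the network, but the properties it claims (direct long-range dependencies, no extra parameters, easier gradient flow) can be illustrated with one scaled dot-product self-attention layer wrapped in an identity shortcut. This NumPy sketch is an assumption for illustration only: `self_attention_block` and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical names, the residual connection is one common choice, and the patent's actual layer structure and connectivity are not given in this excerpt.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_block(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray,
                         w_v: np.ndarray) -> tuple:
    """x: (T, d) word representations; w_q, w_k, w_v: (d, d) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    # (T, T) scores: each word attends to every other word directly, so a
    # dependency between distant words is one step away regardless of distance.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    # Identity shortcut (assumption): adds no parameters and gives gradients
    # a direct path through the block.
    return x + weights @ v, weights
```

Because the attention weights form a $T \times T$ matrix, the cost grows quadratically with sequence length; that is the usual price of connecting every pair of words in a single step.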

Description

Technical field

[0001] The present invention relates to the field of natural language processing, and more specifically to a method for predicting punctuation marks based on a self-attention mechanism.

Background art

[0002] With the development of deep learning, many researchers have in recent years proposed using neural networks to predict punctuation marks. A typical neural network model works in two steps. First, recurrent neural networks, convolutional neural networks, or attention mechanisms are used to generate text representations that carry context information. Second, based on these context-rich representations, when predicting the punctuation for each word, a normalized exponential function (softmax) or a conditional random field is used to score four classes (comma, period, question mark, and no punctuation), and the highest-scoring class is selected as the label for the word, resulting in a reasonable sequ...
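The second step described in [0002], softmax scoring over the four classes followed by picking the highest-scoring class for each word, can be sketched as follows. `predict_labels`, the linear scoring layer `w`, `b`, and the feature shapes are assumptions for illustration; the conditional random field alternative is not shown.

```python
import numpy as np

# Class order matches the labels in [0047]: 0 none, 1 comma, 2 period, 3 question.
CLASSES = ("none", "comma", "period", "question")


def predict_labels(features: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """features: (T, d) context-aware word vectors; w: (d, 4); b: (4,)."""
    scores = features @ w + b                             # (T, 4) score per class
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(scores)
    probs = exp / exp.sum(axis=1, keepdims=True)          # softmax over 4 classes
    return probs.argmax(axis=1)                           # highest-scoring class
```

Since softmax preserves the ordering within each row, the argmax of the probabilities equals the argmax of the raw scores; the normalization matters mainly for training with a cross-entropy loss. Unlike a conditional random field, this decodes each word independently.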


Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/24, G06N3/04, G06N3/08, G10L15/26
CPC: G06N3/08, G10L15/26, G06F40/166, G06N3/045, Y02D10/00
Inventors: 邓豪 (Deng Hao), 权小军 (Quan Xiaojun)
Owner: SUN YAT SEN UNIV