Punctuation mark labeling model and training method and device thereof, and storage medium

A punctuation mark labeling model and training technology, applied in neural learning methods, biological neural network models, instruments, etc., that addresses problems such as poor versatility, weak generalization ability, and low recall rate of neural network models.

Pending Publication Date: 2020-01-10
SHANGHAI XIAOI ROBOT TECH CO LTD

AI Technical Summary

Problems solved by technology

However, the learning and training of the neural network model requires manual labeling of a large amount of training data, and the recall rate of the neural netw



Examples


Embodiment Construction

[0030] As mentioned above, although neural networks have greatly improved the accuracy of punctuation mark addition, the method requires a large amount of training data to be prepared in advance for neural network model training. This training data is often a punctuation-free corpus generated by Automatic Speech Recognition (ASR), which requires time-consuming and laborious manual labeling before training. When the trained neural network model is then used to predict punctuation for punctuation-free text obtained from speech recognition, the prediction results tend to have a low recall rate. In addition, the training data of current neural network models often considers only the word before the punctuation mark, resulting in a very uneven label distribution in the training data, and a neural network model trained in this way has poor generalization ability and...



Abstract

The invention discloses a punctuation mark labeling model, a training method and device thereof, and a storage medium. The method comprises: obtaining a first training corpus containing punctuation marks; inputting the first training corpus into a preset neural network model with a time sequence for pre-training to obtain a pre-trained language sub-model; obtaining a second training corpus containing punctuation marks, removing the punctuation marks from the second training corpus, and labeling the word segmentation units before and after each removed punctuation mark with the corresponding label combination to obtain a third training corpus, which comprises a punctuation-free text set and a label sequence set; and inputting the third training corpus into an initial punctuation mark labeling model, which comprises the pre-trained language sub-model, for transfer learning training to obtain a trained punctuation mark labeling model. With this scheme, a large amount of training data does not need to be manually labeled, the recall rate is increased, and the obtained punctuation mark labeling model has good generalization ability and universality.
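The corpus-construction step in the abstract (removing punctuation and labeling the units on both sides of each removed mark) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the input is assumed to be already segmented with punctuation as separate tokens, the label names (`L-` for the unit before a mark, `R-` for the unit after it, `O` otherwise), the `+` used to join two labels on one unit, and the function names `build_example` and `merge` are all hypothetical.

```python
# Hypothetical mapping from punctuation tokens to label names.
PUNCT_NAMES = {",": "COMMA", ".": "PERIOD", "?": "QUESTION", "!": "EXCLAMATION"}


def merge(existing, new):
    """Combine two labels on one unit; 'O' is simply overwritten."""
    return new if existing == "O" else existing + "+" + new


def build_example(tokens):
    """Strip punctuation tokens and label the units around each removed mark.

    Returns (words, labels): a punctuation-free token list and a label
    sequence of the same length.
    """
    words = []
    labels = []
    pending = None  # name of a removed mark still waiting for its rear unit
    for tok in tokens:
        if tok in PUNCT_NAMES:
            name = PUNCT_NAMES[tok]
            if words:
                # Label the unit in front of the removed mark.
                labels[-1] = merge(labels[-1], "L-" + name)
            pending = name
        else:
            words.append(tok)
            # Label the unit behind the most recently removed mark, if any.
            labels.append("R-" + pending if pending else "O")
            pending = None
    return words, labels


if __name__ == "__main__":
    words, labels = build_example("hello , how are you ? fine .".split())
    for word, label in zip(words, labels):
        print(word, label)
```

Because both neighbours of a removed mark carry a label, fewer units end up with the empty label `O` than in a scheme that labels only the preceding word, which is the imbalance the description attributes to existing training data. How the patent actually encodes a unit that sits between two marks is not stated in this excerpt; the `+` join above is only one possible choice.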

Description

Technical field

[0001] Embodiments of the present invention relate to the technical field of computer natural language processing, and in particular to a punctuation mark labeling model and its training method, equipment, and storage medium.

Background technique

[0002] Existing punctuation recovery schemes usually use sequence labeling, which is mainly applied to recovering punctuation marks in text obtained from speech recognition. Generally, only simple punctuation marks can be added, such as commas and periods. The punctuation added in this way, on the one hand, has low accuracy and poor generalization ability; on the other hand, the marked punctuation is poor in richness, resulting in a poor reading experience.

[0003] With the continuous development of deep learning technology, the trained and learned neural network model can be used to predict the punctuation marks of the text obtained from speech recognition and improve the accuracy. However, the learning and trai...


Application Information

IPC(8): G06F40/20, G06F40/284, G06F40/289, G06F40/166, G06N3/08
CPC: G06N3/08
Inventor: 沈大框, 陈培华, 陈成才
Owner: SHANGHAI XIAOI ROBOT TECH CO LTD