Method and device for adding text label

A text annotation and sequence labeling technology, applied in the field of computer science. It addresses the problems that existing methods cannot be applied in some scenarios and cannot fully solve the problem of adding punctuation marks, and achieves the effect of solving the problem of adding

Inactive Publication Date: 2017-10-13
BEIJING SINOVOICE TECH CO LTD

AI Technical Summary

Problems solved by technology

However, this method cannot be applied to application scenarios such as machin

Examples

Embodiment 1

[0044] Refer to Figure 1, which shows a flowchart of Embodiment 1 of a method for adding text annotations in this application. The method may specifically include the following steps:

[0045] Step 101. Obtain unlabeled text.

[0046] In a specific implementation, the unlabeled text may be text without punctuation, such as text produced by speech recognition or machine translation. Speech recognition is a technology that lets machines convert voice signals into the corresponding text or commands through a process of recognition and understanding. Text obtained by speech recognition has no punctuation marks. Adding punctuation marks appropriately is necessary to improve the user's reading experience and help users understand the text content quickly.

[0047] Step 102. Process the unlabeled text with a sequence labeling model trained in advance using a neural network model, to obtain the sequence labeling of the unlabeled text.

[0048] In a spec...
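
The flow of steps 101 to 103 (acquire unlabeled text, obtain its sequence labeling, add the annotations accordingly) can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the application: the label scheme (one label per whitespace-separated token, naming the punctuation mark that follows the token, with "O" meaning none) is assumed, and `toy_model` is a hypothetical stand-in for the trained sequence labeling model.

```python
from typing import Callable, List

NO_MARK = "O"  # assumed label meaning "no punctuation follows this token"


def add_text_annotations(
    text: str, predict_labels: Callable[[List[str]], List[str]]
) -> str:
    """Add punctuation to unlabeled text according to per-token labels."""
    tokens = text.split()            # step 101: the unlabeled text, tokenized
    labels = predict_labels(tokens)  # step 102: sequence labeling
    if len(labels) != len(tokens):
        raise ValueError("expected exactly one label per token")
    # Step 103: append the labeled punctuation mark, if any, to each token.
    annotated = [
        token if label == NO_MARK else token + label
        for token, label in zip(tokens, labels)
    ]
    return " ".join(annotated)


def toy_model(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for a trained model: a fixed lookup for the demo."""
    fixed = {"hello": ",", "you": "?"}
    return [fixed.get(token, NO_MARK) for token in tokens]


print(add_text_annotations("hello how are you", toy_model))
# prints: hello, how are you?
```

In practice `predict_labels` would wrap the trained neural network model; the lookup table here exists only to make the example runnable.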

Embodiment 2

[0058] This embodiment includes steps 101, 102, and 103 of Embodiment 1; their specific implementation is the same as in Embodiment 1 and is not repeated here. For the training process of the sequence labeling model used in step 102 in this embodiment, refer to Figure 2. The process may specifically include the following sub-steps:

[0059] Sub-step 201, acquire text samples with correct annotations.

[0060] In a specific implementation, texts with correct annotations can be obtained from the Internet or books.

[0061] Sub-step 202, perform serialization processing on the text samples with correct annotations to obtain unlabeled text samples and sequence annotation samples.

[0062] In this step, the annotations are removed from the correctly annotated text to obtain unlabeled text, and the unlabeled text is then converted into sequence annotations according to the position and type of the text annotations in the correctly annotated text. For details...
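
The serialization in sub-step 202 can be illustrated with a short sketch. It is a minimal example under assumptions that the visible text does not confirm: tokens are whitespace-separated words, only the four marks in `PUNCTUATION` are treated as annotations, and each token's label names the mark that followed it in the annotated text ("O" for none). The function name `serialize` is hypothetical.

```python
from typing import List, Tuple

NO_MARK = "O"                      # assumed label for "no annotation follows"
PUNCTUATION = {",", ".", "?", "!"}  # assumed set of annotation types


def serialize(annotated: str) -> Tuple[List[str], List[str]]:
    """Split annotated text into unlabeled tokens and their sequence labels."""
    tokens: List[str] = []
    labels: List[str] = []
    for word in annotated.split():
        mark = NO_MARK
        if word[-1] in PUNCTUATION:
            # Record the annotation type, then strip it from the token.
            word, mark = word[:-1], word[-1]
        if word:
            tokens.append(word)
            labels.append(mark)
        elif tokens and mark != NO_MARK:
            # A standalone mark is attached to the preceding token.
            labels[-1] = mark
    return tokens, labels


tokens, labels = serialize("hello, how are you?")
print(tokens)  # ['hello', 'how', 'are', 'you']
print(labels)  # [',', 'O', 'O', '?']
```

The two lists have the same length, so each pair of token list and label list can serve as one training sample for a sequence labeling model.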

Embodiment 3

[0097] Refer to Figure 4, which shows a structural block diagram of Embodiment 3, an apparatus for adding text annotations in this application. The apparatus may specifically include the following modules:

[0098] An unlabeled text acquiring module 401, configured to acquire unlabeled text.

[0099] A sequence annotation generation module 402, configured to process the unlabeled text with a sequence annotation model trained in advance using a neural network model, to obtain the sequence annotation of the unlabeled text.

[0100] In a preferred embodiment of the present application, the neural network model may be an LSTM neural network model or a GRU neural network model.

[0101] In a preferred embodiment of the present application, when the neural network model is an LSTM neural network model, the LSTM neural network model may be a multi-layer LSTM neural network model, or a bidirectional LSTM neural network model.

[0102] A text annotation adding module 403, conf...

Abstract

The application provides a method and device for adding text annotations. The method comprises: acquiring unlabeled text; processing the unlabeled text with a sequence labeling model trained in advance using a neural network model, to obtain a sequence labeling of the unlabeled text; and adding text annotations to the unlabeled text according to the sequence labeling. By using a sequence labeling model trained with a neural network model, the invention converts the text annotation problem into a sequence labeling problem, and the text annotations are then added according to the sequence labeling. The method and device therefore do not depend on any auxiliary information when adding text annotations to unlabeled text, and can solve the problem of adding text annotations comprehensively.

Description

Technical field

[0001] The invention relates to the field of computer science, in particular to a method for adding text annotations and a device for adding text annotations.

Background

[0002] Commercial demand for speech recognition technology in the field of artificial intelligence is growing steadily, but the text produced by speech recognition does not include punctuation marks. To improve the user's reading experience, punctuation marks therefore have to be added when the text is post-processed. Text without punctuation also arises in other scenarios, such as machine translation. The technology of adding punctuation marks therefore has research and application value.

[0003] Existing technologies for adding punctuation often rely on some information about the audio itself, such as pauses and intervals in dialogue in the au...

Application Information

IPC(8): G06F17/27
CPC: G06F40/205, G06F40/279
Inventor: 李健, 殷子墨, 张连毅, 武卫东
Owner: BEIJING SINOVOICE TECH CO LTD