Spoken language text processing method for removing stop words and predicting sentence boundaries

A text processing and sentence boundary technology, applied in natural language data processing, neural network learning methods, electrical digital data processing, etc., that addresses problems such as unsatisfactory prediction accuracy, difficulty extracting deep features of text sequences, and difficulty of commercialization. It achieves the effects of enhancing joint prediction capability, improving prediction accuracy, and overcoming errors that are easily introduced.

Active Publication Date: 2020-06-26
网经科技(苏州)有限公司
2 citations; cited by 9

AI Technical Summary

Problems solved by technology

However, existing methods have difficulty extracting deep features of text sequences, their prediction accuracy is not good enough, and they struggle to meet commercial requirements.


Examples


Embodiment Construction

[0048] To give a clearer understanding of the technical features, purposes and effects of the present invention, specific embodiments are now described in detail.

[0049] As shown in Figure 1, the spoken text processing method for removing stop words and predicting sentence boundaries includes the following steps:

[0050] S101) Collect a spoken-language recognition text corpus;

[0051] Unpunctuated text sequences obtained from spoken-language recognition are the prerequisite input and the data form that must be processed in batches; when such data is lacking, punctuated quasi-spoken texts from the same domain, such as online question-and-answer records, can be used as the initial corpus;
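
When punctuated quasi-spoken texts serve as the initial corpus, they have to be brought into the same unpunctuated form as recognition output. A minimal Python sketch, assuming character-level processing of Chinese text: every punctuation mark is dropped, and the position of each boundary mark is remembered so that it can later serve as a sentence-boundary label. The function name `strip_punctuation` and the set `BOUNDARY_PUNCT` are hypothetical, and which marks count as boundaries is an illustrative choice, not something the patent specifies.

```python
import unicodedata

# Assumption: the marks treated as sentence boundaries are an illustrative
# choice, not taken from the patent.
BOUNDARY_PUNCT = set("，。！？；,.!?;")


def strip_punctuation(text: str) -> tuple[list[str], list[bool]]:
    """Drop punctuation and whitespace from a quasi-spoken text.

    Returns the remaining characters and, for each one, whether a boundary
    mark directly followed it in the original text.
    """
    chars: list[str] = []
    boundary_after: list[bool] = []
    for ch in text:
        if unicodedata.category(ch).startswith("P"):
            # All punctuation is removed; boundary marks also flag the
            # preceding character (a leading mark has nothing to flag).
            if ch in BOUNDARY_PUNCT and chars:
                boundary_after[-1] = True
        elif not ch.isspace():
            chars.append(ch)
            boundary_after.append(False)
    return chars, boundary_after
```

For example, `strip_punctuation("你好，请问怎么退款？")` yields the characters of `你好请问怎么退款` with boundary flags on `好` and `款`, which is the same shape of input the recognizer produces, plus labels recovered for free.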

[0052] S102) Mark the stop words in the text corpus;

[0053] Analyze and review the corpus obtained in step S101 sentence by sentence, and mark the meaningless fragments in it; if the corpus contains punctuated text, the punctuation is ignored ...
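
Step S102 marks stop words, and the method also marks the words on both sides of each sentence boundary, so that one sequence labeling model can learn both tasks. The sketch below merges the two annotations into a single tag per character. It assumes a hypothetical three-tag scheme (`S` for a stop-word character to delete, `E` for a kept character followed by a boundary, `O` for any other kept character) that simplifies the two-sided boundary marking to the character before the boundary; the function `build_tags` and its half-open span format are likewise illustrative.

```python
def build_tags(
    chars: list[str],
    boundary_after: list[bool],
    stop_spans: list[tuple[int, int]],
) -> list[str]:
    """Merge stop-word spans and boundary flags into one tag per character.

    Hypothetical tag set: "S" = stop-word character (removed later),
    "E" = kept character followed by a sentence boundary,
    "O" = any other kept character.
    stop_spans are half-open (start, end) index ranges over `chars`.
    """
    if len(chars) != len(boundary_after):
        raise ValueError("chars and boundary_after must have equal length")
    tags = ["O"] * len(chars)
    for start, end in stop_spans:
        for i in range(start, end):
            tags[i] = "S"
    for i, is_boundary in enumerate(boundary_after):
        if not is_boundary:
            continue
        # If the boundary follows a stop word, move it back to the last
        # kept character so deleting the stop word keeps the boundary.
        j = i
        while j >= 0 and tags[j] == "S":
            j -= 1
        if j >= 0:
            tags[j] = "E"
    return tags
```

For `好的嗯，请问` with `嗯` annotated as a stop word, the boundary that followed `嗯` is reassigned to `的`, giving the tags `O E S O E` when a final boundary follows `问`.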


Abstract

The invention discloses a spoken-language text processing method for removing stop words and predicting sentence boundaries. The method comprises the following steps: first, collecting a corpus of spoken-language recognition text; then marking the stop words in the corpus; marking the words on both sides of each sentence boundary in the corpus; training a sequence labeling model with a machine learning method; and finally, processing spoken text with the trained model. Stop words in a text sequence are identified and removed by sequence labeling, and a machine learning scheme combining text vector embedding, forward and backward bidirectional encoding, and a conditional random field efficiently extracts deep semantic features of spoken text and improves tag sequence prediction accuracy. A single model performs both stop-word removal and sentence boundary prediction. After processing, the key points of the speech recognition text stand out and the text is sensibly separated by punctuation, which makes it easier for people to read and lets the natural language understanding module conveniently choose the optimal processing granularity.
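
The final stage of the method can be illustrated with a small sketch: a conditional random field decodes one tag sequence, and that single sequence drives both stop-word removal and punctuation insertion. This covers only Viterbi decoding and tag application; the per-character emission scores would come from the embedding and bidirectional encoder in the described model, but here they are hand-written numbers. The tag set `O`/`S`/`E`, the separator `。`, the function names, and the omission of CRF start and end transition scores are all assumptions for illustration, not details from the patent.

```python
def viterbi(
    emissions: list[list[float]],
    transitions: list[list[float]],
    tags: list[str],
) -> list[str]:
    """Linear-chain CRF decoding: return the highest-scoring tag path.

    emissions[t][k] scores tag k at position t (from the encoder, assumed);
    transitions[i][j] scores moving from tag i to tag j.
    """
    if not emissions:
        return []
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers: list[list[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # Trace back from the best final tag.
    path = [max(range(n_tags), key=lambda k: score[k])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[k] for k in path]


def apply_tags(chars: list[str], tags: list[str], sep: str = "。") -> str:
    """Delete "S" characters and insert `sep` after each "E" character."""
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have equal length")
    out: list[str] = []
    for ch, tag in zip(chars, tags):
        if tag == "S":
            continue
        out.append(ch)
        if tag == "E":
            out.append(sep)
    return "".join(out)
```

With tags `["O", "S", "E"]`, zero transitions, and emissions that favour `S` on `嗯` and `E` on `款`, decoding `嗯我想退款` gives `S O O O E` and `apply_tags` returns `我想退款。`. The transition matrix is what a CRF adds over choosing each character's tag independently: a strongly negative `E`→`E` score, for instance, can overrule an emission that would otherwise produce a one-character sentence.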

Description

Technical field

[0001] The invention relates to a method for removing stop words from, and predicting sentence boundaries in, the unpunctuated text sequences produced by speech recognition, and belongs to the technical field of natural language processing.

Background art

[0002] In recent years, breakthroughs of artificial intelligence technology in speech signal processing have driven rapid progress in speech recognition. There are now many commercial application scenarios, such as speech input methods, voice assistants, smart speakers and translation devices. Whatever the application, converting speech into a text sequence is the first and foremost step. Unfortunately, a typical speech recognition system is only responsible for converting audio clips into the most probable text sequence. There is a one-to-one correspondence between information-bearing syllables and text, and features such as long ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/211, G06F40/253, G06F40/30, G06N7/00, G06N3/04, G06N3/08
CPC: G06N3/084, G06N7/01, G06N3/044, G06N3/045, Y02D10/00
Inventors: 孟亚磊 (Meng Yalei), 刘继明 (Liu Jiming), 金宁 (Jin Ning), 王力成 (Wang Licheng), 陈浮 (Chen Fu)
Owner: 网经科技(苏州)有限公司