Text sentence segmentation method and system

A text sentence segmentation technology, applied in the field of text segmentation methods and systems. It addresses problems such as reduced sentence segmentation accuracy, limited memory of historical information, and segmentation errors, with the effect of ensuring accuracy and improving the user experience.

Active Publication Date: 2018-05-29
IFLYTEK CO LTD
5 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0003] Existing sentence segmentation methods generally take the word vectors of the text data and segment it directly by sequence labeling. However, word vectors describe only the text; they cannot describe the speech data that corresponds to the text, so segmentation accuracy is low. In addition, the sequence labeling models used in the prior art can remember only a limited amount of historical information and cannot remember the future information of each word, which further reduces segmentation accuracy.
For example, in "how should I do something that will make her change her mind", suppose the current word of the constructed sequence model is "thing". If the model cannot remember the historical information "how" that precedes "thing", the segmentation decision at "thing" is very likely to be wrong. Similarly, in "the word you said is a question word", if the model cannot remember the future information of the word "?", it will also make mistakes when deciding whether to segment at the word "?".
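The sequence labeling formulation mentioned above can be illustrated with a minimal sketch. The binary label scheme (1 means a sentence boundary follows the word, 0 means none) and the function names are assumptions made for illustration; the source does not state which labels the prior-art models use.

```python
def to_labeled_sequence(sentences):
    """Flatten pre-segmented sentences into (word, label) pairs.

    Assumed label scheme: 1 = a sentence boundary follows this word,
    0 = no boundary. Words are split on whitespace for simplicity.
    """
    pairs = []
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            is_last = i == len(words) - 1
            pairs.append((word, 1 if is_last else 0))
    return pairs


def apply_labels(words, labels):
    """Rebuild sentences from a word sequence and its predicted labels."""
    sentences, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label == 1:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing words with no predicted boundary
        sentences.append(" ".join(current))
    return sentences
```

Under this framing, a model that sees only a short history when labeling a word has no access to words such as "how" far to its left, or to any word on its right, which is the limitation the paragraph above describes.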

Embodiment Construction

[0070] In order to enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments of the present invention will be further described in detail below in conjunction with the drawings and implementations.

[0071] The word vectors of text data describe only the text; they cannot describe the speech data corresponding to the text. In fact, speech data also carries strong sentence segmentation cues. For example, the intonation at a sentence break is often falling, with the fundamental frequency at the end of the sentence becoming smaller and smaller. At the same time, in terms of the energy of the speech data, the length of the pause between words also changes significantly. In view of these characteristics, the embodiment of the present invention provides a text segmentation method and sy...
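The acoustic cues described in this paragraph (a falling fundamental frequency and a longer pause at a sentence break) can be sketched as per-word features. This is an illustrative sketch only: the `Word` record, its field names, and the choice of a least-squares F0 slope are assumptions, not the feature definitions of the patent, and the energy cue is omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """Hypothetical aligned word: timestamps in seconds, F0 samples in Hz."""
    text: str
    start: float
    end: float
    f0: List[float]


def pause_after(words: List[Word], i: int) -> float:
    """Silence between word i and the next word (0.0 for the last word)."""
    if i + 1 >= len(words):
        return 0.0
    return max(0.0, words[i + 1].start - words[i].end)


def f0_slope(f0: List[float]) -> float:
    """Least-squares slope of F0 against frame index; negative means falling."""
    n = len(f0)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(f0) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(f0))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def acoustic_features(words: List[Word]) -> List[Tuple[float, float]]:
    """One (pause_after, f0_slope) pair per word."""
    return [(pause_after(words, i), f0_slope(w.f0)) for i, w in enumerate(words)]
```

A long pause combined with a clearly negative slope would, under these assumptions, be evidence for a sentence boundary after the word.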

Abstract

The invention discloses a text sentence segmentation method and system. The method comprises the following steps: pre-collecting a small amount of text data and corresponding speech data, and constructing a long-term memory segmentation model based on text segmentation features and acoustic segmentation features; when text is to be segmented, obtaining the to-be-segmented text and its corresponding speech data; extracting the text segmentation features and the acoustic segmentation features from the to-be-segmented text and its corresponding speech data, respectively; and segmenting the to-be-segmented text according to the extracted text segmentation features, the extracted acoustic segmentation features, and the long-term memory segmentation model. The invention can effectively improve the accuracy of text segmentation.
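The segmentation step in the abstract can be sketched as follows. This excerpt does not specify the "long-term memory segmentation model" further, so `score_fn` below is a hypothetical stand-in for the trained model, and `fuse_features`, `segment`, and the 0.5 threshold are likewise assumptions. The stand-in receives the whole feature sequence at once, so a real model in its place could use both earlier and later words.

```python
from typing import Callable, List, Sequence


def fuse_features(text_feats: Sequence[Sequence[float]],
                  acoustic_feats: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate the text and acoustic feature vectors of each word."""
    if len(text_feats) != len(acoustic_feats):
        raise ValueError("text and acoustic features must cover the same words")
    return [list(t) + list(a) for t, a in zip(text_feats, acoustic_feats)]


def segment(words: Sequence[str],
            fused: Sequence[Sequence[float]],
            score_fn: Callable[[Sequence[Sequence[float]]], List[float]],
            threshold: float = 0.5) -> List[str]:
    """Split words into sentences wherever the boundary score reaches the threshold."""
    scores = score_fn(fused)  # one boundary score per word, from the whole sequence
    sentences, current = [], []
    for word, score in zip(words, scores):
        current.append(word)
        if score >= threshold:
            sentences.append(" ".join(current))
            current = []
    if current:  # words after the last predicted boundary
        sentences.append(" ".join(current))
    return sentences


def toy_score_fn(seq: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for a trained model: boundary if the last feature (pause) > 0.3 s."""
    return [1.0 if vec[-1] > 0.3 else 0.0 for vec in seq]
```

For instance, with a pause of 0.6 s after "you" and short pauses elsewhere, `segment` with `toy_score_fn` splits "how are you fine thanks" into "how are you" and "fine thanks".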

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a text sentence segmentation method and system.

Background technique

[0002] In recent years, as speech recognition technology has become practical and hardware storage has developed rapidly, more and more people have become accustomed to recording speech on storage devices and using transcription tools to convert the recorded speech data into text data for preservation, instead of the traditional approach of listening while taking notes by hand to record important information. However, when speech recognition is performed on speech data, the recognized text is often continuous and unbroken, which makes it hard for the user to read and understand. The traffic pressure is very high, and parking is difficult. On the other hand, in this window, in this administrative service center, we can see that ther...

Application Information

IPC(8): G06F17/27, G10L15/06, G10L15/26
CPC: G10L15/063, G10L15/26, G06F40/211, G06F40/289
Inventors: 占吉清, 高建清, 王智国
Owner: IFLYTEK CO LTD