
Sentence boundary identification method in spoken language dialogue

A sentence boundary recognition technology for spoken language, applied in special data processing applications, instruments, electrical digital data processing, etc.

Inactive Publication Date: 2005-01-26
INST OF AUTOMATION CHINESE ACAD OF SCI
0 Cites, 7 Cited by

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a sentence boundary recognition method for spoken dialogue that converts the continuous text produced by speech recognition into sentences that the subsequent analysis module can process.



Examples


Embodiment Construction

[0016] Various details involved in the technical solution of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0017] Preprocessing of spoken corpus

[0018] The acquired spoken corpus cannot be used directly for training; it must first be preprocessed. Sentence boundary segmentation means finding the end points of sentences in continuous text, that is, predicting where sentence-final punctuation occurs, so for segmentation purposes all sentence-final punctuation marks are equivalent. The main task of preprocessing is therefore to replace every sentence-final punctuation mark in the corpus with a unified symbol, written "SB" in this description for convenience. All other punctuation must be deleted, because text produced by speech recognition cannot contain such punctuation marks. For Chinese, t...
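The preprocessing step described in [0018] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess` and the two punctuation sets are hypothetical, since the excerpt does not list which marks count as sentence-final, and treating "." as sentence-final would mishandle abbreviations and decimals in real data.

```python
# Assumed punctuation inventories; the source text does not enumerate them.
SENTENCE_END = "。！？.!?"
OTHER_PUNCT = "，、；：,;:\"“”（）()"


def preprocess(text, marker="SB"):
    """Replace sentence-final punctuation with a unified marker and
    delete all other punctuation, as described in paragraph [0018]."""
    out = []
    for ch in text:
        if ch in SENTENCE_END:
            out.append(f" {marker} ")   # unified boundary symbol
        elif ch in OTHER_PUNCT:
            continue                    # non-final punctuation is deleted
        else:
            out.append(ch)
    # Normalize whitespace and collapse runs of markers (e.g. "?!").
    tokens = []
    for tok in "".join(out).split():
        if tok == marker and tokens and tokens[-1] == marker:
            continue
        tokens.append(tok)
    return " ".join(tokens)


print(preprocess("Hello, how are you? I am fine."))
```

Collapsing repeated markers is a design choice of this sketch, not something the source specifies; it keeps a sequence such as "?!" from producing two consecutive boundaries.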


Abstract

A sentence segmentation method based on a bi-directional N-gram model and a Maximum Entropy model. The method comprises a training process and a segmentation process. The training process comprises: obtaining a spoken language corpus; preprocessing the corpus, for example by symbol replacement; counting the n-gram co-occurrence frequencies for the N-gram model; estimating the n-gram forward (positive) and backward (negative) dependency probabilities; obtaining a database of the forward and backward dependency probabilities; defining the feature functions of the Maximum Entropy model; and iteratively computing the feature function parameters to obtain a feature function parameter database. The method is purely statistical. It requires only a background spoken language corpus, which needs no further segmentation or labeling.
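The counting and probability-estimation steps of the training process can be illustrated with a small Python sketch. It makes several assumptions that are not in the source: it uses bigrams (n = 2), plain maximum-likelihood estimates without smoothing, and a hypothetical function name `train_bidirectional_bigram`. It does not cover the Maximum Entropy feature functions or how the two models are combined, which this excerpt does not describe.

```python
from collections import Counter


def train_bidirectional_bigram(tokens):
    """Estimate forward and backward bigram probabilities by
    maximum likelihood (no smoothing; an assumption of this sketch).

    forward[(a, b)]  approximates P(b follows a | a)
    backward[(a, b)] approximates P(a precedes b | b)
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    left_context = Counter(tokens[:-1])   # how often a token has a successor
    right_context = Counter(tokens[1:])   # how often a token has a predecessor
    forward = {(a, b): c / left_context[a] for (a, b), c in bigrams.items()}
    backward = {(a, b): c / right_context[b] for (a, b), c in bigrams.items()}
    return forward, backward


# Toy corpus in which "SB" marks sentence boundaries after preprocessing.
corpus = "how are you SB i am fine SB how are things SB".split()
forward, backward = train_bidirectional_bigram(corpus)
print(forward[("are", "you")])
print(backward[("you", "SB")])
```

Because the corpus already carries the "SB" marker, the boundary is treated as an ordinary token, so both directions yield probabilities for a boundary occurring next to a given word.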

Description

Technical field

[0001] The invention relates to speech recognition, and in particular to a sentence boundary recognition method for spoken language.

Background technique

[0002] With the rapid development of computer hardware and the continuous improvement of speech recognition technology, language understanding and generation systems that use speech as the interface (hereinafter referred to as speech-language combined systems), such as man-machine interfaces, man-machine dialogue systems, and simultaneous translation systems, have started to become practical. These systems have broad application prospects. For example, a mature man-machine voice interface would spare people from learning cumbersome computer operations, because you would only need to "speak" a request to the computer and it would execute it according to your requirements. Another example is simultaneous translation technology, which will eliminate communication barriers between users of diffe...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/00, G06F17/30
Inventor: 宗成庆, 刘丁
Owner: INST OF AUTOMATION CHINESE ACAD OF SCI