
Method and apparatus for generating word segmentation dictionary and method and apparatus for text to speech

Word segmentation dictionary and character string technology, applied in speech synthesis, speech analysis, special data processing applications, etc. It addresses the large workload and time cost of preparing labeled data and improves the effect of word segmentation and speech synthesis.

Active Publication Date: 2015-11-25
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD
Cites: 2; Cited by: 13

AI Technical Summary

Problems solved by technology

However, the machine learning method requires a large amount of labeled data, which is costly in both workload and time.




Embodiment Construction

[0028] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar modules or modules having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

[0029] Figure 1 is a schematic flow chart of a method for generating a word segmentation dictionary according to an embodiment of the present invention. The method comprises:

[0030] S11: Split the texts collected within a preset range to obtain the sentences that make up the texts.
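Step S11 can be illustrated with a minimal Python sketch. The source does not say which delimiters mark a sentence boundary, so the punctuation set used here (Chinese and ASCII sentence-final marks, semicolons and newlines) and the function name `split_sentences` are assumptions for illustration only.

```python
import re

# Assumed sentence delimiters (not specified in the source): Chinese and ASCII
# sentence-final punctuation, semicolons, and newlines.
_SENTENCE_END = re.compile(r"[。！？!?；;\n]+")


def split_sentences(texts):
    """Split each collected text into non-empty, whitespace-trimmed sentences."""
    sentences = []
    for text in texts:
        for piece in _SENTENCE_END.split(text):
            piece = piece.strip()
            if piece:  # drop empty fragments left by trailing punctuation
                sentences.append(piece)
    return sentences


if __name__ == "__main__":
    corpus = ["百度语音合成。分词词典很重要！", "Text to speech; word segmentation?"]
    print(split_sentences(corpus))
    # ['百度语音合成', '分词词典很重要', 'Text to speech', 'word segmentation']
```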

[0031] In the prior art, the word segmentation dictionary is obtained based on existing e...



Abstract

The invention provides a method and apparatus for generating a word segmentation dictionary and a method and apparatus for text to speech. The method for generating the word segmentation dictionary includes: dividing the collected texts within a preset range to obtain the sentences that constitute the texts; dividing the sentences to obtain character strings of different lengths; determining, based on the character strings of different lengths, the credible vocabulary entries among them; and establishing the word segmentation dictionary based on the credible vocabulary entries. The method can produce a word segmentation dictionary suited to the corresponding domain, which in turn improves the effect of word segmentation and of text to speech.
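The steps after sentence splitting can be sketched in Python as follows. This is not the patented method: the abstract does not state how credible entries are determined, so the sketch substitutes a plain frequency threshold. The names `candidate_strings`, `credible_entries` and `build_dictionary` and the parameters `min_len`, `max_len` and `min_count` are hypothetical, and the sentences are assumed to be already split.

```python
from collections import Counter


def candidate_strings(sentences, min_len=2, max_len=4):
    """Count every character substring whose length is in [min_len, max_len]."""
    counts = Counter()
    for sentence in sentences:
        for n in range(min_len, max_len + 1):
            for i in range(len(sentence) - n + 1):
                counts[sentence[i:i + n]] += 1
    return counts


def credible_entries(counts, min_count=2):
    """Hypothetical credibility test (an assumption, not the patent's criterion):
    keep strings that occur at least min_count times."""
    return {s for s, c in counts.items() if c >= min_count}


def build_dictionary(sentences, min_len=2, max_len=4, min_count=2):
    """Return {entry: corpus frequency} for the credible entries, sorted by entry."""
    counts = candidate_strings(sentences, min_len, max_len)
    return {s: counts[s] for s in sorted(credible_entries(counts, min_count))}


if __name__ == "__main__":
    toy = ["语音合成", "语音识别", "语音合成系统"]
    print(build_dictionary(toy))
```

On this toy input, the result keeps 语音 (3 occurrences) and 语音合成 (2) but also fragments such as 音合. A practical credibility test would therefore likely need more than raw frequency, and that refinement is outside this sketch.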

Description

Technical field

[0001] The invention relates to the technical field of speech processing, and in particular to a method and device for generating a word segmentation dictionary and a method and device for speech synthesis.

Background art

[0002] Speech synthesis, also known as text to speech (Text to Speech), converts text information into speech and reads it aloud in real time, which is equivalent to fitting a machine with an artificial mouth. A speech synthesis system must first process the input text, and this processing includes word segmentation.

[0003] At present, there are two main types of word segmentation algorithms: algorithms based on dictionary matching and methods based on machine learning. In dictionary-matching algorithms, the entries of the word segmentation dictionary are usually established from expert knowledge (such as electronic dictionaries, the Xinhua Dictionary, etc.). However, the corpus of this kind of word...
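Paragraph [0003] refers to dictionary-matching segmentation. Forward maximum matching is one well-known algorithm of that family. The sketch below uses it purely as an illustration, because the source does not say which matching algorithm is used. The function name `forward_max_match` and the `max_len` window are assumptions.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Forward maximum matching: scan left to right and, at each position, take the
    longest substring (up to max_len characters) found in the dictionary; fall back
    to a single character when no longer entry matches."""
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest window first, shrinking it one character at a time.
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + n]
            if n == 1 or piece in dictionary:
                words.append(piece)
                i += n
                break
    return words


if __name__ == "__main__":
    lexicon = {"语音", "语音合成", "合成", "系统"}
    print(forward_max_match("语音合成系统", lexicon))  # ['语音合成', '系统']
    print(forward_max_match("百度语音", lexicon))      # ['百', '度', '语音']
```

Greedy matching is fast but can mis-split ambiguous strings. The second example also shows why dictionary coverage matters: a word missing from the dictionary (百度 here) falls apart into single characters.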


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/027; G10L13/06; G06F17/27; G06F17/30
Inventors: 李秀林 (Li Xiulin), 肖朔 (Xiao Shuo), 白洁 (Bai Jie)
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD