Word Segmentation Method and Device

A word segmentation method and device, applied in the Internet field. It addresses the difficulty of exploiting the information in the text to be processed, insufficient coverage, and limited dictionary size, and it aims to improve word segmentation quality, timeliness, and the quality of speech synthesis.

Active Publication Date: 2018-05-04
BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, word segmentation based on dictionary matching is constrained by the limited size of the dictionary, and its word-frequency information depends on the size of the statistical corpus used. The dictionary may therefore have incomplete coverage, or the corpus may be unbalanced, so many inaccuracies remain when processing text. In particular, for words that are rare in large-corpus statistics, such as personal names, place names, and proper names, considerable effort often yields unsatisfactory results.
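The coverage problem can be illustrated with a minimal sketch. It assumes forward maximum matching as the dictionary-matching strategy (the source does not name a specific algorithm), and the toy dictionary, the example sentences, and the function name `forward_max_match` are all hypothetical.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy longest-match segmentation.

    Characters that start no dictionary word fall back to single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit,
        # or until only one character is left.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary: it contains common words but no personal names.
toy_dictionary = {"喜欢", "北京", "天安门"}

# In-dictionary words are recovered as whole tokens.
print(forward_max_match("我喜欢北京天安门", toy_dictionary))

# The personal name 王小明 is not in the dictionary, so it is split
# into three single-character tokens.
print(forward_max_match("王小明喜欢北京", toy_dictionary))
```

Under these assumptions the second call splits the name into separate characters, which is the kind of out-of-dictionary failure the paragraph describes.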
[0005] Word segmentation based on machine learning requires a very large amount of labeled data, and both the quantity and the accuracy of that data strongly affect the model.
Moreover, because these methods do not rely on a segmentation dictionary, they sometimes produce unacceptable segmentation errors, which makes the results unstable and degrades the user experience.
[0006] In summary, existing word segmentation methods do not consider the text as a whole; they process it sentence by sentence. This makes it difficult to exploit the additional information in the text to be processed, so the segmentation results are inaccurate, or the same word (such as a personal name, place name, or proper name) is segmented very differently in different sentences.
In a speech synthesis system, this often makes the output hard for listeners to understand, noticeably reduces listening comfort, and degrades the user experience.


Detailed Description of the Embodiments

[0023] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents coming within the spirit and scope of the appended claims.

[0024] Figure 1 is a schematic diagram of the processing flow of a prior-art speech synthesis system. The input text passes through text processing, prosody prediction, acoustic parameter generation, and waveform generation, after which the synthesized speech is output. Text processing can be subdivided into text preprocessing, word segmentation,...

Abstract

The invention provides a word segmentation method and device. The method may include: sending the text to be synthesized to a search engine and preprocessing that text; obtaining the search results that the search engine returns for the text, and obtaining a dictionary or model corresponding to those results; and segmenting the preprocessed text using that dictionary or model. Because the text to be synthesized is itself used as the query to obtain a better-matched segmentation dictionary or model, segmentation quality improves, and with it the quality of the synthesized speech.
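The three steps in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the search engine is replaced by a stub (`fake_search`), each search result is assumed to carry a `domain` label, dictionaries are assumed to be keyed by that label, and segmentation uses simple forward maximum matching. None of these names or choices come from the source.

```python
import re
from collections import Counter

# Hypothetical domain dictionaries; the source does not say how the
# dictionary "corresponding to the search results" is stored or keyed.
DOMAIN_DICTIONARIES = {
    "general": {"喜欢", "北京"},
    "people": {"喜欢", "北京", "王小明"},
}


def fake_search(text):
    """Stand-in for the search engine: returns records with an assumed 'domain' field."""
    if "王小明" in text:
        return [{"domain": "people"}, {"domain": "people"}, {"domain": "general"}]
    return [{"domain": "general"}]


def preprocess(text):
    """Assumed preprocessing: trim whitespace and split on sentence-ending punctuation."""
    return [s for s in re.split(r"[。！？!?]\s*", text.strip()) if s]


def pick_dictionary(results):
    """Pick the dictionary of the most frequent domain among the search results."""
    counts = Counter(r["domain"] for r in results)
    domain = counts.most_common(1)[0][0]
    return DOMAIN_DICTIONARIES.get(domain, DOMAIN_DICTIONARIES["general"])


def segment(sentence, dictionary, max_len=4):
    """Forward maximum matching; unknown characters become single-character tokens."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


def segment_text(text, search=fake_search):
    results = search(text)                 # query the engine with the whole text
    sentences = preprocess(text)           # preprocess the same text
    dictionary = pick_dictionary(results)  # dictionary matched to the search results
    return [segment(s, dictionary) for s in sentences]


print(segment_text("王小明喜欢北京。王小明喜欢北京！"))
```

Because one dictionary is selected for the whole text rather than per sentence, the name is segmented the same way in both sentences of this toy input.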

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a word segmentation method and device.

Background

[0002] Speech synthesis, also known as text-to-speech (TTS) technology, converts arbitrary text into standard, fluent speech in real time, which is equivalent to giving the machine an artificial mouth. A speech synthesis system first processes the input text, including text preprocessing, word segmentation, part-of-speech tagging, phonetic annotation, and prosodic-level prediction. It then predicts the acoustic features of each unit with an acoustic model, and finally either synthesizes the sound directly from the acoustic parameters through a vocoder or selects units from a recorded corpus and concatenates them.

[0003] In a speech synthesis system, word segmentation is the foundation of the whole system. The performance of word segmentation directly affects ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventor: 李秀林
Owner: BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD