Tone-character conversion method

A phonetic-to-character conversion technology based on syllable segmentation, classified under electrical digital data processing, special data processing applications, and instruments. It addresses problems such as ambiguity in complex syllable combinations, and it preserves efficiency, reduces storage space constraints, and resolves segmentation ambiguities.

Status: Inactive. Publication date: 2009-07-29
INST OF SOFTWARE - CHINESE ACAD OF SCI
AI Technical Summary

Problems solved by technology

[0018] 4. The ambiguities described above can occur adjacent to one another, producing more complex combination ambiguities

Examples

Example Embodiment

[0054] Example 1

[0055] This embodiment illustrates how the pinyin string "yigemingantian" is converted into a text string by the prior-art method, as shown in Figure 3. In this embodiment, the pinyin string is segmented by forward maximum matching, word retrieval is implemented with a dictionary tree, and the language model is a trigram model (an N-gram model with N = 3). WORD_MAX_LEN is 3 (a word spans at most three syllables) and BEAM_WIDTH is 4 (the candidate word set of each syllable holds 4 elements, corresponding to the four rectangular boxes in each column of Figure 3). The process is as follows:

[0056] 1. Input the pinyin string "yigemingantian" and segment it into the five-syllable sequence "yi"+"ge"+"ming"+"an"+"tian". A single syllable is denoted Y below, for example, Y_1 = "yi";

[0057] 2. Let i=1,

[0058] a) When j=3, since there is no ...
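The segmentation in step 1 can be sketched as follows. This is a minimal illustration of forward maximum matching, not the patent's implementation: the tiny syllable table `SYLLABLES`, the constant `MAX_SYLLABLE_LEN`, and the function name `forward_max_match` are all assumptions introduced here.

```python
# Hypothetical, deliberately tiny syllable table; a real system would load
# the full inventory of valid pinyin syllables.
SYLLABLES = {"yi", "ge", "ming", "min", "an", "gan", "tian", "ti", "e", "a"}

# Assumed upper bound on syllable length (e.g. "zhuang" has six letters).
MAX_SYLLABLE_LEN = 6


def forward_max_match(pinyin, syllables=SYLLABLES):
    """Split a pinyin string left to right, always taking the longest
    prefix that is a valid syllable."""
    result = []
    pos = 0
    while pos < len(pinyin):
        longest = min(MAX_SYLLABLE_LEN, len(pinyin) - pos)
        # Try the longest candidate first, then shrink one letter at a time.
        for length in range(longest, 0, -1):
            piece = pinyin[pos:pos + length]
            if piece in syllables:
                result.append(piece)
                pos += length
                break
        else:
            # No prefix at this position is a known syllable.
            raise ValueError(f"cannot segment {pinyin!r} at offset {pos}")
    return result


print(forward_max_match("yigemingantian"))
```

Because the match is greedy, "ming" wins over "min" at the third position even though "min"+"gan" is also a valid reading of the same letters. That is the kind of alternative segmentation a purely greedy pass never sees.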

Example Embodiment

[0078] Example 2

[0079] This embodiment illustrates how the pinyin string "yigemingantian" is converted into a text string by the method of the invention, as shown in Figure 4. The steps are identical to those of Example 1, with the following difference:

[0080] For each syllable, when j=2, check whether Y_{i-1}+Y_i is a Type I ambiguity; when j=1, check whether Y_i is a Type II ambiguity.

[0081] In this embodiment specifically, step 5.b detects that Y_3+Y_4, that is, "mingan", is a Type I ambiguity, so the resulting word "sensitive" is added to W_{4,2}, giving W_{4,2} = {homicide, light and dark, sensitive}. Assuming that the final score ranking in step 5.c is "start-one-sensitive" > "start-one-homicide" > "start-one-light and dark" > "one-name-press" > "one-name-case" > ..., then W'_4 = {sensitive, homicide, light and dark, press} in this embodiment;

[0082] Likewise, step 6.c detects that Y_5, that is, "tian", is a Type II ambiguity, so ...
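The two checks in this example can be sketched as below. The excerpt does not define the ambiguity types, so the definitions used here are inferred from the two instances it gives and should be read as assumptions: a Type I ambiguity is taken to mean that two adjacent syllables can be re-split at a different boundary ("ming"+"an" versus "min"+"gan"), and a Type II ambiguity that a single syllable can also be read as two syllables ("tian" versus "ti"+"an"). The syllable table and all function names are hypothetical.

```python
# Hypothetical, deliberately tiny syllable table for illustration only.
SYLLABLES = {"yi", "ge", "ming", "min", "an", "gan", "tian", "ti", "e", "a"}


def two_syllable_splits(text, syllables=SYLLABLES):
    """Return every way to cut `text` into exactly two valid syllables."""
    return [
        (text[:cut], text[cut:])
        for cut in range(1, len(text))
        if text[:cut] in syllables and text[cut:] in syllables
    ]


def is_type_one(prev_syllable, syllable, syllables=SYLLABLES):
    """Assumed Type I test: the joined letters of two adjacent syllables
    admit a two-syllable split other than the current one."""
    current = (prev_syllable, syllable)
    joined = prev_syllable + syllable
    return any(s != current for s in two_syllable_splits(joined, syllables))


def is_type_two(syllable, syllables=SYLLABLES):
    """Assumed Type II test: one syllable can itself be cut into two."""
    return bool(two_syllable_splits(syllable, syllables))


print(is_type_one("ming", "an"))  # "mingan" can also be cut as "min"+"gan"
print(is_type_two("tian"))        # "tian" can also be cut as "ti"+"an"
```

In the flow described above, a positive check would trigger a lookup of the alternative reading so that its words join the candidate set before scoring, as happens with "sensitive" in W_{4,2}.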

Abstract

The invention discloses a phonetic-to-character conversion method in the field of Chinese information processing. The method comprises the following steps: segmenting an input pinyin string; converting syllables into Chinese characters using a dictionary tree layer, while calling an ambiguity processing module to check the segmented syllables for ambiguity; if an ambiguity exists, segmenting again, where the ambiguity is a Type I ambiguity and/or a Type II ambiguity; and using a trained language model to rank the candidate words of each syllable by probability score, then obtaining a mixed word lattice and the text string in a general decoding layer according to the scores. Compared with the prior art, the method covers all possible segmentations, resolves most segmentation ambiguities, and maintains efficiency. It does not produce the combinatorial explosion of symbol-level segmentation, which saves memory and relaxes storage constraints. It can be used in Chinese information processing, including Chinese character input and voice input.

Description

Technical field

[0001] The present invention relates to Chinese character input, and in particular to a method for converting pinyin strings into Chinese character strings. The method is mainly used for internal decoding in Chinese pinyin input methods and in the post-processing stage of speech recognition. It belongs to the technical field of Chinese information processing.

Background

[0002] Phonetic-to-character conversion is the automatic conversion, by a computer, of Chinese syllable strings (pinyin strings) into Chinese character strings. It is an important foundation of Chinese information processing and has a wide range of applications, for example in pinyin keyboard input methods and speech recognition systems. Chinese character input is the basis of Chinese information processing, and phonetic-to-character conversion is the core algorithm of Chinese input. The accu...

Application Information

IPC(8): G06F17/28
Inventors: 张顺昌, 孙乐, 李文波
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI