Fault-tolerant romanized input method for non-roman characters

A fault-tolerant technology, applied in the field of processing non-Roman based languages, that addresses two problems: spell correction for non-Roman languages such as CJK languages is complex and difficult, and Chinese-language users may not know the correct pronunciation (pinyin) of some characters and may therefore enter incorrect pinyin inputs.

Status: Inactive
Publication Date: 2006-03-02
GOOGLE LLC
7 Cites, 88 Cited by

AI Technical Summary

Benefits of technology

[0019] These and other features and advantages of the present invention will be presented in more detail in the follo

Problems solved by technology

However, Chinese language users may not know the correct pronunciations (pinyins) of some Chinese characters due to, for example, their dialect and/or accent, and therefore may enter incorrect pinyin inputs.
However, the user's intended character set may not be included in the candidate list as most pinyin input methods have a low or no fault tolerance.
Spell correction for non-Roman languages such as CJK languages is also complex and challenging because such languages have no standard dictionaries: the definition of a CJK word is not clear-cut.
In contrast, the English diction

Embodiment Construction

[0031] Fault-tolerant systems and methods to process and correct input spelling errors for non-Roman based languages such as Chinese, Japanese, and Korean (CJK) are disclosed. The fault-tolerant input systems and methods described herein generally relate to processing, detecting, and correcting spelling errors by employing probabilities that may be derived from user input entries and associated user selections such as query logs. It is noted that for purposes of clarity only, the examples presented herein are generally presented in terms of processing, detecting and correcting Chinese pinyin inputs. However, the systems and methods for spelling error detection and correction may be similarly applicable for other non-Roman based languages such as Japanese, Korean, Thai, etc. The following description is presented to enable any person skilled in the art to make and use the invention. Descriptions of specific embodiments and applications are provided only as examples and various modifi...
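The idea of deriving probabilities from user input entries and their associated selections can be illustrated with a minimal sketch. It assumes a hypothetical log of (typed pinyin, selected text) pairs and uses a plain relative-frequency estimate; the log format, the function name `estimate_selection_probs`, and the estimator itself are assumptions made for illustration and are not taken from the patent.

```python
from collections import Counter, defaultdict

def estimate_selection_probs(log):
    """Estimate P(selected text | typed pinyin) by relative frequency.

    `log` is a hypothetical list of (typed_pinyin, selected_text) pairs,
    standing in for query-log entries and the selections users made.
    """
    counts = defaultdict(Counter)
    for typed, selected in log:
        counts[typed][selected] += 1
    probs = {}
    for typed, selections in counts.items():
        total = sum(selections.values())
        probs[typed] = {sel: n / total for sel, n in selections.items()}
    return probs

# Toy log (invented data): "sang hai" is a plausible mistyping of "shang hai".
toy_log = [
    ("sang hai", "上海"),
    ("sang hai", "上海"),
    ("sang hai", "桑海"),
    ("shang hai", "上海"),
]
probs = estimate_selection_probs(toy_log)
```

In this toy log, most users who typed "sang hai" went on to select 上海, so the estimate favors the corrected reading over the literal one.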

Abstract

Fault-tolerant systems and methods to process and correct input spelling errors for non-Roman based languages such as Chinese, Japanese, and Korean (CJK) are disclosed. The method may be applied to a Chinese input method using pinyin. For example, the method may generally include receiving a pinyin input representing characters in Chinese, the input having at least one original pinyin, identifying potentially incorrect pinyins in the input, expanding each potentially incorrect pinyin to at least one additional alternative pinyin, each pair of potentially incorrect and corresponding alternative pinyin having a proximity measurement, converting each pinyin in the input and each alternative pinyin to Chinese characters, computing likelihoods of possible conversions of the pinyin input to Chinese characters, each possible Chinese conversion being a combination of the converted original and/or alternative pinyins of the input, the probabilities being based on the proximity measurement and optionally on a context of the possible Chinese conversion, and determining a most likely Chinese conversion from the possible conversions.
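The steps listed in the abstract (expand each suspect pinyin, attach a proximity measurement, convert to characters, score the combinations, pick the best) can be sketched as follows. This is an illustrative toy under stated assumptions, not the patented implementation: the confusable-initial table `FUZZY_INITIALS`, the proximity value `SWAP_PROXIMITY`, the tiny dictionary `PINYIN_TO_CHARS`, and the bigram context scores are all invented for the example, and the original pinyin is assumed to carry a proximity of 1.0.

```python
from itertools import product

# Hypothetical confusable initials (e.g. dialect-driven s/sh confusion).
FUZZY_INITIALS = {"sh": "s", "zh": "z", "ch": "c", "s": "sh", "z": "zh", "c": "ch"}
SWAP_PROXIMITY = 0.3  # assumed proximity for a swapped initial

# Toy pinyin-to-character dictionary and context (bigram) scores, invented data.
PINYIN_TO_CHARS = {"sang": ["桑"], "shang": ["上"], "hai": ["海"]}
BIGRAM_SCORE = {("上", "海"): 0.9, ("桑", "海"): 0.01}
DEFAULT_BIGRAM = 0.001

def expand(pinyin):
    """Return (candidate, proximity) pairs: the original plus any alternative."""
    candidates = [(pinyin, 1.0)]
    # Try two-letter initials before one-letter ones so "sh" wins over "s".
    for initial in sorted(FUZZY_INITIALS, key=len, reverse=True):
        if pinyin.startswith(initial):
            alt = FUZZY_INITIALS[initial] + pinyin[len(initial):]
            if alt in PINYIN_TO_CHARS:
                candidates.append((alt, SWAP_PROXIMITY))
            break
    return candidates

def convert(syllables):
    """Score every combination of original/alternative conversions; return the best."""
    per_syllable = [
        [(ch, prox) for cand, prox in expand(p) for ch in PINYIN_TO_CHARS.get(cand, [])]
        for p in syllables
    ]
    best, best_score = None, 0.0
    for combo in product(*per_syllable):
        chars = [ch for ch, _ in combo]
        score = 1.0
        for _, prox in combo:
            score *= prox  # proximity of each chosen pinyin to what was typed
        for a, b in zip(chars, chars[1:]):
            score *= BIGRAM_SCORE.get((a, b), DEFAULT_BIGRAM)  # context
        if score > best_score:
            best, best_score = "".join(chars), score
    return best, best_score

best, score = convert(["sang", "hai"])
```

With these toy numbers, the literal reading 桑海 is penalized by its weak context score, while the alternative 上海 pays the proximity penalty but wins on context. Exhaustive enumeration with `product` is chosen only for brevity; it grows exponentially with input length, so a realistic system would need a more efficient search.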

Description

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates generally to processing non-Roman based languages. More specifically, fault-tolerant systems and methods to process and correct input spelling errors for non-Roman based languages such as Chinese, Japanese, and Korean (CJK) are disclosed.

[0003] 2. Description of Related Art

[0004] Spell correction generally includes detecting erroneous words and determining appropriate replacements for the erroneous words. Most spelling errors in alphabetical, i.e., Roman-based, languages such as English are either out of vocabulary words, e.g., “thna” rather than “than,” or valid words improperly used in its context, e.g., “stranger then” rather than “stranger than.” Spell checkers that detect and correct out of vocabulary spelling errors in Roman-based languages are well known.

[0005] Users of non-Roman based languages such as Chinese, Japanese, and Korean (CJK) often utilize Roman-based (alphabetica...

Application Information

IPC(8): G06F17/24
CPC: G06F17/273, G06F40/232
Inventors: WU, JUN; CHEN, LIREN
Owner: GOOGLE LLC