Method and device for segmenting Thai syllables

A Thai syllable segmentation technique, applied in the field of information retrieval, that addresses the slow speed and low accuracy of segmentation based on complex grammar rules, improving both segmentation accuracy and segmentation speed.

Status: Inactive
Publication Date: 2018-04-27
TRANSN IOL TECH CO LTD

AI Technical Summary

Problems solved by technology

Because the grammatical rules are complex and hard to understand, conflicts may arise among the large number of rules, which makes Thai syllable segmentation relatively slow and not very accurate.



Examples


Embodiment Construction

[0042] The following description and drawings illustrate specific embodiments of the invention sufficiently to enable those skilled in the art to practice them. The examples merely represent possible variations. Individual components and functions are optional unless explicitly required, and the order of operations may vary. Portions and features of some embodiments may be included in or substituted for those of other embodiments. The scope of the embodiments of the present invention includes the full scope of the claims and all available equivalents of the claims. Herein, various embodiments may be referred to individually or collectively by the term "invention", which is for convenience only and is not intended to automatically limit the scope of this application to any single invention or inventive concept if more than one invention is in fact disclosed. Herein, relational terms such as first and second etc. are used only to distinguish one entity or operation from another with...


Abstract

The invention discloses a method and device for segmenting Thai syllables, belonging to the technical field of information retrieval. The method includes: preprocessing the Thai text to be processed from a Thai corpus, and determining the non-Thai character strings and the positional syllable type information of each Thai character; labeling the boundaries between all characters in the Thai text to be processed, wherein each boundary formed by at least one Thai syllable character is labeled with a to-be-segmented identifier; extracting each syllable to be segmented from the Thai text to be processed, wherein each syllable to be segmented consists of n consecutive Thai characters and one to-be-segmented identifier, n being a positive integer; determining the segmentation probability of each syllable to be segmented by applying a Markov chain probability voice model to the positional syllable type information of the Thai characters in that syllable; and segmenting the syllables in the Thai sentence to be processed according to each syllable to be segmented and its corresponding segmentation probability.
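The boundary-scoring idea in the abstract can be illustrated with a small sketch. This is not the patented method: it assumes a first-order Markov chain over coarse character classes, with the syllable boundary treated as an extra state, and it uses add-alpha smoothing. The class inventory in `char_type`, the names `train`, `split_probability`, and `segment`, the smoothing constant, and the six-sentence toy corpus are all hypothetical choices made for illustration.

```python
from collections import Counter

BOUNDARY = "|"                               # extra chain state marking a syllable boundary
STATES = ["C", "L", "M", "V", "O", BOUNDARY]

def char_type(ch):
    """Map a character to a coarse class (an assumed inventory, not the patent's)."""
    cp = ord(ch)
    if 0x0E40 <= cp <= 0x0E44:
        return "L"                           # vowel written before its consonant
    if 0x0E01 <= cp <= 0x0E2E:
        return "C"                           # consonant
    if cp == 0x0E31 or 0x0E34 <= cp <= 0x0E3A or 0x0E47 <= cp <= 0x0E4E:
        return "M"                           # combining vowel sign or tone mark
    if 0x0E00 <= cp <= 0x0E7F:
        return "V"                           # any other Thai character
    return "O"                               # non-Thai

def train(segmented_sentences):
    """Count state transitions in sentences given as lists of syllables."""
    trans, ctx = Counter(), Counter()
    for syllables in segmented_sentences:
        states = []
        for i, syllable in enumerate(syllables):
            if i:
                states.append(BOUNDARY)
            states.extend(char_type(c) for c in syllable)
        for a, b in zip(states, states[1:]):
            trans[a, b] += 1
            ctx[a] += 1
    return trans, ctx

def transition_prob(model, a, b, alpha=0.1):
    """Add-alpha smoothed estimate of P(b | a)."""
    trans, ctx = model
    return (trans[a, b] + alpha) / (ctx[a] + alpha * len(STATES))

def split_probability(model, left, right):
    """Relative weight of the path left -> boundary -> right versus left -> right."""
    a, b = char_type(left), char_type(right)
    p_split = transition_prob(model, a, BOUNDARY) * transition_prob(model, BOUNDARY, b)
    p_join = transition_prob(model, a, b)
    return p_split / (p_split + p_join)

def segment(model, text, threshold=0.5):
    """Cut the text wherever the split probability exceeds the threshold."""
    if not text:
        return []
    pieces = [text[0]]
    for left, right in zip(text, text[1:]):
        if split_probability(model, left, right) > threshold:
            pieces.append(right)
        else:
            pieces[-1] += right
    return pieces

# Toy training data for illustration only: three common syllables in varying order.
TOY_CORPUS = [
    ["มา", "ดี"], ["ไป", "มา"], ["ดี", "มา"],
    ["มา", "ไป"], ["ดี", "ไป"], ["ไป", "ดี"],
]
model = train(TOY_CORPUS)
print(segment(model, "มาดีไป"))
```

Because the chain is first-order and two boundaries can never be adjacent, each boundary decision depends only on the two neighbouring characters, so the per-boundary comparison also yields the most probable segmentation under this simplified model. A context of n characters, as the abstract describes, would need a higher-order chain.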

Description

Technical field

[0001] The invention relates to the technical field of information retrieval, in particular to a method and device for segmenting Thai syllables.

Background art

[0002] Thai, also known as the Dai language, is the language of the Dai-Thai people and belongs to the East Asian / Sino-Tibetan language family. About 68 million people around the world speak Thai. In Thai text there is no punctuation and no space between words; a sentence is written continuously from beginning to end. Generally, a space about two letters wide marks the end of a sentence or a short pause within it. However, although the syllable is a well-defined basic unit of the grammar, there are no visible spaces between syllables in the text. Therefore, processing Thai text must begin by segmenting it into syllables. This segmentation provides an important foundation for Thai lexical, syntactic, and more complex natural la...
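Since Thai text runs on without spaces and may be mixed with other scripts, a preprocessing pass usually has to separate Thai runs from non-Thai strings before any syllable segmentation. The sketch below is one minimal way to do that, assuming that "Thai character" means any code point in the Unicode Thai block (U+0E00 to U+0E7F); the function name `split_runs` and the labels `"thai"` and `"other"` are hypothetical.

```python
import re

# One or more consecutive code points from the Unicode Thai block.
THAI_RUN = re.compile(r"[\u0E00-\u0E7F]+")

def split_runs(text):
    """Split text into (label, substring) pairs, alternating Thai and non-Thai runs."""
    runs = []
    pos = 0
    for match in THAI_RUN.finditer(text):
        if match.start() > pos:
            # Everything between two Thai runs is kept as a single non-Thai string.
            runs.append(("other", text[pos:match.start()]))
        runs.append(("thai", match.group()))
        pos = match.end()
    if pos < len(text):
        runs.append(("other", text[pos:]))
    return runs

print(split_runs("ABC มา 123"))
```

Only the runs labeled `"thai"` would then be passed on to syllable segmentation; the non-Thai strings can be kept as they are.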


Application Information

IPC(8): G06F17/27, G06K9/62
CPC: G06F40/253, G06F40/289, G06F18/295
Inventor: 张睦
Owner: TRANSN IOL TECH CO LTD