Word position tagging-based Tibetan word segmentation method
A Tibetan word segmentation method based on word-position (lexeme) tagging. It addresses the poor segmentation and disambiguation performance of existing approaches and simplifies the design.
Examples
Embodiment 1
[0031] Example 1: The word segmentation process of a typical Tibetan sentence
[0032] For input Tibetan text 302:
[0033] Step 304 splits the text into Tibetan sentences at the Tibetan single vertical stroke (the shad, །);
[0034] Step 306 divides the Tibetan sentence into a series of Tibetan syllables (separated by slashes here). The result after segmentation is:
[0035] Step 308 assigns a lexeme label to each syllable; the label is shown after a slash. The result after labeling is:
[0036] Step 312 splits each syllable marked J and restores it to two syllables. The result after processing is (the part affected by this step is underlined; the same applies below):
[0037] Step 314 merges each syllable marked B with the syllable marked E that follows it into one word. The result after processing is:
[0038] In step 316, all syllables marked as S and all unmerged syllables are used as monosyllabic word...
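Steps 304 and 306 above can be sketched in a few lines of Python. This is only a minimal illustration, not the patent's implementation: the function names are hypothetical, and the use of the tsheg (U+0F0B) as the syllable delimiter is an assumption, since the text says only that the sentence is divided into syllables.

```python
SHAD = "\u0f0d"   # Tibetan single vertical stroke, the sentence delimiter of step 304
TSHEG = "\u0f0b"  # assumed syllable delimiter for step 306 (not named in the source)


def split_sentences(text):
    """Step 304 (sketch): cut the input text at every shad, dropping empty pieces."""
    return [part.strip() for part in text.split(SHAD) if part.strip()]


def split_syllables(sentence):
    """Step 306 (sketch): cut one sentence at every tsheg, dropping empty pieces."""
    return [syl.strip() for syl in sentence.split(TSHEG) if syl.strip()]
```

Joining the output of `split_syllables` with `"/"` gives the slash-separated display form used in the examples.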
Embodiment 2
[0040] Example 2: The word segmentation process of another typical Tibetan sentence
[0041] For input Tibetan text 302:
[0042] Step 304 splits the text into Tibetan sentences at the Tibetan single vertical stroke (the shad, །);
[0043] Step 306 divides the Tibetan sentence into a series of Tibetan syllables (separated by slashes here). The result after segmentation is:
[0044] Step 308 assigns a lexeme label to each syllable; the label is shown after a slash. The result after labeling is:
[0045] Step 312 splits the syllable marked as J and restores it into two syllables, the result after processing is:
[0046] Step 314 merges each syllable marked B, the next syllable marked E after it, and the one or more syllables marked M between them into one word. The result after processing is:
[0047] In step 316, all syllables marked as S and all unmerged syllables are used as monosyllabic words, and the resul...
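The merging rules of steps 314 and 316 can be sketched as a single pass over the tagged syllables. The sketch below is an illustration under stated assumptions, not the patent's code: `merge_tagged` and its `sep` parameter are hypothetical names, the input is assumed to be the output of step 312 (so no J tags remain), and rejoining syllables with the tsheg is an assumption.

```python
TSHEG = "\u0f0b"  # assumed joiner between syllables of one word (not named in the source)


def merge_tagged(tagged, sep=TSHEG):
    """Merge (syllable, tag) pairs into words.

    A run B, zero or more M, E becomes one word (step 314).
    Syllables tagged S, and any syllable left unmerged, become
    single-syllable words (step 316).
    """
    words = []
    i, n = 0, len(tagged)
    while i < n:
        syllable, tag = tagged[i]
        if tag == "B":
            j = i + 1
            while j < n and tagged[j][1] == "M":  # skip the middle syllables
                j += 1
            if j < n and tagged[j][1] == "E":     # a closing E was found
                words.append(sep.join(s for s, _ in tagged[i:j + 1]))
                i = j + 1
                continue
        # S, or a B/M/E that could not be merged: emit as a monosyllabic word
        words.append(syllable)
        i += 1
    return words
```

Treating an unmatched B, M, or E as a monosyllabic word follows the "all unmerged syllables" wording of step 316; a stricter implementation might instead report such tag sequences as labeling errors.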