Multilingual text word segmentation method
A word segmentation technique for multilingual text. It is applied to the preprocessing of natural language text and aims to improve preprocessing efficiency.
Embodiment Construction
[0038] The technical solutions of the present invention will be further specifically described below in conjunction with the accompanying drawings and specific embodiments.
[0039] As shown in Figure 1, the present invention provides a word segmentation method for multilingual text, comprising the following steps:
[0040] When preprocessing starts, the user first supplies the text to be processed. The text is input in Unicode encoding, and its characters are read one by one in order (step 101).
[0041] First, the type of the current character is determined. According to the character-type definitions, the character is classified as Chinese-character-like (Chinese, Japanese, Korean and Thai), Latin-letter-like (Western European languages), a number, a punctuation mark, or a blank character. The next consecutive character is then read and its type is determined in the same way. The processing of basic segmenta...
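The character-typing step described in [0040] and [0041] can be illustrated with a minimal Python sketch. It is not the patent's implementation: the code point ranges, the order of the checks, the type names, and the helper names `classify` and `scan_pairs` are all assumptions made for illustration. The sketch covers only reading characters in order and determining the type of each character and of the one that follows it; it does not attempt the segmentation rules themselves, which the text above does not show in full.

```python
import unicodedata

# Code point ranges treated as "Chinese-character-like".
# These ranges are an assumption of this sketch, not taken from the patent.
CJK_LIKE_RANGES = (
    (0x0E00, 0x0E7F),  # Thai
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def classify(ch):
    """Return a hypothetical type label for a single Unicode character."""
    category = unicodedata.category(ch)
    if ch.isspace():
        return "blank"
    if category == "Nd":              # decimal digits in any script
        return "number"
    if category.startswith("P"):      # all Unicode punctuation categories
        return "punctuation"
    code = ord(ch)
    if any(lo <= code <= hi for lo, hi in CJK_LIKE_RANGES):
        return "cjk_like"
    if "LATIN" in unicodedata.name(ch, ""):
        return "latin_like"
    return "other"                    # fallback not mentioned in the source


def scan_pairs(text):
    """Yield (char, type, next_type) in reading order; next_type is None at the end."""
    types = [classify(ch) for ch in text]
    for i, ch in enumerate(text):
        next_type = types[i + 1] if i + 1 < len(text) else None
        yield ch, types[i], next_type


if __name__ == "__main__":
    for ch, ch_type, next_type in scan_pairs("中文 text 42。"):
        print(repr(ch), ch_type, next_type)
```

Checking whitespace, digits, and punctuation before the script ranges is a design choice of this sketch: it keeps characters such as Thai digits or CJK punctuation from being labeled by script block alone. A real implementation would follow the patent's own character-type definitions.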