The invention discloses a multilingual word segmentation method based on dictionaries and grammar analysis. The method segments mixed texts in Chinese, Japanese, Korean, Cantonese and similar languages efficiently and accurately, and its lexicon can be flexibly extended with words from different time periods and different professional domains, so that lexicon information is updated effectively. Word segmentation sub-devices for the Chinese, Japanese, Korean, Cantonese and other language families, a Chinese quantum word segmentation device and a western-language word segmentation device are embedded to segment the text of each language accurately. A built-in mechanism identifies language segments from their character codes and splits the text to be segmented into segments, each of which corresponds to one language family and is segmented by the corresponding sub-device. Grammar analysis enables word segmentation of western inflectional languages and smart-mode word segmentation of Chinese, Japanese, Korean and Cantonese, and texts containing Arabic numerals can also be processed. The method also segments texts in which several languages are mixed, which removes the limitation that a word segmentation tool handles only a single language or a few individual languages, and ensures the security, accuracy, efficiency, flexibility and universality of text word segmentation. The method has wide application prospects in text processing fields that rely on word segmentation, such as large-scale text classification, text information extraction and automatic summarization.