
Multilingual text word segmentation method

A word segmentation technique, applied to multilingual text, that improves the efficiency of natural language preprocessing.

Inactive Publication Date: 2017-06-06
IOL WUHAN INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0008] The invention proposes a new word segmentation method to improve the efficiency of natural language preprocessing: it performs word segmentation preprocessing on the input text without relying on a language dictionary and with only a single scan, obtaining the smallest language-related units in the text.



Examples


Embodiment Construction

[0038] The technical solutions of the present invention will be further specifically described below in conjunction with the accompanying drawings and specific embodiments.

[0039] As shown in Figure 1, the present invention provides a word segmentation method for multilingual text, comprising the following steps:

[0040] After preprocessing starts, the user first enters the text to be processed. The text is input in Unicode encoding, and its characters are read one by one in order (step 101).

[0041] First, the type of the current character is determined. According to the character type definitions, the character is classified as Chinese-like (Chinese, Japanese, Korean, and Thai), Latin-like (Western European languages), a number, a punctuation mark, or a blank character. The next consecutive character is then read and its type is determined in the same way. The processing of basic segmenta...
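The classification step in [0041] can be illustrated with a minimal Python sketch. It is not the patented procedure: the text above is truncated before the segmentation rules are given, so everything beyond the five character types is an assumption. In particular, the code-point ranges in `CJK_LIKE_RANGES`, the names `CharType`, `classify`, and `basic_segment`, and the merging rule (runs of Latin-like letters or of digits form one unit, each Chinese-like character or punctuation mark is its own unit, blanks only separate units) are hypothetical choices made for illustration.

```python
import unicodedata
from enum import Enum, auto


class CharType(Enum):
    CJK_LIKE = auto()     # "similar to Chinese characters": Chinese, Japanese, Korean, Thai
    LATIN_LIKE = auto()   # "similar to Latin letters": Western European languages
    NUMBER = auto()
    PUNCTUATION = auto()
    BLANK = auto()


# Assumed code-point ranges for the Chinese-like class (not taken from the patent).
CJK_LIKE_RANGES = (
    (0x0E00, 0x0E7F),   # Thai
    (0x3040, 0x30FF),   # Hiragana and Katakana
    (0x3400, 0x4DBF),   # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),   # CJK Unified Ideographs
    (0xAC00, 0xD7AF),   # Hangul Syllables
)

# Types whose consecutive characters are merged into one unit (an assumption).
MERGED_TYPES = (CharType.LATIN_LIKE, CharType.NUMBER)


def classify(ch):
    """Return the CharType of a single Unicode character."""
    if ch.isspace():
        return CharType.BLANK
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in CJK_LIKE_RANGES):
        return CharType.CJK_LIKE
    category = unicodedata.category(ch)
    if category.startswith("N"):
        return CharType.NUMBER
    if category.startswith(("L", "M")):
        # Remaining letters and combining marks are treated as Latin-like.
        return CharType.LATIN_LIKE
    return CharType.PUNCTUATION


def basic_segment(text):
    """Split text into units in a single left-to-right pass, without a dictionary."""
    units = []
    run = ""          # current run of Latin-like letters or digits
    run_type = None
    for ch in text:
        ch_type = classify(ch)
        if run and ch_type is run_type:
            run += ch             # same mergeable type: extend the current run
            continue
        if run:
            units.append(run)     # type changed: close the current run
            run, run_type = "", None
        if ch_type is CharType.BLANK:
            continue              # blanks only separate units
        if ch_type in MERGED_TYPES:
            run, run_type = ch, ch_type
        else:
            units.append(ch)      # Chinese-like characters and punctuation stand alone
    if run:
        units.append(run)
    return units
```

Under these assumptions, `basic_segment("Hello, 世界 2017!")` yields `["Hello", ",", "世", "界", "2017", "!"]`. Each character is inspected exactly once, which matches the single-scan, dictionary-free constraint stated in [0008].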


Abstract

The invention provides a word segmentation method for multilingual text. Without relying on a language dictionary and with only a single scan, the method performs word segmentation preprocessing on the input text and obtains the smallest language-related units in the text. The preprocessing comprises basic segmentation processing, paired-symbol processing, and user-defined common-symbol processing, and serves to improve the efficiency of natural language preprocessing.

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a word segmentation method for multilingual text.

Background

[0002] A computer regards an input text only as an ordinary sequence of characters. Natural language preprocessing extracts meaningful language components from this string and provides the basis for more complex natural language processing.

[0003] Traditional natural language preprocessing relies mainly on dictionaries, multiple scans of the text, and string matching: during processing, the corresponding strings must be looked up in a dictionary and matched against its entries, using forward maximum matching and reverse maximum matching. So far, these methods have achieved good preprocessing results. However, with the continuous expansion of global Internet coverage and the rapid...


Application Information

IPC(8): G06F17/27, G06F17/22
CPC: G06F40/12, G06F40/289
Inventor 张睦
Owner IOL WUHAN INFORMATION TECH CO LTD