Text division apparatus and text division method

A technology for segmenting text and determining segmentation positions, applied in special data processing applications, instruments, electrical digital data processing, etc.

Status: Inactive · Publication date: 2016-12-21
FUJITSU LTD
Cites: 9 · Cited by: 4

AI Technical Summary

Problems solved by technology

[0008] In the conventional word segmentation device and morphological analysis system described above, the segmentation position is determined from information about only part of the text, so the text cannot always be segmented at an appropriate position.

Embodiment Construction

[0024] Hereinafter, embodiments will be described in detail with reference to the drawings.

[0025] For example, when the word segmentation device of Patent Document 1 is used to segment the text "そうはいってもっと向んでください", the text is segmented by a longest-match search against the word segmentation dictionary. As a result, although the correct segmentation is "そう / はいって / もっと / 向んで / ください", an undesirable result such as "そう / は / いっても / っと必んでください" is sometimes obtained.
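The failure mode described above can be reproduced with a minimal forward longest-match segmenter. This is an illustrative sketch, not the implementation of Patent Document 1: the function name `longest_match_segment`, the toy dictionary, and the unspaced English string "thetabledownthere" (a stand-in for the Japanese example) are all hypothetical.

```python
def longest_match_segment(text: str, dictionary: set[str], max_len: int = 10) -> list[str]:
    """Greedy forward longest-match segmentation (illustrative sketch).

    At each position, take the longest dictionary word that starts there;
    if no word matches, emit a single character and move on.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: treat one unknown character as a word
        # Try candidate lengths from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words


# Hypothetical toy dictionary; "thetabledownthere" stands in for unspaced Japanese text.
DICTIONARY = {"the", "theta", "table", "bled", "down", "own", "there"}

print(longest_match_segment("thetabledownthere", DICTIONARY))
# ['theta', 'bled', 'own', 'there']
```

Greedy matching takes "theta" because it is the longest entry starting at position 0, and every later division inherits that choice, giving "theta / bled / own / there" instead of "the / table / down / there".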

[0026] This is thought to occur because the correct division position can depend on the word immediately following a given word, yet the division position is determined simply by a longest-match search, without considering any context wider than the word itself.
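To see why word-level matching alone cannot settle the division, the hypothetical helper below enumerates every way a string can be divided into words from a toy dictionary. It is a sketch for illustration only, not part of the patent; with this assumed dictionary, both the correct and the undesirable division are valid sequences of dictionary words, so only wider context can choose between them.

```python
def all_segmentations(text: str, dictionary: set[str]) -> list[list[str]]:
    """Return every division of `text` into dictionary words (illustrative sketch).

    Exhaustive recursion is exponential in the worst case; it is only meant to
    show that more than one valid division can exist.
    """
    if not text:
        return [[]]
    results: list[list[str]] = []
    for j in range(1, len(text) + 1):
        prefix = text[:j]
        if prefix in dictionary:
            for rest in all_segmentations(text[j:], dictionary):
                results.append([prefix] + rest)
    return results


# Hypothetical toy dictionary standing in for a word-segmentation dictionary.
DICTIONARY = {"the", "theta", "table", "bled", "down", "own", "there"}

for seg in all_segmentations("thetabledownthere", DICTIONARY):
    print(" / ".join(seg))
# the / table / down / there
# theta / bled / own / there
```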

[0027] In addition, when the morphological analysis system of Patent Document 2 is used to segment the compound word "Natural Language Processing Technology", the longest-match search is performed again, starting from the rear ...
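The passage is truncated, but a backward longest-match search in general can be sketched as follows: scan from the end of the text and, at each step, take the longest dictionary word that ends at the current position. This is a generic sketch, not Patent Document 2's actual procedure; the function name, the maximum word length, and the toy dictionary for 自然言語処理技術 ("natural language processing technology") are assumptions.

```python
def backward_longest_match(text: str, dictionary: set[str], max_len: int = 8) -> list[str]:
    """Greedy backward longest-match segmentation (illustrative sketch).

    Starting from the end of the text, take the longest dictionary word that
    ends at the current position; fall back to a single character.
    """
    words: list[str] = []
    end = len(text)
    while end > 0:
        match = text[end - 1]  # fallback: one unknown character
        # A smaller start index means a longer candidate, so scan start upward.
        for start in range(max(0, end - max_len), end):
            if text[start:end] in dictionary:
                match = text[start:end]
                break
        words.append(match)
        end -= len(match)
    words.reverse()  # words were collected right to left
    return words


# Hypothetical toy dictionary for the compound 自然言語処理技術.
DICTIONARY = {"自然", "言語", "処理", "技術", "自然言語", "処理技術"}

print(backward_longest_match("自然言語処理技術", DICTIONARY))
# ['自然言語', '処理技術']
```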

Abstract

The invention relates to a text division apparatus and a text division method for efficiently dividing text at proper positions. A computer searches (step 201) for a first character string contained in the text, using character string division information in which registered character strings, each divided into a plurality of words, are associated with a distinguished word count. Further, when the first character string corresponds to a registered character string, the computer divides (step 202) a second character string, namely the part of the first character string that contains the number of words given by the distinguished word count associated with that registered character string, into that number of words.
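One possible reading of steps 201 and 202 is sketched below, under stated assumptions rather than as the patented implementation. The character string division information is modeled by a hypothetical `DivisionEntry` that holds the registered string's word division and its distinguished word count; the second character string is assumed to be the leading `distinguished_count` words of the matched registered string; and positions that match no registered string fall back to ordinary longest matching. The names `DivisionEntry`, `divide`, and the toy data are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class DivisionEntry:
    """Hypothetical model of one record of character string division information."""
    words: list[str]          # the registered character string, divided into words
    distinguished_count: int  # assumed: number of leading words to commit on a match

    def __post_init__(self) -> None:
        if not 1 <= self.distinguished_count <= len(self.words):
            raise ValueError("distinguished_count must be between 1 and len(words)")


def divide(text: str, division_info: dict[str, DivisionEntry],
           dictionary: set[str], max_len: int = 10) -> list[str]:
    result: list[str] = []
    i = 0
    while i < len(text):
        # Step 201 (sketch): search for a registered character string that
        # starts at position i, trying longer registered strings first.
        entry = None
        for registered in sorted(division_info, key=len, reverse=True):
            if text.startswith(registered, i):
                entry = division_info[registered]
                break
        if entry is not None:
            # Step 202 (sketch): divide only the leading part of the match that
            # holds `distinguished_count` words; the rest is examined afresh.
            committed = entry.words[:entry.distinguished_count]
            result.extend(committed)
            i += sum(len(w) for w in committed)
            continue
        # Fallback (assumption): forward longest match, one character if none.
        match = text[i]
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        result.append(match)
        i += len(match)
    return result


# Hypothetical toy data: "thetabledown" is registered as the / table / down,
# but only the first 2 words are committed when it matches.
DICTIONARY = {"the", "theta", "table", "bled", "down", "own", "there"}
DIVISION_INFO = {"thetabledown": DivisionEntry(["the", "table", "down"], 2)}

print(divide("thetabledownthere", DIVISION_INFO, DICTIONARY))
# ['the', 'table', 'down', 'there']
print(divide("thetabledownthere", {}, DICTIONARY))
# ['theta', 'bled', 'own', 'there']
```

In this sketch the registered string uses the following word "down" as context to fix the division "the / table", yet commits only those two words, so the remaining text is still divided by the ordinary procedure.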

Description

Technical Field

[0001] The invention relates to a text segmentation device and a text segmentation method.

Background Art

[0002] In recent years, the amount of information on the Internet has increased dramatically and businesses that use big data have grown, so there is demand for processing big data efficiently. For documents that do not separate words with delimiter characters such as spaces, for example Japanese, Chinese, or Korean documents, morphological analysis is performed in order to calculate word frequencies.

[0003] Morphological analysis is the process of dividing text into morphemes and assigning part-of-speech information to each morpheme. The morphemes obtained by morphological analysis are sometimes also treated as words. Morphological analysis makes it possible to determine the relationships between the words in a document and the part of speech of each word, and to divide the text of the document into words. Howe...
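As a minimal sketch of the two outputs of morphological analysis named in [0002] and [0003], part-of-speech information per morpheme and word frequencies, the hypothetical helpers below assume the text has already been divided into morphemes and look up parts of speech in a toy lexicon. A real morphological analyzer determines the division and the tags itself; the lexicon, documents, and function names here are illustrative assumptions.

```python
from collections import Counter


def tag(morphemes: list[str], lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Attach part-of-speech information to each morpheme via a lexicon lookup."""
    return [(m, lexicon.get(m, "unknown")) for m in morphemes]


def word_frequencies(documents: list[list[str]]) -> Counter:
    """Count how often each word (morpheme) occurs across segmented documents."""
    counts: Counter = Counter()
    for morphemes in documents:
        counts.update(morphemes)
    return counts


# Hypothetical toy lexicon and pre-segmented documents.
LEXICON = {"言語": "noun", "を": "particle", "処理": "noun", "する": "verb"}
DOCS = [["言語", "を", "処理", "する"], ["自然", "言語", "を", "学ぶ"]]

print(tag(DOCS[0], LEXICON))
# [('言語', 'noun'), ('を', 'particle'), ('処理', 'noun'), ('する', 'verb')]
print(word_frequencies(DOCS)["言語"])
# 2
```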

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/27
CPC: G06F40/247, G06F40/284
Inventors: 大仓清司, 片冈正弘, 出内将夫
Owner: FUJITSU LTD