Analysis program, analysis method, and analyzer

A parsing method and parsing device technology, applied in fields such as instruments, natural language translation, and semantic tool creation, that addresses problems such as insufficient morpheme parsing results and achieves the effect of suppressing file size.

Active Publication Date: 2020-01-17
FUJITSU LTD

AI Technical Summary

Problems solved by technology

[0014] In addition, morphological analysis may output proper nouns as unknown words. However, because such words are sometimes split on the basis of registered words, or useful information is removed from them, the result is insufficient as a morphological analysis for Word2Vec.



Examples


Embodiment

[0034] Figure 1 is a diagram for explaining an example of the processing of the analysis device of this embodiment. As shown in Figure 1, the analysis device executes the following processing when extracting words that are candidates for segmentation from the character string data 140a. For example, the character string data 140a is the data of a document composed of CJK characters, that is, Chinese, Japanese, or Korean characters.

[0035] The analysis device compares the character string data 140a with the dictionary data 140b. The dictionary data 140b is data that defines the words (morphemes) that are candidates for segmentation.

[0036] The analysis device scans the character string data 140a from the beginning, extracts character strings that match words defined in the dictionary data 140b, and stores them in the array data 140c.
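The scan described in [0036] can be illustrated with a minimal sketch. It is not the patented implementation: the function name `extract_candidates`, the use of a Python set for the dictionary data, the tuple layout of the result, and the sample words are all assumptions made for illustration. The returned list plays the role of the array data 140c.

```python
def extract_candidates(text, dictionary):
    """Scan text from the beginning and collect every substring found in dictionary.

    `dictionary` is a hypothetical stand-in for the dictionary data 140b,
    and the returned list is a stand-in for the array data 140c.
    """
    max_len = max((len(word) for word in dictionary), default=0)
    matches = []
    for start in range(len(text)):
        # Try every substring that begins at `start` and is no longer
        # than the longest registered word.
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            candidate = text[start:end]
            if candidate in dictionary:
                matches.append((start, candidate))
    return matches


# Hypothetical sample data, not taken from the patent.
dictionary = {"東京", "京都", "東京都", "都"}
result = extract_candidates("東京都", dictionary)
print(result)
```

Each entry pairs a start offset with a matched word, so overlapping candidates such as 東京 and 東京都 are both retained rather than resolved to a single segmentation.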

[0037] The array data 140c holds the words defined in the dictionary data 140b among the character strings included in the charac...


Abstract

An analyzer (100) generates, on the basis of a dictionary used in morphological analysis, an index that relates to each of the morphemes registered in the dictionary and in which a flag is set that makes the head and the tail of each registered morpheme distinguishable. The analyzer (100) uses the index to extract a plurality of dividable words from input character data.
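One way to picture an index with head and tail flags is sketched below. The abstract does not specify the index layout, so everything here is an assumption: the registered morphemes are concatenated into one character sequence, a per-offset `head` flag marks the first character of each morpheme, a per-offset `tail` flag marks the last, and `positions` maps each character to its offsets. The names `build_index` and `extract_words` and the sample morphemes are hypothetical.

```python
from collections import defaultdict


def build_index(morphemes):
    """Build a character index with head and tail flags (assumed layout)."""
    chars = []                     # concatenated characters of all morphemes
    positions = defaultdict(list)  # character -> offsets into `chars`
    head, tail = [], []            # per-offset flags marking morpheme boundaries
    for morpheme in morphemes:
        for i, ch in enumerate(morpheme):
            positions[ch].append(len(chars))
            chars.append(ch)
            head.append(i == 0)
            tail.append(i == len(morpheme) - 1)
    return chars, positions, head, tail


def extract_words(text, index):
    """Return (offset, word) for every registered morpheme found in text."""
    chars, positions, head, tail = index
    found = []
    for start, ch in enumerate(text):
        for p in positions.get(ch, []):
            if not head[p]:
                continue  # a word can only begin where the head flag is set
            i, q = start, p
            # Walk the input and the index in step until the tail flag.
            while i < len(text) and text[i] == chars[q]:
                if tail[q]:
                    found.append((start, text[start:i + 1]))
                    break
                i += 1
                q += 1
    return found


# Hypothetical sample data, not taken from the patent.
index = build_index(["東京", "京都", "東京都", "都"])
words = extract_words("東京都", index)
print(words)
```

The head flag prevents a match from starting in the middle of a registered morpheme, and the tail flag signals where a complete word ends, which is what lets the sketch report several dividable words from the same input.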

Description

Technical field

[0001] The present invention relates to an analysis program and the like.

Background technique

[0002] Unlike alphabetic text, in which words are separated by a delimiter such as a space, CJK (Chinese, Japanese, and Korean) text has conventionally been processed only after the boundaries between morphemes are recognized. For example, existing technologies that analyze morpheme separation from target character data and output character strings that can be split include morpheme dictionaries such as MeCab and ChaSen, trie trees (prefix trees), and double arrays.

[0003] Technologies that make use of the result of morpheme separation analysis include Word2Vec, which vectorizes target character data.

[0004] Patent Document 1: Japanese Unexamined Patent Publication No. 2010-146273

[0005] Patent Document 2: Japanese Patent Application Laid-Open No. 10-222511

[0006] Patent Document 3: Japanese Patent Laid-Open No....


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/31, G06F16/36, G06F40/237
CPC: G06F40/242, G06F40/268, G06F40/53, G06F40/237, G06F16/81, G06F16/313
Inventor: 片冈正弘, 出内将夫, 尾上聪
Owner: FUJITSU LTD