Natural language component identifying correcting apparatus and method based on morpheme marking

A natural language component recognition technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as unregistered (out-of-vocabulary) words in Chinese and improves analysis accuracy.

Status: Inactive
Publication Date: 2007-03-28
FUJITSU LTD
2 Cites, 15 Cited by

AI Technical Summary

Problems solved by technology

Problems such as unregistered (out-of-vocabulary) words in Chinese.

Examples

Embodiment 1

[0090] As shown in Figure 2, in the specific implementation of the natural language component recognition method and device based on morpheme attribute annotation, modules (units) 102 and 104 are the main units of the morpheme learning part. The morpheme attribute conversion unit 102 applies the morpheme attribute set (module 109) generated by the morpheme attribute generation unit 108 to convert samples labeled with language components into morpheme attribute label samples. Module 104 learns the relationship between morphemes and morpheme attributes from these samples to form morpheme attribute labeling knowledge. Modules 112 and 113 constitute the identification part. Module 112 uses the morpheme attribute labeling knowledge learned in module 104 and, guided by manually summarized knowledge or knowledge learned from labeled samples, labels the input symbol sequence with morpheme attributes. Module 113 uses morp...
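The conversion step performed by unit 102 can be illustrated with a minimal sketch. It assumes that each character is one morpheme and that a morpheme attribute combines the morpheme's position in its component with the component's category, written here as a hypothetical `B/I/E/S-<category>` tag. The function name, tag scheme, and sample data are illustrative assumptions and are not taken from the patent.

```python
def to_morpheme_attributes(labeled_sample):
    """Convert [(component, category), ...] into [(morpheme, attribute), ...].

    Assumptions (not from the patent text): one character = one morpheme,
    and an attribute is "<position>-<category>" with position in
    B (begin), I (inside), E (end), S (single-morpheme component).
    """
    tagged = []
    for component, category in labeled_sample:
        n = len(component)
        for i, morpheme in enumerate(component):
            if n == 1:
                position = "S"
            elif i == 0:
                position = "B"
            elif i == n - 1:
                position = "E"
            else:
                position = "I"
            tagged.append((morpheme, f"{position}-{category}"))
    return tagged


# Hypothetical component-labeled sample: an organization name, a particle, a noun.
sample = [("富士通", "ORG"), ("的", "u"), ("研究", "n")]
for morpheme, attribute in to_morpheme_attributes(sample):
    print(morpheme, attribute)
```

Encoding position and category in one tag turns component recognition into a per-morpheme labeling task, which is what lets a learner such as module 104 work on individual morphemes.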

Embodiment 2

[0119] In a specific implementation of the natural language component correction device and method based on morpheme tagging of the present invention, the device includes: an input unit for receiving the symbol sequence to be corrected, as output by other natural language component recognition systems; a morpheme learning unit for generating morpheme attributes according to the classification information of the natural language component to be corrected and the position of the morpheme within that component, and for learning the relationship between morphemes and morpheme attributes from labeled samples of the natural language component to be corrected; an error position discovery part for checking the input symbol sequence to be corrected to find erroneous positions; a morpheme attribute labeling part for classifying the input symbol sequence to be corrected according to the relationship between morphemes and morpheme attributes learned by the ...
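The learning and labeling parts can be illustrated with a deliberately simple stand-in. This excerpt does not say which learning algorithm the patent uses, so the sketch below substitutes a most-frequent-attribute baseline. The function names, the `B/E/S-<category>` tag format, and the `default` fallback are all hypothetical.

```python
from collections import Counter, defaultdict


def learn_attribute_knowledge(tagged_samples):
    """Learn, for each morpheme, its most frequent attribute.

    This frequency baseline is an assumption standing in for the
    patent's (unspecified here) learner.
    """
    counts = defaultdict(Counter)
    for sample in tagged_samples:
        for morpheme, attribute in sample:
            counts[morpheme][attribute] += 1
    return {m: c.most_common(1)[0][0] for m, c in counts.items()}


def label_sequence(symbols, knowledge, default="S-unknown"):
    """Label each input symbol with its learned attribute (or a fallback)."""
    return [knowledge.get(symbol, default) for symbol in symbols]


# Hypothetical morpheme-attribute-labeled training samples.
training = [
    [("研", "B-n"), ("究", "E-n")],
    [("研", "B-n"), ("究", "E-v")],
    [("究", "E-n")],
]
knowledge = learn_attribute_knowledge(training)
print(label_sequence("研究", knowledge))
```

A context-free baseline like this cannot disambiguate a morpheme that takes different attributes in different contexts. A practical system would condition on neighboring morphemes.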

Embodiment 3

[0131] In a specific embodiment of the Chinese component recognition device and method based on character attribute annotation of the present invention, the device includes: an input unit for inputting a Chinese text sequence to be analyzed; a character attribute learning part for generating character attributes from the part-of-speech tagging corpus, and for learning the relationship between characters and character attributes from the Chinese word segmentation and part-of-speech tagging samples to be analyzed; a character attribute labeling part for labeling the input Chinese text sequence with character attributes, according to the relationship between characters and character attributes obtained by the character attribute learning part, to generate a character attribute label sequence; a synthesis part for identifying the required classification marks in the label sequence and generating the segmentation and part-of-speech tagging results for the input Chinese text sequence; and an output part for outputting the word and the result g...
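The synthesis step, which turns a character attribute label sequence back into segmented and tagged words, might look like the sketch below. It assumes each attribute is a hypothetical `<position>-<category>` tag with positions B, I, E and S. The tag format and function name are illustrative, and the handling of malformed sequences is a design choice of this sketch rather than the patent's.

```python
def synthesize_components(characters, attributes):
    """Merge per-character attributes into (word, category) pairs.

    Assumes "<position>-<category>" tags with position in B/I/E/S.
    A new word starts on B or S, or whenever the category changes,
    so malformed sequences still yield some segmentation.
    """
    components = []
    buffer, current = [], None
    for character, attribute in zip(characters, attributes):
        position, _, category = attribute.partition("-")
        if position in ("B", "S") or category != current:
            if buffer:  # flush an unfinished word before starting a new one
                components.append(("".join(buffer), current))
            buffer, current = [character], category
        else:
            buffer.append(character)
        if position in ("E", "S"):  # word is complete
            components.append(("".join(buffer), current))
            buffer, current = [], None
    if buffer:  # trailing characters without a closing E tag
        components.append(("".join(buffer), current))
    return components


text = "富士通的研究"
tags = ["B-ORG", "I-ORG", "E-ORG", "S-u", "B-n", "E-n"]
print(synthesize_components(text, tags))
# [('富士通', 'ORG'), ('的', 'u'), ('研究', 'n')]
```

Because words are rebuilt from character-level tags, a word that never appeared in a lexicon can still be recovered, which is how this style of labeling helps with unregistered words.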

Abstract

The device includes: an input part for inputting the symbol sequence to be analyzed; a morpheme learning part for generating morpheme attributes and learning the relationship between morphemes and their attributes from labeled samples of the natural language components to be analyzed; a morpheme attribute labeling part for labeling the input symbol sequence with morpheme attributes; a component synthesis part for identifying the required natural language components and their classification labels from the morpheme attribute label sequence; and an output part for outputting the recognition result from the synthesis part. The invention identifies required language components, or other symbol sets such as genes, from input symbol sequences (for example, Chinese or Japanese text) and labels the classification attributes of the identified components, so as to solve problems such as unregistered words in Chinese.

Description

Technical field

[0001] The present invention relates to the technology of identifying certain types of components from an input symbol sequence, in particular to the technology of identifying grammatical or semantic components of natural language based on morpheme attribute labeling and to the technology of genome sequence analysis, and specifically to a natural language component identification and correction device and method based on morpheme labeling.

Background technique

[0002] Language is a symbol system, and the basic symbols in the grammatical system are morphemes. Although natural language appears to be just a linear sequence of morphemes, it actually has a hierarchical structure: morphemes form higher-level components, which in turn form still higher-level components. Identifying the grammatical or semantic components contained in an input sentence and the relationship between components is the primary...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
Inventor: 孟遥, 于浩, 西野文人
Owner: FUJITSU LTD