Apparatus and method for recognizing and correcting natural language components based on morpheme tagging
A natural language component recognition technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses problems such as unregistered (out-of-vocabulary) words in Chinese and improves analysis accuracy.
Examples
Embodiment 1
[0090] As shown in Figure 2, in this implementation of the natural language component recognition method and device based on morpheme attribute annotation, units 102 and 104 are the main units of the morpheme learning part. The morpheme attribute conversion unit 102 applies the morpheme attribute set (module 109) generated by the morpheme attribute generation unit 108 to convert samples labeled with language components into morpheme-attribute-labeled samples. Module 104 learns the relationship between morphemes and morpheme attributes from these samples to form morpheme attribute labeling knowledge. Modules 112 and 113 constitute the recognition part. Module 112 uses the morpheme attribute labeling knowledge learned in module 104 and, guided by manually summarized knowledge or knowledge learned from labeled samples, labels the input symbol sequence with morpheme attributes. Module 113 uses morp...
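The conversion and learning steps described above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the B/M/E/S position markers, the class labels (`LOC`, `ORG`, `DIR`, `ART`), the function names, and the use of plain frequency counts as "labeling knowledge" are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def to_morpheme_attributes(components):
    """Convert (text, class) components into (morpheme, attribute) pairs.

    The attribute joins a position marker and the component class, e.g. "B-LOC".
    The B/M/E/S position scheme is an assumption, not taken from the source.
    """
    labeled = []
    for text, cls in components:
        if len(text) == 1:
            labeled.append((text, f"S-{cls}"))  # single-morpheme component
            continue
        for i, ch in enumerate(text):
            if i == 0:
                pos = "B"  # first morpheme of the component
            elif i == len(text) - 1:
                pos = "E"  # last morpheme of the component
            else:
                pos = "M"  # interior morpheme
            labeled.append((ch, f"{pos}-{cls}"))
    return labeled

def learn_attribute_counts(samples):
    """Count how often each morpheme carries each attribute (hypothetical stand-in for module 104)."""
    counts = defaultdict(Counter)
    for components in samples:
        for morpheme, attr in to_morpheme_attributes(components):
            counts[morpheme][attr] += 1
    return counts

# Hypothetical component-labeled samples.
samples = [
    [("北京", "LOC"), ("大学", "ORG")],
    [("北", "DIR"), ("京剧", "ART")],
]
knowledge = learn_attribute_counts(samples)
print(to_morpheme_attributes(samples[0]))
print(dict(knowledge["北"]))
```

A real system would likely learn context-dependent relations rather than per-morpheme counts; the counts here only show what kind of data the conversion step produces.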
Embodiment 2
[0119] A specific implementation of the natural language component correction device and method based on morpheme tagging of the present invention includes: an input unit for receiving the symbol sequence to be corrected, as output by another natural language component recognition system; a morpheme learning unit for generating morpheme attributes according to the classification information of the natural language component to be corrected and the position of the morpheme within that component, and for learning the relationship between morphemes and morpheme attributes from labeled samples of the natural language component to be corrected; an error position discovery part for checking the input symbol sequence to be corrected and finding the erroneous positions; a morpheme attribute labeling part for classifying the input symbol sequence to be corrected according to the relationship between morphemes and morpheme attributes learned by the ...
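One way to picture the error position discovery and relabeling parts is the Python sketch below. The excerpt does not state how erroneous positions are found, so the criterion used here (flag a morpheme whose attribute was never observed for it in the learned knowledge) is purely an assumption, as are the function names, the tag names, and the sample counts.

```python
def find_suspect_positions(tagged, knowledge):
    """Return indices whose (morpheme, attribute) pair is unseen in knowledge.

    Hypothetical criterion: morphemes absent from knowledge are skipped,
    because there is no evidence for or against their attribute.
    """
    suspects = []
    for i, (morpheme, attr) in enumerate(tagged):
        seen = knowledge.get(morpheme)
        if seen is not None and attr not in seen:
            suspects.append(i)
    return suspects

def relabel(tagged, knowledge, positions):
    """Replace the attribute at each suspect position with the most frequent learned one."""
    corrected = list(tagged)
    for i in positions:
        morpheme, _ = corrected[i]
        best = max(knowledge[morpheme], key=knowledge[morpheme].get)
        corrected[i] = (morpheme, best)
    return corrected

# Hypothetical learned knowledge: morpheme -> {attribute: count}.
knowledge = {
    "北": {"B-LOC": 5, "S-DIR": 1},
    "京": {"E-LOC": 5, "B-ART": 1},
}
# Hypothetical output of an upstream recognition system, already in attribute form.
upstream = [("北", "B-LOC"), ("京", "S-DIR"), ("人", "S-N")]
positions = find_suspect_positions(upstream, knowledge)
print(positions)
print(relabel(upstream, knowledge, positions))
```

This sketch ignores context entirely; it only shows the division of labor between locating suspect positions and relabeling them, which keeps the upstream system's output untouched wherever no error is detected.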
Embodiment 3
[0131] A specific embodiment of the Chinese component recognition device and method based on character attribute annotation of the present invention includes: an input unit for inputting the Chinese text sequence to be analyzed; a character attribute learning part for generating character attributes from a Chinese word segmentation and part-of-speech tagging corpus, and for learning the relationship between characters and character attributes from the Chinese word segmentation and part-of-speech tagging samples; a character attribute tagging part for tagging the input Chinese text sequence with character attributes, according to the relationship between characters and character attributes obtained by the character attribute learning part, and generating a character attribute tagging sequence; a recognition part for identifying the required classification marks in the tagging sequence and generating the segmentation and part-of-speech tagging results for the input Chinese text sequence; and an output part for outputting the word and the result g...
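The final step, turning a character attribute tagging sequence into segmentation and part-of-speech results, might look like the following Python sketch. The attribute format (`B`/`M`/`E`/`S` position plus a part-of-speech tag such as `ns`), the `UNK` fallback, and the function name `decode` are assumptions for illustration; the excerpt does not specify them.

```python
def decode(chars, attrs):
    """Merge characters into (word, pos) pairs from position-plus-POS attributes.

    Assumes attributes of the form "<B|M|E|S>-<pos>"; a word closes on E or S.
    """
    words = []
    buffer = ""
    for ch, attr in zip(chars, attrs):
        position, pos_tag = attr.split("-", 1)
        buffer += ch
        if position in ("E", "S"):
            words.append((buffer, pos_tag))
            buffer = ""
    if buffer:
        # Sequence ended inside a word; keep the fragment with a placeholder tag.
        words.append((buffer, "UNK"))
    return words

chars = "我爱北京"
attrs = ["S-r", "S-v", "B-ns", "E-ns"]  # hypothetical tagger output
print(decode(chars, attrs))
```

Because word boundaries and part-of-speech tags are both read off the per-character attributes, segmentation and tagging fall out of a single labeling pass in this sketch.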