Disambiguation-free unsupervised part-of-speech tagging method based on error-correcting output codes

A technology based on error-correcting output codes for unsupervised part-of-speech tagging, applied in the field of information processing. It addresses the difficulty of part-of-speech tagging when tagged corpora are unavailable, and it avoids the problem of error propagation.

Active Publication Date: 2016-09-21
SOUTHEAST UNIV
Cites: 3 | Cited by: 11

AI Technical Summary

Problems solved by technology

[0008] Purpose of the invention: To overcome the deficiencies of the prior art, the present invention provides a disambiguation-free unsupervised part-of-speech tagging method based on error-correcting output codes. It offers a part-of-speech tagging framework for languages in which tagged corpora are difficult to obtain. This makes part-of-speech tagging possible for those languages and supports a series of downstream applications, such as named entity recognition and information extraction.

Examples

Embodiment Construction

[0029] The present invention is further illustrated below with reference to the accompanying drawings and specific embodiments. These embodiments only illustrate the present invention and are not intended to limit its scope. After reading the present invention, those skilled in the art may make modifications in various equivalent forms, and all such modifications fall within the scope defined by the appended claims of the present application.

[0030] As shown in Figure 1, the disambiguation-free unsupervised part-of-speech tagging method based on error-correcting output codes comprises two steps:

1. Generating training data based on the part-of-speech dictionary.
2. Training and testing based on ECOC.

[0031] The problem can be described as follows: let O denote the list of part-of-speech tags, and let D denote the dictionary consisting of words and their candidate parts of speech, that i...
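One way to picture step 1 under the definitions of O and D is sketched below. It is a minimal sketch that assumes the following:

- Every token of an untagged sentence becomes one training example.
- The label of that example is the token's whole candidate set D[w], not a single chosen tag.
- Features are built from a fixed window of word vectors.

The tag list, the toy dictionary, `EMB_DIM`, `WINDOW` and the random vectors are hypothetical. The random vectors stand in for the embeddings a neural language model would supply. None of these values come from the patent.

```python
import random

# Hypothetical tag list O and dictionary D (word -> candidate tags); illustrative only.
O = ["NOUN", "VERB", "ADJ", "DET"]
D = {
    "the": {"DET"},
    "old": {"ADJ", "NOUN"},
    "can": {"NOUN", "VERB"},
    "rusts": {"NOUN", "VERB"},
}
assert set().union(*D.values()) <= set(O)  # every candidate tag must belong to O

EMB_DIM = 8   # assumed embedding size
WINDOW = 1    # assumed number of context words on each side
PAD = "<pad>"

rng = random.Random(0)
# Random vectors stand in for the embeddings a neural language model would learn.
embeddings = {w: [rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)] for w in [PAD, *D]}

def window_features(sentence, i):
    """Concatenate the embeddings of the words at positions i-WINDOW .. i+WINDOW."""
    padded = [PAD] * WINDOW + sentence + [PAD] * WINDOW
    feats = []
    for w in padded[i : i + 2 * WINDOW + 1]:
        feats.extend(embeddings.get(w, embeddings[PAD]))  # unknown word -> pad vector
    return feats

def generate_training_data(sentences):
    """Turn untagged sentences into (features, candidate tag set) pairs.

    No tag is ever chosen for a token: the whole candidate set D[w] is kept
    as an ambiguous label, so no disambiguation step is needed.
    """
    X, S = [], []
    for sent in sentences:
        for i, word in enumerate(sent):
            if word not in D:
                continue  # assumption: words missing from D are skipped
            X.append(window_features(sent, i))
            S.append(frozenset(D[word]))
    return X, S

X, S = generate_training_data([["the", "old", "can", "rusts"]])
print(len(X), len(X[0]))  # 4 24
print(sorted(S[2]))       # ['NOUN', 'VERB']
```

Keeping the full candidate set as the label is what lets the later ECOC step train without first guessing a tag for each ambiguous word.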

Abstract

The invention discloses a disambiguation-free unsupervised part-of-speech tagging method based on error-correcting output codes. The method comprises two main steps:

1. Generating training data on the basis of a part-of-speech dictionary.
2. Carrying out training and testing on the basis of error-correcting output codes.

The method does not require a tagged corpus, so it can be applied to part-of-speech tagging for languages in which tagged corpora are hard to obtain. It also requires no disambiguation, which avoids the error propagation that occurs in iterative disambiguation. A neural language model automatically generates the features used for training and testing, so features need not be selected and constructed manually.
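The abstract's central idea is training on candidate tag sets without ever disambiguating them. This can be illustrated with a partial-label ECOC construction in the style of PL-ECOC. That construction is an assumption here; the patent text shown does not spell out its procedure.

- **Column splits.** Each column of the coding matrix splits the tag list into a positive group and a negative group.
- **Example selection.** A token is a positive (or negative) example for a column only if its whole candidate set lies inside the positive (or negative) group. Otherwise it is left out of that column's training set.
- **Decoding.** At test time, the chosen tag is the dictionary candidate whose codeword agrees with the most binary outputs. Restricting decoding to dictionary candidates is a further assumption.

The 2-D toy features, the nearest-centroid binary learner and the exhaustive coding matrix are hypothetical stand-ins. They replace real classifiers, neural-language-model features and (often random) coding matrices.

```python
import math

# Hypothetical tag list O and toy data; 2-D points stand in for real
# (e.g. neural-language-model) token features.
O = ["NOUN", "VERB", "ADJ", "DET"]
train = [  # (features, candidate tag set from the dictionary)
    ((0.0, 0.5), {"NOUN"}),
    ((0.5, 0.0), {"NOUN", "VERB"}),
    ((10.0, 0.5), {"VERB"}),
    ((9.5, 0.0), {"NOUN", "VERB"}),
    ((0.0, 10.0), {"ADJ"}),
    ((0.5, 9.5), {"ADJ", "NOUN"}),
    ((10.0, 10.0), {"DET"}),
    ((9.5, 9.5), {"DET", "ADJ"}),
]

def exhaustive_code(tags):
    """Each column splits the tags into a positive and a negative group."""
    k = len(tags)
    columns = []
    for mask in range(1, 2 ** k - 1):
        if mask & 1:  # keep tags[0] positive so no split appears twice
            pos = {t for i, t in enumerate(tags) if mask >> i & 1}
            columns.append((pos, set(tags) - pos))
    return columns

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def train_ecoc(data, columns):
    """Train one nearest-centroid binary learner per column; nothing is disambiguated."""
    models = []
    for pos, neg in columns:
        # A token is used only if its whole candidate set falls on one side.
        p = [x for x, cands in data if cands <= pos]
        n = [x for x, cands in data if cands <= neg]
        if p and n:  # skip columns that lack examples on either side
            models.append((pos, centroid(p), centroid(n)))
    return models

def predict(models, x, candidates):
    """Pick the candidate tag whose codeword agrees with the most binary outputs."""
    outputs = [math.dist(x, cp) <= math.dist(x, cn) for _, cp, cn in models]

    def agreement(tag):
        return sum((tag in pos) == out for (pos, _, _), out in zip(models, outputs))

    # Assumption: decoding is restricted to the word's dictionary candidates.
    return max((t for t in O if t in candidates), key=agreement)

models = train_ecoc(train, exhaustive_code(O))
print(predict(models, (0.2, 0.2), {"NOUN", "VERB"}))  # NOUN
print(predict(models, (9.8, 0.3), {"NOUN", "VERB"}))  # VERB
```

Tokens whose candidate sets straddle a split are simply ignored for that column rather than forced into one tag. This is how an ECOC scheme can avoid the error propagation of iterative disambiguation. The redundancy across columns lets decoding tolerate individual binary errors.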

Description

Technical field

[0001] The invention relates to a method for tagging the parts of speech of text by computer, and belongs to the technical field of information processing.

Background art

[0002] To date, no unsupervised part-of-speech tagging method has been found that is based on Error-Correcting Output Codes (ECOC) and generates its training and testing features automatically. Two related kinds of method do exist:

- unsupervised part-of-speech tagging methods whose training and testing features are constructed manually;
- supervised part-of-speech tagging methods whose training and testing features are generated automatically.

The present method is completely different from both.

[0003] Part-of-speech tagging (POS tagging), also called word-class tagging or simply tagging, means assigning the correct part of speech to each word in a sentence. In other words, it is the process of determining whether each word is a noun, a verb, an adjective, or another part of speech. Correct part-of-speech tagging is...

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/27
CPC: G06F40/205, G06F40/253
Inventors: 周德宇 (Zhou Deyu), 徐海洋 (Xu Haiyang), 张致恺 (Zhang Zhikai)
Owner: SOUTHEAST UNIV