HMM-based part-of-speech tagging method

A part-of-speech tagging technique, applied in the field of information processing, that improves the efficiency and accuracy of tagging

Status: Inactive
Publication Date: 2018-03-16
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is to provide an HMM-based part-of-speech tagging method that addresses the limitations and deficiencies of the prior art, introducing a combination of an HMM and a maximum entropy model to improve the part-of-s



Examples


Embodiment 1

[0033] Embodiment 1: an HMM-based part-of-speech tagging method that combines an HMM with a maximum entropy model to improve part-of-speech tagging. The method comprises the following 5 steps:

[0034] ① Input the word string to be tagged.

[0035] ② Use the lexicon to segment the input word string by the forward maximum matching method, obtaining the first word segmentation result.

[0036] ③ Using the tagged People's Daily corpus of January 1998 as the training set and test set, estimate the three parameters of the HMM, thereby obtaining a number of observable states of the HMM.

[0037] ④ Perform the second word segmentation: look up the words not found in the first word segmentation result among the observable states of the HMM. Any words that are still not found are passed as new words to the maximum entropy model for tagging.

[0038] ⑤ Use the Viterbi algorithm to calculate the optimal hidden sequence ...
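
Steps ③ and ⑤ can be illustrated with a minimal sketch. It assumes the "three parameters" are the conventional initial-state, transition, and emission probabilities, estimated here by relative frequency from a tagged corpus. The function names, the toy corpus, the tag labels, and the `floor` probability for unseen events are all hypothetical choices made for illustration. In particular, the floor only stands in for the handling of unknown words, which the method described above delegates to a maximum entropy model not shown here.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Estimate (pi, A, B) by relative frequency from [(word, tag), ...] sentences."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                start[tag] += 1          # tag opens a sentence
            else:
                trans[prev][tag] += 1    # tag follows prev
            emit[tag][word] += 1         # tag emits word
            prev = tag

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(start)
    A = {tag: normalize(c) for tag, c in trans.items()}
    B = {tag: normalize(c) for tag, c in emit.items()}
    return pi, A, B


def viterbi(words, pi, A, B, floor=1e-8):
    """Return the most probable tag sequence for words under (pi, A, B)."""
    if not words:
        return []
    tags = list(B)

    def lp(table, key):
        # Log-probability, with a small floor for events never seen in training.
        return math.log(table.get(key, floor))

    # best[t][tag] = (log-prob of the best path ending in tag at position t, previous tag)
    best = [{tag: (lp(pi, tag) + lp(B[tag], words[0]), None) for tag in tags}]
    for word in words[1:]:
        column = {}
        for tag in tags:
            score, prev = max(
                (best[-1][p][0] + lp(A.get(p, {}), tag), p) for p in tags
            )
            column[tag] = (score + lp(B[tag], word), prev)
        best.append(column)

    # Backtrack from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy tagged corpus (illustrative only, not the People's Daily data).
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("音乐", "n")],
    [("北京", "ns"), ("很", "d"), ("美", "a")],
]
pi, A, B = train_hmm(corpus)
print(viterbi(["他", "爱", "北京"], pi, A, B))  # ['r', 'v', 'ns']
```

Working in log space avoids numerical underflow on long sentences, which is why the sketch sums log-probabilities instead of multiplying raw probabilities.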


Abstract

The invention relates to an HMM-based part-of-speech tagging method and belongs to the field of information processing technology. First, the words in a lexicon are sorted by Unicode value so that binary search can be used for fast lookup during word segmentation. Second, an HMM is introduced: a tagged corpus serves as the training set and test set for obtaining the three parameters of the HMM, which yields a number of observable states of the HMM. Third, a secondary word segmentation is performed: the words not found in the primary word segmentation result are searched for among the observable states of the HMM, and a maximum entropy model is introduced to tag the new words that are still not found. Finally, the Viterbi algorithm is used to calculate the optimal hidden sequence of the HMM, and this sequence is combined with the tagging result of the maximum entropy model to obtain the final part-of-speech tagging result. Compared with the prior art, the method mainly addresses the low speed and low new-word recognition rate of any single part-of-speech tagging method, and the low tagging accuracy that results, thereby improving the efficiency and accuracy of part-of-speech tagging.
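
The lexicon lookup and first-pass segmentation described in the abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the `max_len` window, the single-character fallback for unmatched text, and the toy lexicon are assumptions. Python compares strings by Unicode code point, so a plain `sorted` call gives the ordering that binary search needs.

```python
from bisect import bisect_left


def build_lexicon(words):
    """Return the words de-duplicated and sorted by Unicode code point."""
    return sorted(set(words))


def in_lexicon(lexicon, word):
    """Binary search for word in the sorted lexicon."""
    i = bisect_left(lexicon, word)
    return i < len(lexicon) and lexicon[i] == word


def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation.

    At each position, take the longest substring (up to max_len characters)
    found in the lexicon; if nothing matches, emit a single character.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + size]
            if size == 1 or in_lexicon(lexicon, candidate):
                tokens.append(candidate)
                pos += size
                break
    return tokens


# Toy lexicon (illustrative only).
lexicon = build_lexicon(["研究", "研究生", "生命", "起源", "的"])
print(forward_max_match("研究生命的起源", lexicon))  # ['研究生', '命', '的', '起源']
```

The example input shows a known weakness of greedy forward matching: it prefers 研究生 over 研究 + 生命, leaving 命 as an unmatched single character. Leftover tokens of this kind are the sort of item a second lookup pass would need to handle.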

Description

Technical field

[0001] The invention relates to an HMM-based part-of-speech tagging method, which belongs to the technical field of information processing.

Background

[0002] In modern society, with the rapid development of information technology, part-of-speech tagging has become an important research direction in natural language processing and a necessary preparatory step for tasks such as machine translation.

[0003] Generally, although HMM-based part-of-speech tagging performs well, it has insufficient predictive information and a poor ability to identify new words, so its tagging accuracy is not high. Similarly, although part-of-speech tagging based on the maximum entropy model can make effective use of contextual information and predicts well, it suffers from problems such as slow tagging speed and label bias.

Contents of the invention

[0004] The technical problem to be solved in the present invention is to provide a ki...


Application Information

IPC(8): G06F17/21
CPC: G06F40/117
Inventors: 龙华, 吴睿, 熊新, 邵玉斌, 杜庆治
Owner: KUNMING UNIV OF SCI & TECH