Named entity recognition method for the tourism domain based on conditional random fields

Named entity recognition and conditional random field technology, applied in special data processing applications, instruments, electrical digital data processing, etc., can solve problems such as adverse effects on recognition results

Inactive Publication Date: 2009-07-08
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

However, because the Hidden Markov Model is a generative model, it has some inherent defects. First, strict independence assumptions must be made to ensure that the derivation is correct.
The label bias problem can adversely affect the results of natural language processing.

Examples

Embodiment Construction

[0039] This embodiment uses collected and organized Yunnan tourism texts as the training and test material. The method proposed above was verified experimentally in the Yunnan tourism domain. The specific steps are shown in Figure 1.

[0040] Step a1. Manually collect a corpus of 2000 Yunnan tourism documents, of which 800 form the training corpus and 1200 form the open test corpus. In addition, 600 of the 800 training documents are randomly selected as the closed test corpus. The training corpus is the text supplied to the CRF++ 0.49 toolkit, which extracts contextual features from it. The trained model is the set of contextual features extracted from the training corpus. The test corpus is the unlabeled text used to verify the performance of the trained model. Open testing means that the training corpus and the test corpus do not overlap. Closed testing means that the test corpus is part...
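The corpus split described in step a1 could be reproduced roughly as follows. This is a minimal sketch, not the procedure used in the patent: the function name `split_corpus`, the document identifiers, and the fixed random seed are all assumptions made for illustration.

```python
import random

def split_corpus(doc_ids, n_train=800, n_closed=600, seed=0):
    """Split documents into training, open-test, and closed-test sets.

    Open test: documents never seen in training.
    Closed test: a random subset of the training documents.
    """
    rng = random.Random(seed)          # fixed seed is an assumption, for repeatability
    shuffled = list(doc_ids)
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    open_test = shuffled[n_train:]     # disjoint from the training set
    closed_test = rng.sample(train, n_closed)  # drawn from the training set
    return train, open_test, closed_test

# Hypothetical identifiers standing in for the 2000 collected documents.
doc_ids = [f"doc{i:04d}" for i in range(2000)]
train, open_test, closed_test = split_corpus(doc_ids)
```

With 2000 input documents this yields 800 training, 1200 open-test, and 600 closed-test documents, where the closed-test set is contained in the training set and the open-test set shares nothing with it.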

Abstract

The invention relates to a method for recognizing named entities in the tourism domain and belongs to the field of artificial intelligence. Building on manually performed corpus collection, labeling, and text preprocessing, the invention provides a tourism-domain named entity recognition method based on a stacked conditional random field model. The method comprises two steps. In the low-level conditional random field, words are used as the segmentation granularity and are combined with feature dictionaries such as a table of common tourist attraction words, a table of common tourist attraction suffixes, and a table of common place name words, so that simple tourism named entities are recognized through an effective feature template. The recognition result is then passed to the high-level model, in which phrases are used as the segmentation granularity and are combined with a template of complex features, so that nested scenic spots, local specialties and delicacies, and locations are recognized. In open tests, the F value of the stacked conditional random field model is 8 percent higher than that of a single-layer model; compared with an HMM (Hidden Markov Model), its precision is 8 percent higher, its recall is 22 percent higher, and its F value is 15 percent higher.
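The two-layer data flow in the abstract can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the tiny dictionaries, the `LOC`/`ATT` tag names, the BIO tagging scheme, and the helper names are hypothetical, and the low-level tags are supplied by hand rather than predicted by a trained CRF. The sketch only shows how dictionary lookups could become feature columns for the low-level model and how its output could be regrouped into phrase-sized units for the high-level model.

```python
# Hypothetical mini-dictionaries; the patent's actual tables are not reproduced here.
ATTRACTION_WORDS = {"石林", "滇池"}
ATTRACTION_SUFFIXES = ("山", "湖", "寺", "风景区")
PLACE_WORDS = {"昆明", "大理"}

def low_level_rows(words):
    """Build one feature row per word: the word plus three dictionary flags."""
    return [
        [
            w,
            "Y" if w in ATTRACTION_WORDS else "N",          # common attraction word
            "Y" if w.endswith(ATTRACTION_SUFFIXES) else "N",  # common attraction suffix
            "Y" if w in PLACE_WORDS else "N",               # common place name word
        ]
        for w in words
    ]

def merge_phrases(words, tags):
    """Regroup low-level BIO tags into (phrase, type) units for the high level."""
    units = []
    for w, t in zip(words, tags):
        if t.startswith("I-") and units and units[-1][1] == t[2:]:
            units[-1] = (units[-1][0] + w, t[2:])   # extend the current entity
        elif t.startswith(("B-", "I-")):
            units.append((w, t[2:]))                # start a new entity
        else:
            units.append((w, "O"))                  # word outside any entity
    return units

def to_columns(rows):
    """Serialize rows as tab-separated columns, one token per line,
    with a blank line closing the sentence."""
    return "\n".join("\t".join(r) for r in rows) + "\n\n"

words = ["昆明", "石林", "风景区", "很", "美"]
rows = low_level_rows(words)
# Hand-written stand-in for the low-level model's predictions.
low_tags = ["B-LOC", "B-ATT", "I-ATT", "O", "O"]
phrases = merge_phrases(words, low_tags)
```

In this sketch the high-level model would see the units `昆明` (LOC) and `石林风景区` (ATT) side by side, which is the kind of input from which a nested scenic spot name could be recognized.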

Description

Technical field

[0001] The invention relates to a named entity recognition method for the tourism domain based on conditional random fields, and belongs to the field of artificial intelligence.

Background technique

[0002] At present, named entity recognition relies mainly on rule-based methods and statistical methods. In rule-based methods, the rules are so varied that summarizing a unified rule set able to identify all entity types costs too much, and the rules cannot cover all domains, so the approach is basically infeasible. The Institute of Computing Technology, Chinese Academy of Sciences, proposed a statistical method that recognizes entities by character labeling based on the Hidden Markov Model. However, because the Hidden Markov Model is a generative model, it has some inherent defects. First, strict independence assumptions must be made to ensure that the derivation is correct. In fact, most sequence data cannot be represent...
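For reference, the contrast between a generative Hidden Markov Model and a discriminative linear-chain conditional random field can be written in the standard textbook forms below. These formulas are not quoted from the patent; the symbols (observation sequence x, label sequence y of length T, feature functions f_k with weights λ_k, and a fixed start label y_0) are the usual notation and are assumed here.

```latex
% HMM (generative): models the joint probability, assuming each observation
% depends only on its own label and each label only on the previous label.
P(\mathbf{x}, \mathbf{y}) = \prod_{t=1}^{T} P(y_t \mid y_{t-1})\, P(x_t \mid y_t)

% Linear-chain CRF (discriminative): models the conditional probability directly,
% so features may inspect the whole observation sequence.
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y_{t-1}, y_t, \mathbf{x}, t) \Big),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'}
  \exp\Big( \sum_{t=1}^{T} \sum_{k} \lambda_k f_k(y'_{t-1}, y'_t, \mathbf{x}, t) \Big)
```

Because Z(x) normalizes over entire label sequences rather than at each position, the linear-chain CRF avoids the per-state normalization that gives rise to label bias.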

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 郭剑毅, 薛征山, 余正涛, 张志坤, 毛存礼, 万舟
Owner: KUNMING UNIV OF SCI & TECH