Recognition ambiguity resolution method of Chinese named entity

A named entity recognition and disambiguation technique, applied in the field of named entity extraction, that addresses problems of existing methods such as high time and storage costs and poor portability.

Status: Inactive
Publication Date: 2012-01-11
BEIHANG UNIV
Cites: 6; Cited by: 30

AI Technical Summary

Problems solved by technology

Rule-based methods require manual effort, are domain-specific, and are difficult to port, while statistics-based methods are more adapta


Examples

Detailed Description of Embodiments

[0038] In one embodiment of the present invention, the training corpus comes from MSRA Bakeoff 2003, and five feature templates are designed. The experiment uses not only the position information of multiple words in the context but also the type information of the context words, which enriches the information contained in the feature templates. The training corpus has 46364 sentences containing 75059 named entities, an average of about 1.6 named entities per sentence. The test corpus has 4365 sentences containing 6190 named entities, an average of about 1.4 named entities per sentence. The corpus is stored as an XML document. The experiment was run on a computer with a Core 2 Duo CPU clocked at 2.4 GHz and 1 GB of memory, running Red Hat Linux 9.0; the CRF model training tool was CRF++-0.54.

[0039] The feature templates used during training are shown ...
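The five templates themselves are not reproduced in this text, so the following is only a minimal sketch of how CRF++-style template macros of the form `%x[row,col]` combine position information (the row offset) with a type column. The template list, the type tags, and the helper names `expand` and `features` are all hypothetical and chosen for illustration; the boundary placeholders imitate the `_B-1` / `_B+1` style used by CRF++.

```python
import re

# Hypothetical templates in CRF++ macro syntax. These are NOT the patent's
# five templates, which are not reproduced in this text.
TEMPLATES = [
    "U00:%x[-1,0]",          # previous character
    "U01:%x[0,0]",           # current character
    "U02:%x[1,0]",           # next character
    "U03:%x[-1,0]/%x[0,0]",  # previous/current character pair
    "U04:%x[0,1]",           # type column of the current character
]

MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand(template, rows, i):
    """Expand every %x[row,col] macro relative to position i in rows."""
    def lookup(match):
        offset, col = int(match.group(1)), int(match.group(2))
        j = i + offset
        if j < 0:
            return "_B%d" % j                      # before the sentence start
        if j >= len(rows):
            return "_B+%d" % (j - len(rows) + 1)   # past the sentence end
        return rows[j][col]
    return MACRO.sub(lookup, template)


def features(rows, i):
    """Return one feature string per template for position i."""
    return [expand(t, rows, i) for t in TEMPLATES]


# Each row is (character, character type); the type tags are illustrative.
sentence = [("北", "LOC_CHAR"), ("京", "LOC_CHAR"), ("大", "OTHER"), ("学", "OTHER")]
first_position_features = features(sentence, 0)
```

Column 0 carries the character and column 1 an assumed type tag, so a template that reads column 1 at a non-zero row offset would bring in the type of a context word as well as its position.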

Abstract

A method for resolving ambiguity in Chinese named entity recognition, belonging to the field of named entity extraction. The method comprises the following steps: (1) feature induction is used to reduce the number of features, i.e., the classifier automatically selects meaningful features through training and learning; (2) after feature selection, the N-best label sequences, i.e., the N label sequences with the highest probability for the observation sequence, are selected from the trained CRF (Conditional Random Field) model using the Viterbi algorithm; (3) taking into account the occurrence frequency and word length of Chinese named entities, a modified greedy algorithm is applied to resolve the ambiguity and obtain the final entity label sequence.
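Step (2) can be illustrated with a small N-best Viterbi decoder. This is a sketch under stated assumptions rather than the patent's implementation: the function name `nbest_viterbi` and the `emit` / `trans` log-score tables are hypothetical stand-ins for what a trained CRF would supply, and the numbers in the example are made up. In practice CRF++ can produce N-best output itself through the `-n` option of `crf_test`.

```python
import heapq


def nbest_viterbi(emit, trans, n):
    """Return the n highest-scoring label sequences as (score, labels) pairs.

    emit[t][y]  : log-score of label y at position t (assumed given)
    trans[p][y] : log-score of moving from label p to label y (assumed given)
    """
    num_labels = len(emit[0])
    # beams[y] holds up to n best (score, path) pairs that end in label y.
    beams = [[(emit[0][y], (y,))] for y in range(num_labels)]
    for t in range(1, len(emit)):
        new_beams = []
        for y in range(num_labels):
            # Extend every kept partial path with label y at position t.
            candidates = [
                (score + trans[path[-1]][y] + emit[t][y], path + (y,))
                for beam in beams
                for score, path in beam
            ]
            # Keeping n paths per end label is enough for an exact n-best list.
            new_beams.append(heapq.nlargest(n, candidates))
        beams = new_beams
    finals = [item for beam in beams for item in beam]
    return heapq.nlargest(n, finals)


# Toy example with two labels and two positions; all scores are invented.
emit = [[0.0, -1.0], [-0.5, -0.2]]
trans = [[-0.1, -1.0], [-0.7, -0.3]]
top2 = nbest_viterbi(emit, trans, 2)
```

The returned candidates are what a later disambiguation step, such as the greedy selection in step (3), would choose among; that step is not sketched here because the text does not give its details.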

Description

Technical field:
[0001] The invention belongs to the field of named entity extraction, and in particular relates to a method for resolving ambiguity in Chinese named entity recognition.

Background technique:
[0002] Given the massive amount of information on the Web, a key question is how to quickly and effectively find the knowledge that users need to solve their problems. With the continuous development of Internet technology, the Internet has become an important source of information. Yet in the face of massive Web information, people still face a shortage of knowledge.

[0003] At present, the vast majority of web pages are written in HTML. HTML tags describe only the presentation of data, not its semantics, which makes it difficult for computers to understand information on the Web. Faced with massive amounts of information, people can only search by keyword through tools such as search engines, and search engines return various query results in the form o...

Application Information

IPC(8): G06F17/30
Inventor: 王理, 潘守慧, 邓卫国, 王思远, 于珊, 施慧斌
Owner BEIHANG UNIV