Method for realizing Chinese named entity identification by utilizing uncertain word segmentation information

A named entity recognition and word segmentation technology, applied in the field of natural language processing. It addresses problems such as word segmentation errors disturbing entity recognition, noise, and increased model training cost, and it achieves the effects of compensating for missing context semantics, reducing word segmentation errors, and improving fault tolerance.

Active Publication Date: 2020-06-19
TONGJI UNIV

AI Technical Summary

Problems solved by technology

Existing methods tag the character sequence at the character level and add word segmentation information to the tagging system in the form of feature vectors. This also introduces word segmentation errors. Even when the named entity recognition and word segmentation models are trained at the same time, segmentation errors still flow into the named entity recognition system and generate noise or errors, and this multi-task joint learning increases the overhead of model training.
In summary, these methods share a common oversight when using word segmentation information: they all feed a single segmentation result into the entity recognition system or module as if it were correct, regardless of whether it actually is. Wrong word segmentation information therefore inevitably has a negative effect on entity recognition.



Examples


Embodiment

[0068] 1.1 Input the Chinese text "南京市长江大桥调研" ("Nanjing Yangtze River Bridge Research") and obtain the character sequence ['南', '京', '市', '长', '江', '大', '桥', '调', '研'], which contains 9 characters. The characters are pre-trained with the Word2vec method, and each character gets a 100-dimensional character vector;
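Step 1.1 can be illustrated with a minimal sketch. It assumes the example text is 南京市长江大桥调研 (a back-translation of "Nanjing Yangtze River Bridge Research") and uses a seeded random table as a stand-in for Word2vec pre-training, which is not reproduced here. The names `build_char_table` and `char_vectors` are hypothetical.

```python
import random

# Assumed original of the translated example sentence.
TEXT = "南京市长江大桥调研"
EMBEDDING_DIM = 100  # character vector size stated in step 1.1


def build_char_table(chars, dim, seed=0):
    """Return a random stand-in for a Word2vec-pretrained character table."""
    rng = random.Random(seed)
    return {
        ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for ch in sorted(set(chars))
    }


# A Python string iterates by Unicode character, so list() yields
# one item per Chinese character.
chars = list(TEXT)
table = build_char_table(chars, EMBEDDING_DIM)
char_vectors = [table[ch] for ch in chars]

print(len(chars), len(char_vectors[0]))
```

In a real pipeline the table would come from a trained embedding model; only the lookup shape (9 characters, 100 dimensions each) is taken from the text.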

[0069] 1.2 Input the character sequence described in 1.1 into the jieba word segmentation model to obtain all candidate word segmentation information ['Nanjing', 'Nanjing City', 'Beijing City', 'Mayor', 'Yangtze River', 'Yangtze River Bridge', 'Jiang', 'Bridge', 'Investigation']. According to whether and where each character appears in the candidate words, a 4-dimensional character candidate word segmentation position vector is obtained for each character, giving the vector group:

[0070]
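The encoding in step 1.2 might look like the following sketch. The source does not say what the four dimensions mean; the code assumes they are begin, middle, end, and single (B/M/E/S) flags, set whenever a character occupies that position in any candidate word. The Chinese candidate list is a back-translation of the English glosses, the ambiguous 'Jiang' candidate is left out, and the function name `position_vectors` is hypothetical.

```python
# Assumed original text and back-translated candidate words.
TEXT = "南京市长江大桥调研"
CANDIDATES = ["南京", "南京市", "京市", "市长", "长江", "长江大桥", "大桥", "调研"]

# Assumed meaning of the 4 dimensions: begin, middle, end, single.
B, M, E, S = range(4)


def position_vectors(text, candidates):
    """Return one multi-hot [B, M, E, S] vector per character of text."""
    vectors = [[0, 0, 0, 0] for _ in text]
    for word in candidates:
        if not word:
            continue  # an empty candidate carries no position information
        start = text.find(word)
        while start != -1:  # visit every occurrence of the candidate
            end = start + len(word) - 1
            if len(word) == 1:
                vectors[start][S] = 1
            else:
                vectors[start][B] = 1
                vectors[end][E] = 1
                for i in range(start + 1, end):
                    vectors[i][M] = 1
            start = text.find(word, start + 1)
    return vectors


for ch, vec in zip(TEXT, position_vectors(TEXT, CANDIDATES)):
    print(ch, vec)
```

Under these assumptions a character such as 京 receives several flags at once, because it ends 南京, sits in the middle of 南京市, and begins 京市. This is how the encoding keeps every candidate segmentation instead of committing to one.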

[0071] 1.3 Multiply the 4-dimensional character candidate word segmentation position vector described in 1.2 by the 4×100-dimension...
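The multiplication named in step 1.3 can be sketched as a row vector times a 4×100 matrix, which yields a 100-dimensional vector. The matrix values here are random placeholders, the input vector is an arbitrary multi-hot example, and the names `make_projection` and `project` are hypothetical. How the patent uses the result is elided in the source and is not modeled.

```python
import random

POSITION_DIM = 4     # size of the position vector from step 1.2
EMBEDDING_DIM = 100  # matches the character vector size from step 1.1


def make_projection(rows, cols, seed=0):
    """Return a rows x cols matrix of placeholder weights (assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(position_vector, matrix):
    """Multiply a length-4 row vector by a 4 x 100 matrix."""
    cols = len(matrix[0])
    return [
        sum(p * row[j] for p, row in zip(position_vector, matrix))
        for j in range(cols)
    ]


W = make_projection(POSITION_DIM, EMBEDDING_DIM)
dense = project([1, 1, 1, 0], W)  # arbitrary multi-hot example input
print(len(dense))
```

Because the input is multi-hot, the product is the sum of the matrix rows whose flags are set, so each position type effectively owns one 100-dimensional row.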


Abstract

The invention relates to a method for realizing Chinese named entity recognition by utilizing uncertain word segmentation information. The invention aims to prevent word segmentation errors from being introduced into the recognition system while still letting word segmentation information play a role in it. In the method, a Chinese named entity recognition model is built on uncertain word segmentation information, which comprises all candidate segmentations rather than a single one. Character candidate word segmentation position information is encoded, and the uncertain segmentation information is integrated by a dynamic attention mechanism. During recognition, the model dynamically selects beneficial word segmentation information and automatically ignores erroneous information, and finally an optimal word segmentation result is obtained. Compared with the prior art, the method has the advantages of effectively relieving error cascading, enhancing the semantic expression of character vectors, and achieving a low word segmentation error rate.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing (NLP) and to Chinese named entity recognition (NER), in particular to a method (UIcwsNN) for realizing Chinese named entity recognition by using uncertain word segmentation information.

Background technique

[0002] Named entity recognition is a fundamental task in the field of NLP and underpins many applications. However, unlike English, Chinese sentences have no delimiters: Chinese text is a continuous sequence of characters in which words cannot be directly distinguished from each other, yet word-level information is very important for named entity recognition. Existing word segmentation tools output a large number of wrong word segmentation results, which makes named entity recognition difficult and the recognition quality unsatisfactory.

[0003] The existing Chinese named entity recognition methods usually regard it as a character sequence...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/289, G06N3/04
CPC: G06N3/045, Y02D10/00
Inventor: 向阳, 贾圣宾, 徐忠国
Owner: TONGJI UNIV