Network text named entity recognition method based on neural network probability disambiguation

A named entity recognition and neural network technology, applied to network text named entity recognition based on neural network probability disambiguation. It addresses the problems of frequent typos, irregular grammatical structure, and the difficulty of training the neural network, achieving good practicability and a high accuracy rate.

Active Publication Date: 2017-09-26
CHINA UNIV OF MINING & TECH

AI Technical Summary

Problems solved by technology

Artificial neural networks are difficult to put into practice for named entity recognition because words must first be converted into vectors in a word vector space; for new vocabulary no corresponding vector can be obtained, which prevents large-scale practical application.
[0005] Given this state of affairs, named entity recognition for network text faces two main problems. First, because network text contains large numbers of network words, new words, and typos, it is impossible to train a word vector space covering all words with which to train the neural network. Second, the arbitrary language forms, irregular grammatical structures, and frequent typos of network text reduce the accuracy of named entity recognition.



Examples


[0040] Network text is crawled from the Sogou news website and used, together with the downloaded named entity corpus, as the sample corpus. A natural language processing tool segments the crawled text, and the gensim package in Python trains the word vector space with the Word2Vec model on the segmented text and the sample corpus. The specific parameters are as follows: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model is selected.
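A minimal sketch of this training step using gensim's Word2Vec is shown below; the corpus file name is an illustrative placeholder and the parameter names follow gensim ≥ 4.0, neither of which is specified by the patent.

```python
# Sketch: train the word vector space described in [0040] with gensim's Word2Vec.
# Assumptions: "sentences.txt" holds the segmented sample corpus, one sentence per
# line with tokens separated by spaces; parameter names follow gensim >= 4.0.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

corpus = LineSentence("sentences.txt")   # segmented corpus produced by the NLP tool

model = Word2Vec(
    corpus,
    vector_size=200,   # word vector length 200
    epochs=25,         # 25 iterations
    alpha=0.025,       # initial step size (learning rate)
    min_alpha=0.0001,  # minimum step size
    sg=0,              # sg=0 selects the CBOW model
)
model.save("word2vec_network_text.model")
```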

[0041] The text of the sample corpus is converted into word vectors representing word features according to the trained Word2Vec model. If the Word2Vec model does not contain a corresponding vocabulary entry, incremental learning is used: the word vector space is updated and queried again to obtain a vector for the new word, which serves as the feature of that word. Tags such as "/o", "/n", and "/p" in the...
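One possible realization of this incremental-learning step is sketched below with gensim's vocabulary-update API; the model file name, placeholder sentences, and selection logic are illustrative assumptions, not details prescribed by the patent.

```python
# Sketch: incremental learning for words absent from the trained Word2Vec model,
# without changing the dimensionality of the existing word vector space.
# Assumptions: "word2vec_network_text.model" and new_sentences are placeholders.
from gensim.models import Word2Vec

model = Word2Vec.load("word2vec_network_text.model")

# Newly crawled, already-segmented network text containing unseen words.
new_sentences = [["new_network_word", "example", "sentence"]]

unseen = [w for s in new_sentences for w in s if w not in model.wv]
if unseen:
    # Extend the vocabulary in place, then continue training so the new
    # words receive vectors in the same space.
    model.build_vocab(new_sentences, update=True)
    model.train(new_sentences,
                total_examples=len(new_sentences),
                epochs=model.epochs)

# Re-query the updated word vector space to obtain features for the new words.
features = [model.wv[w] for w in unseen]
```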


Abstract

The invention discloses a network text named entity recognition method based on neural network probability disambiguation. The method includes: performing word segmentation on an unlabeled corpus, extracting word vectors with Word2Vec, converting the sample corpus into a word feature matrix, windowing it, building and training a deep neural network, adding a softmax function to the output layer of the network, and normalizing to obtain a probability matrix of named entity types for each word; the probability matrix is then re-windowed and a conditional random field model is used for disambiguation to obtain the final named entity annotation. A word vector incremental learning method that leaves the structure of the neural network unchanged is provided to handle the network words and new words characteristic of network text, and a probability disambiguation method is adopted to cope with the irregular grammatical structure and frequent typos of network text, so the method achieves high accuracy on network text named entity recognition tasks.
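To make the windowing and softmax steps of this pipeline concrete, the following numpy sketch maps a word-feature matrix to a per-word probability matrix over entity types; the window size, number of entity types, and the single toy layer standing in for the deep neural network are illustrative assumptions rather than values taken from the patent.

```python
# Sketch: window the word-feature matrix and map each window to a probability
# distribution over entity types with a softmax output layer.
# Assumptions: window size 5, 200-dim word vectors, 4 entity types.
import numpy as np

def window(features, size=5):
    """Stack each word's vector with its neighbours (zero padding at the edges)."""
    pad = size // 2
    n, dim = features.shape
    padded = np.vstack([np.zeros((pad, dim)), features, np.zeros((pad, dim))])
    return np.stack([padded[i:i + size].ravel() for i in range(n)])

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
word_features = rng.normal(size=(10, 200))     # 10 words, 200-dim Word2Vec vectors
X = window(word_features, size=5)              # shape (10, 1000)

W = rng.normal(size=(X.shape[1], 4)) * 0.01    # toy output-layer weights, 4 entity types
prob_matrix = softmax(X @ W)                   # shape (10, 4): probabilities per word

# prob_matrix would then be re-windowed and passed to a conditional random field
# for probability disambiguation, yielding the final entity labels.
```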

Description

Technical Field

[0001] The invention relates to the processing and analysis of network text, and in particular to a network text named entity recognition method based on neural network probability disambiguation.

Background Technique

[0002] The network has raised the speed and scale of information collection and dissemination to an unprecedented level and realized global information sharing and interaction; it has become an indispensable infrastructure of the information society. Modern communication and dissemination technologies have greatly improved the speed and breadth of information dissemination. The accompanying problem and "side effect" is that people are sometimes overwhelmed by the flood of information, and it becomes very difficult to quickly and accurately obtain the information they need most from the vast sea of information. How to extract named entities such as people, places, and institutions that Internet users care about from massiv...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/27; G06N3/08
CPC: G06N3/08; G06F40/289; G06F40/295
Inventors: 周勇, 刘兵, 韩兆宇, 王重秋
Owner: CHINA UNIV OF MINING & TECH