Network text named entity recognition method based on neural network probability disambiguation

A named entity recognition and neural network technology, applied to network text named entity recognition based on neural network probability disambiguation. It addresses the problems of frequent typos, irregular grammatical structure, and the difficulty of training the neural network, achieving good practicability and a high accuracy rate.

Active Publication Date: 2017-09-26
CHINA UNIV OF MINING & TECH

AI Technical Summary

Problems solved by technology

Artificial neural networks are difficult to put into practice for named entity recognition because words must first be converted into vectors in a word vector space; for new vocabulary no corresponding vector can be obtained, which prevents large-scale practical application.
[0005] Given this state of affairs, named entity recognition for network text faces two main problems. First, because network text contains large numbers of network words, new words, and typos, it is impossible to train a word vector space covering all words with which to train the neural network. Second, the arbitrary language forms, irregular grammatical structures, and frequent typos of network text reduce the accuracy of named entity recognition.



Examples


[0040] Network text is crawled from the Sogou news website and used, together with the downloaded named entity corpus, as the sample corpus. A natural language processing tool segments the crawled text, and the gensim package in Python trains the word vector space with the Word2Vec model on the segmented text and the sample corpus. The specific parameters are as follows: word vector length 200, 25 iterations, initial step size 0.025, minimum step size 0.0001, and the CBOW model is selected.
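A minimal sketch of this training step using gensim's Word2Vec is shown below; the corpus file name is an illustrative placeholder and the parameter names follow gensim ≥ 4.0, neither of which is specified by the patent.

```python
# Sketch: train the word vector space described in [0040] with gensim's Word2Vec.
# Assumptions: "sentences.txt" holds the segmented sample corpus, one sentence per
# line with tokens separated by spaces; parameter names follow gensim >= 4.0.
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

corpus = LineSentence("sentences.txt")   # segmented corpus produced by the NLP tool

model = Word2Vec(
    corpus,
    vector_size=200,   # word vector length 200
    epochs=25,         # 25 iterations
    alpha=0.025,       # initial step size (learning rate)
    min_alpha=0.0001,  # minimum step size
    sg=0,              # sg=0 selects the CBOW model
)
model.save("word2vec_network_text.model")
```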

[0041] The text of the sample corpus is converted into word vectors representing word features according to the trained Word2Vec model. If the Word2Vec model does not contain a corresponding vocabulary entry, incremental learning is used: the word vector space is updated and queried again to obtain a vector for the new word, which serves as the feature of that word. Tags such as "/o", "/n", and "/p" in the...
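One possible realization of this incremental-learning step is sketched below with gensim's vocabulary-update API; the model file name, placeholder sentences, and selection logic are illustrative assumptions, not details prescribed by the patent.

```python
# Sketch: incremental learning for words absent from the trained Word2Vec model,
# without changing the dimensionality of the existing word vector space.
# Assumptions: "word2vec_network_text.model" and new_sentences are placeholders.
from gensim.models import Word2Vec

model = Word2Vec.load("word2vec_network_text.model")

# Newly crawled, already-segmented network text containing unseen words.
new_sentences = [["new_network_word", "example", "sentence"]]

unseen = [w for s in new_sentences for w in s if w not in model.wv]
if unseen:
    # Extend the vocabulary in place, then continue training so the new
    # words receive vectors in the same space.
    model.build_vocab(new_sentences, update=True)
    model.train(new_sentences,
                total_examples=len(new_sentences),
                epochs=model.epochs)

# Re-query the updated word vector space to obtain features for the new words.
features = [model.wv[w] for w in unseen]
```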


Abstract

The invention discloses a network text named entity recognition method based on neural network probability disambiguation. The method includes: performing word segmentation on an unlabeled corpus, extracting word vectors with Word2Vec, converting the sample corpus into a word feature matrix, windowing it, building and training a deep neural network, adding a softmax function to the output layer of the network, and normalizing to obtain a probability matrix of named entity types for each word; the probability matrix is then re-windowed and a conditional random field model is used for disambiguation to obtain the final named entity annotation. A word vector incremental learning method that leaves the structure of the neural network unchanged is provided to handle the network words and new words characteristic of network text, and a probability disambiguation method is adopted to cope with the irregular grammatical structure and frequent typos of network text, so the method achieves high accuracy on network text named entity recognition tasks.
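To make the windowing and softmax steps of this pipeline concrete, the following numpy sketch maps a word-feature matrix to a per-word probability matrix over entity types; the window size, number of entity types, and the single toy layer standing in for the deep neural network are illustrative assumptions rather than values taken from the patent.

```python
# Sketch: window the word-feature matrix and map each window to a probability
# distribution over entity types with a softmax output layer.
# Assumptions: window size 5, 200-dim word vectors, 4 entity types.
import numpy as np

def window(features, size=5):
    """Stack each word's vector with its neighbours (zero padding at the edges)."""
    pad = size // 2
    n, dim = features.shape
    padded = np.vstack([np.zeros((pad, dim)), features, np.zeros((pad, dim))])
    return np.stack([padded[i:i + size].ravel() for i in range(n)])

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
word_features = rng.normal(size=(10, 200))     # 10 words, 200-dim Word2Vec vectors
X = window(word_features, size=5)              # shape (10, 1000)

W = rng.normal(size=(X.shape[1], 4)) * 0.01    # toy output-layer weights, 4 entity types
prob_matrix = softmax(X @ W)                   # shape (10, 4): probabilities per word

# prob_matrix would then be re-windowed and passed to a conditional random field
# for probability disambiguation, yielding the final entity labels.
```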

Description

Technical Field

[0001] The invention relates to the processing and analysis of network text, and in particular to a network text named entity recognition method based on neural network probability disambiguation.

Background Technique

[0002] The network has raised the speed and scale of information collection and dissemination to an unprecedented level and realized global information sharing and interaction; it has become an indispensable infrastructure of the information society. Modern communication and dissemination technologies have greatly improved the speed and breadth of information dissemination. The accompanying problem and "side effect" is that people are sometimes overwhelmed by the flood of information, and it becomes very difficult to quickly and accurately obtain the information they need most from the vast sea of information. How to extract named entities such as people, places, and institutions that Internet users care about from massiv...


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F17/27; G06N3/08
CPC: G06N3/08; G06F40/289; G06F40/295
Inventors: 周勇, 刘兵, 韩兆宇, 王重秋
Owner: CHINA UNIV OF MINING & TECH