Text-based entity recognition method and related device

A text-based entity recognition technology, applied in the field of entity recognition, that addresses long calculation time, unreliable feature selection and low recognition accuracy, and achieves a small actual calculation amount, improved accuracy and enhanced representation ability.

Pending Publication Date: 2020-11-17

GUANGDONG UNIV OF TECH



AI Technical Summary

Problems solved by technology

[0004] This application provides a text-based entity recognition method and related devices, which solve the technical problems of long calculation time, unreliable feature selection and low recognition accuracy in the prior art.



Examples


Embodiment 1

[0050] For ease of understanding, refer to Figure 1. Embodiment 1 of the text-based entity recognition method provided by the present application includes:

[0051] Step 101: Map the preset word data set to a word feature vector set using the first preset Word2Vec model; the word feature vector set consists of word feature vectors.

[0052] The first preset Word2Vec model can be regarded as a word vector model and is unsupervised. Given the input word data set, it learns word vectors; that is, it maps the word data set to a word vector set. In practice, each word is first randomly initialized as a vector of several dimensions, which converts text information into numerical information. By learning from the words in the documents, words with similar meanings end up with similar vectors, and words with different meanings end up with different vectors. The output dimension of the Word2Vec model can be set according to the actual situation.
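The mapping in Step 101 can be illustrated with a minimal sketch. It covers only the random initialization and lookup described in [0052]; the training that moves semantically similar words closer together is omitted. The function names, the default dimension of 8 and the sample sentences are assumptions made for illustration and do not come from the patent.

```python
import math
import random


def init_word_vectors(word_dataset, dim=8, seed=0):
    """Randomly initialise one `dim`-dimensional vector per distinct word.

    `dim` plays the role of the configurable output dimension.
    """
    rng = random.Random(seed)
    vocab = sorted({w for sentence in word_dataset for w in sentence})
    return {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}


def map_to_vectors(word_dataset, table):
    """Map each sentence (a list of words) to its list of word feature vectors."""
    return [[table[w] for w in sentence] for sentence in word_dataset]


def cosine(u, v):
    """Cosine similarity, the usual way to compare two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical, already segmented word data set.
word_dataset = [["guangdong", "university"], ["university", "opens"]]
table = init_word_vectors(word_dataset, dim=4)
word_feature_vectors = map_to_vectors(word_dataset, table)
```

Before training, the similarity between two different words is arbitrary; a trained model would be expected to score related words higher than unrelated ones.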

[0...

Embodiment 2

[0064] For ease of understanding, refer to Figure 2. Embodiment 2 of the text-based entity recognition method provided by the present application includes:

[0065] Step 201: Use a crawler to obtain a large amount of text data to form an initial text data set.

[0066] Step 202: Filter the initial text data set by using a preset Dirichlet topic model to obtain a filtered text data set.

[0067] A crawler is used to obtain a large amount of text data, and the initial text data set is denoted T1. The preset Dirichlet topic model processes T1 so that each text is assigned 5 topics. The method then checks whether those 5 topics contain keywords that describe the future, which facilitates predicting and identifying future named entities. If they do, the text is retained in the filtered text data set T2; otherwise the text is discarded.
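The filtering rule in Step 202 might look like the sketch below. It assumes the topics have already been produced by a topic model and are passed in as lists of keywords, so no topic model is trained here. The keyword set, function name and sample texts are hypothetical; the patent does not list its future-describing keywords.

```python
# Hypothetical keywords describing the future; the source does not enumerate them.
FUTURE_KEYWORDS = {"will", "plan", "upcoming", "forecast", "tomorrow"}


def filter_future_texts(texts, topics_per_text,
                        future_keywords=FUTURE_KEYWORDS, n_topics=5):
    """Return the retained data set T2.

    topics_per_text[i] holds the topics of texts[i]; each topic is a list of
    keywords, assumed to come from a previously fitted topic model.
    """
    wanted = {kw.lower() for kw in future_keywords}
    kept = []
    for text, topics in zip(texts, topics_per_text):
        # Collect the keywords of the first n_topics topics of this text.
        topic_words = {w.lower() for topic in topics[:n_topics] for w in topic}
        if topic_words & wanted:  # at least one future keyword is present
            kept.append(text)
    return kept


T1 = [
    "The company will open a new campus next year.",
    "The museum was founded in 1950.",
]
topics = [
    [["company", "campus"], ["will", "open"], ["new", "year"],
     ["plan", "build"], ["city", "site"]],
    [["museum", "art"], ["founded", "history"], ["building", "hall"],
     ["collection", "paint"], ["city", "visitors"]],
]
T2 = filter_future_texts(T1, topics)
print(T2)  # only the first text is retained
```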

[0068] Step 203 , using a preset word segmentation tool to sequentially perform trigger word type screening and synt...


Abstract

The invention discloses a text-based entity recognition method and a related device. The method comprises the steps of mapping a preset word data set into a word feature vector set through a first preset Word2Vec model; extracting a context feature vector of the preset word data set by adopting a preset BiLSTM model to form a context feature vector set; mapping the preset part-of-speech data set into a part-of-speech feature vector set through a second preset Word2Vec model; splicing the word feature vector, the context feature vector and the part-of-speech feature vector into a fusion feature vector; processing the preset edge matrix data set and the fusion feature vector set by adopting a preset convolutional neural network model to obtain a word label probability matrix; and processing the word label probability matrix by adopting a preset CRF model to obtain an identification result of the named entity. According to the method, the technical problems of long calculation time, unreliable feature selection and low identification accuracy in the prior art can be solved.
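Two steps of the abstract lend themselves to a small sketch: splicing the three feature vectors, and decoding a word label probability matrix with a CRF-style transition table. The sketch assumes that splicing means plain concatenation and that decoding uses the Viterbi algorithm in log space. The label set, probabilities and transition scores are hypothetical; a real CRF would learn its transition scores from data.

```python
import math


def splice(word_vec, context_vec, pos_vec):
    """Concatenate the three feature vectors into one fusion feature vector."""
    return list(word_vec) + list(context_vec) + list(pos_vec)


def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label index sequence.

    emissions[t][j]: log-score of label j at position t.
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backptr = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptr.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(range(n_labels), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


LABELS = ["O", "B", "I"]  # hypothetical tag set
# Hypothetical word label probability matrix: one row per word.
probs = [[0.5, 0.4, 0.1], [0.2, 0.1, 0.7], [0.8, 0.1, 0.1]]
emissions = [[math.log(p) for p in row] for row in probs]
# Hypothetical transition scores: forbid O -> I, allow everything else.
transitions = [[0.0, 0.0, -1e4], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

path = viterbi_decode(emissions, transitions)
print([LABELS[i] for i in path])  # -> ['B', 'I', 'O']
fused = splice([0.1, 0.2], [0.3], [0.4, 0.5])
```

Taking the most probable label per word independently would give O, I, O here, which violates the O to I constraint; decoding over the whole sequence avoids that.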

Description

Technical field

[0001] The present application relates to the technical field of entity recognition, in particular to a text-based entity recognition method and related devices.

Background

[0002] Named entity recognition plays a very important role in natural language processing. It is the basis for information extraction, information retrieval, machine translation and question answering systems. The main task of named entity recognition is to identify names, institutions and other proper nouns in text and classify them.

[0003] Feature extraction in existing named entity recognition methods depends heavily on manual choices and does not consider the influence of the time factor, which leads to low recognition accuracy. In addition, some deep recurrent networks require a very large amount of computation and take a long time to run.

Contents of the invention

[0004] The present application provides a...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F40/295, G06F16/33, G06F16/35, G06N3/04, G06N3/08

CPC: G06F40/295, G06F16/3344, G06F16/35, G06N3/049, G06N3/08, G06N3/045, Y02D10/00

Inventor: 左亚尧, 洪嘉伟, 陈致然

Owner: GUANGDONG UNIV OF TECH
