Text-based entity recognition method and related device

A technology in the field of entity recognition for text. It addresses long calculation time, unreliable feature selection and low recognition accuracy, and achieves a small actual calculation amount, improved accuracy and enhanced representation ability.

Pending Publication Date: 2020-11-17
GUANGDONG UNIV OF TECH
Cites: 1; Cited by: 6

AI Technical Summary

Problems solved by technology

[0004] This application provides a text-based entity recognition method and related device, which solve the technical problems of long calculation time, unreliable feature selection and low recognition accuracy in the prior art.



Examples


Embodiment 1

[0050] For ease of understanding, referring to Figure 1, Embodiment 1 of the text-based entity recognition method provided by the present application includes the following steps:

[0051] Step 101: Map the preset word data set to a word feature vector set through the first preset Word2Vec model; the word feature vector set contains the word feature vectors.

[0052] The first preset Word2Vec model can be regarded as a word vector model, and it is unsupervised. It learns word vectors from the input word data set, that is, it maps the word data set to a word vector set. Concretely, each word is randomly initialized as a vector of a chosen number of dimensions, which converts text information into numerical information; through learning over the words in the documents, words with the same semantic meaning end up with similar vectors, and words with different semantic meanings end up with dissimilar vectors. The output dimension of the Word2Vec model can be set according to the actual situation.
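Paragraph [0052] describes Word2Vec only at a high level and does not say which training variant or hyperparameters the first preset model uses. As an illustration only, the following minimal sketch implements skip-gram with negative sampling (one common Word2Vec training variant) in plain Python: vectors start random, and each true (word, context) pair is pulled together while randomly sampled words are pushed apart. The function names `train_word2vec` and `cosine` and every hyperparameter (`dim`, `window`, `epochs`, `lr`, `negatives`) are assumptions, not values from the patent; in practice a library such as gensim would be used.

```python
import math
import random
from typing import Dict, List


def train_word2vec(sentences: List[List[str]], dim: int = 8, window: int = 2,
                   epochs: int = 50, lr: float = 0.05, negatives: int = 3,
                   seed: int = 0) -> Dict[str, List[float]]:
    """Minimal skip-gram with negative sampling (SGNS) sketch; hyperparameters are assumed."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    # Input ("word") vectors: small random initialization, one per word.
    w_in = {w: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for w in vocab}
    # Output ("context") vectors start at zero.
    w_out = {w: [0.0] * dim for w in vocab}

    def sigmoid(x: float) -> float:
        x = max(-30.0, min(30.0, x))  # clamp to avoid math.exp overflow
        return 1.0 / (1.0 + math.exp(-x))

    for _ in range(epochs):
        for sent in sentences:
            for i, center in enumerate(sent):
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # One positive (true context) pair plus `negatives` random words.
                    pairs = [(sent[j], 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    v = w_in[center]
                    grad_v = [0.0] * dim
                    for ctx, label in pairs:
                        u = w_out[ctx]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # accumulate update for the center word
                            u[k] += g * v[k]       # update the context vector in place
                    for k in range(dim):
                        v[k] += grad_v[k]
    return w_in


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity: how close two word vectors are in direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With a fixed `seed` the sketch is deterministic, and `cosine(vectors["a"], vectors["b"])` can be used to inspect whether words sharing contexts have drifted together; on a toy corpus the values say little.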

[0...

Embodiment 2

[0064] For ease of understanding, referring to Figure 2, the present application provides Embodiment 2 of a text-based entity recognition method, which includes the following steps:

[0065] Step 201: Use a crawler to obtain a large amount of text data to form an initial text data set.

[0066] Step 202: Filter the initial text data set by using a preset Dirichlet topic model to obtain a filtered text data set.

[0067] A crawler is used to obtain a large amount of text data, and the initial text data set is denoted T1. The initial text data set T1 is processed with the preset Dirichlet topic model to obtain 5 topics for each text, and it is determined whether these 5 topics contain keywords describing the future, which facilitates the prediction and identification of future named entities. If they do, the text is retained in the filtered text data set T2; otherwise, the text is discarded.
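Paragraph [0067] states the retention rule but not how topics are inspected. A minimal sketch of the rule, assuming the preset Dirichlet topic model (for example an LDA model) is wrapped in a hypothetical callable `top_topics(text, n)` that returns each of a text's top `n` topics as a list of its top words, and that the future-describing keywords are supplied as a set. The patent does not list the keywords; `{"will", "forecast"}` and the stubbed topics below are made up for illustration.

```python
from typing import Callable, Iterable, List, Sequence, Set

Topic = Sequence[str]  # assumed representation: a topic's top words


def filter_texts(texts: Iterable[str],
                 top_topics: Callable[[str, int], List[Topic]],
                 future_keywords: Set[str],
                 n_topics: int = 5) -> List[str]:
    """Build T2 from T1: keep a text only if one of its top topics contains a keyword."""
    kept = []
    for text in texts:
        topics = top_topics(text, n_topics)
        if any(word in future_keywords for topic in topics for word in topic):
            kept.append(text)  # retained in the filtered set T2
        # otherwise the text is discarded
    return kept


# Usage with a stubbed topic model (hypothetical topics per text):
_stub = {"text A": [["plan", "will"], ["market"]], "text B": [["history", "was"]]}
t2 = filter_texts(_stub, lambda text, n: _stub[text][:n], {"will", "forecast"})
print(t2)  # ['text A']
```

Passing the topic model in as a callable keeps the filtering rule independent of whichever topic-model implementation produces the topics.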

[0068] Step 203: Use a preset word segmentation tool to sequentially perform trigger word type screening and synt...


Abstract

The invention discloses a text-based entity recognition method and a related device. The method comprises: mapping a preset word data set into a word feature vector set through a first preset Word2Vec model; extracting context feature vectors of the preset word data set with a preset BiLSTM model to form a context feature vector set; mapping a preset part-of-speech data set into a part-of-speech feature vector set through a second preset Word2Vec model; splicing (concatenating) the word feature vector, the context feature vector and the part-of-speech feature vector into a fusion feature vector; processing a preset edge matrix data set and the fusion feature vector set with a preset convolutional neural network model to obtain a word label probability matrix; and processing the word label probability matrix with a preset CRF model to obtain the recognition result for named entities. The method solves the technical problems of long calculation time, unreliable feature selection and low recognition accuracy in the prior art.
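The abstract gives the pipeline but not the network details. The sketch below, in plain Python with no deep-learning library, makes two labeled assumptions: that the "preset convolutional neural network model" consuming the edge matrix behaves like a single graph-convolution layer, softmax(D^-1 (A + I) H W), where A is the edge matrix, H the fused feature matrix and W a weight matrix; and that the CRF step is standard linear-chain Viterbi decoding over log-probabilities with a label-to-label transition matrix. All function names and the row normalization are illustrative; W and the transition scores would come from training, which this page does not describe.

```python
import math
from typing import List

Matrix = List[List[float]]


def concat_features(word: Matrix, context: Matrix, pos: Matrix) -> Matrix:
    """Splice each word's vectors into one fusion vector: [word ; context ; POS]."""
    return [w + c + p for w, c, p in zip(word, context, pos)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalize_adjacency(edges: Matrix) -> Matrix:
    """Add self-loops and row-normalize: D^-1 (A + I). Assumes non-negative edge weights."""
    n = len(edges)
    a_hat = [[edges[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[x / sum(row) for x in row] for row in a_hat]


def softmax_rows(m: Matrix) -> Matrix:
    """Turn each row of scores into a probability distribution over labels."""
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def label_probabilities(edges: Matrix, fused: Matrix, weight: Matrix) -> Matrix:
    """Assumed single graph-convolution layer: softmax(D^-1 (A + I) H W).

    Returns an (n_words x n_labels) word label probability matrix.
    """
    return softmax_rows(matmul(matmul(normalize_adjacency(edges), fused), weight))


def viterbi_decode(emissions: Matrix, transitions: Matrix) -> List[int]:
    """Linear-chain CRF decoding: highest-scoring label index sequence.

    emissions[t][k]: score of label k at word t (e.g. log-probabilities);
    transitions[i][j]: score of moving from label i to label j.
    """
    n_labels = len(emissions[0])
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for cur in range(n_labels):
            prev = max(range(n_labels), key=lambda p: score[p] + transitions[p][cur])
            new_score.append(score[prev] + transitions[prev][cur] + emit[cur])
            pointers.append(prev)
        score = new_score
        backpointers.append(pointers)
    best = max(range(n_labels), key=lambda k: score[k])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]  # follow back-pointers to recover the best path
        path.append(best)
    return path[::-1]
```

Because splicing is plain concatenation, each fused vector's length is the sum of the word, context and part-of-speech dimensions, and `W` must have that many rows. Feeding `[[math.log(p) for p in row] for row in probs]` into `viterbi_decode` gives one label index per word.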

Description

Technical field

[0001] The present application relates to the technical field of entity recognition, and in particular to a text-based entity recognition method and related device.

Background

[0002] Named entity recognition plays a very important role in natural language processing. It is the basis of information extraction, information retrieval, machine translation and question answering systems. The main task of named entity recognition is to identify proper nouns in text, such as person names and institution names, and to classify them.

[0003] Feature extraction in existing named entity recognition methods depends heavily on manual design and does not take the time factor into account, which leads to low named entity recognition accuracy. In addition, some deep recurrent networks require a very large amount of computation and take a long time to complete.

Contents of the invention

[0004] The present application provides a...
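The background describes named entity recognition as identifying proper nouns and classifying them. The patent does not state its tagging scheme; assuming the common BIO scheme (B- begins an entity, I- continues it, O is outside any entity), the following sketch shows how a per-word label sequence, such as a decoded CRF output, becomes typed entities. The function name `bio_to_entities` and the labels `PER`/`ORG`/`LOC` are illustrative assumptions.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Group BIO tags (B-TYPE, I-TYPE, O) into (entity_text, TYPE) pairs.

    An I- tag whose type differs from the open entity starts a new entity
    (a lenient choice; strict BIO would treat it as an error).
    """
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the currently open entity, if there is one.
        if current and current_type is not None:
            entities.append((sep.join(current), current_type))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            flush()
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)
        else:  # "O": outside any entity, so close the open one
            flush()
            current, current_type = [], None
    flush()
    return entities
```

For Chinese text, where each token is a character, pass `sep=""` so entity characters are joined without spaces.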


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/295, G06F16/33, G06F16/35, G06N3/04, G06N3/08
CPC: G06F40/295, G06F16/3344, G06F16/35, G06N3/049, G06N3/08, G06N3/045, Y02D10/00
Inventors: 左亚尧 (Zuo Yayao), 洪嘉伟 (Hong Jiawei), 陈致然 (Chen Zhiran)
Owner: GUANGDONG UNIV OF TECH