Named entity recognition model training method and named entity recognition method and device

A technology for named entity recognition and model training, applied in fields such as instruments, electrical digital data processing, and computing. It can solve the problems that existing methods are not applicable in some scenarios, that model features rely on manual work, and that the recognition effect suffers.

Active Publication Date: 2020-01-17
SUNING CLOUD COMPUTING CO LTD
3 Cites, 30 Cited by

AI Technical Summary

Problems solved by technology

Specifically, rule-based named entity recognition is severely limited in handling new words, and it is not suitable for commodity label recognition in scenarios with large numbers of new words and polysemous words.
Statistics-based recognition uses a series of statistical theories and mathematical models for named entity recognition, but the extraction of model features and the selection of a specific corpus depend too heavily on manual work, which requires professional knowledge and engineering experience, and the quality of the corpus affects the final recognition effect.

Examples

Embodiment 1

[0077] Figure 1 is a schematic flow chart of a named entity recognition model training method provided by an embodiment of the present invention. Referring to Figure 1, the method may include the following steps:

[0078] Step 101, preprocessing the corpus samples to obtain character sequence samples, and labeling the character sequence samples with named entity labels to obtain training character sequences.

[0079] Specifically, word segmentation is performed on the corpus samples to obtain multiple segmented words, and all individual characters are decomposed from the segmented words to obtain character sequence samples; the character sequence samples are then labeled with the corresponding named entity labels according to the BMEO labeling rules to obtain the training character sequences.

[0080] In this embodiment, an open source word segmentation tool (such as the Jieba word segmentation tool) can be used to perform word segmentation processing on the corpus samples...
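
Step 101 can be illustrated with a minimal sketch. It assumes the corpus sample has already been segmented and that each segment carries an optional entity type; the function name `bmeo_tags`, the entity type names `BRAND` and `PRODUCT`, and the sample text are hypothetical. It also assumes that BMEO stands for Begin, Middle, End, and Outside, and that a single-character entity is tagged `B-`; the excerpt does not state either point.

```python
from typing import List, Optional, Tuple


def bmeo_tags(segments: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Decompose labeled segments into characters and attach BMEO tags.

    Each segment is (text, entity_type); entity_type None marks a non-entity.
    Assumption: a one-character entity receives only a B- tag.
    """
    tagged = []
    for text, entity_type in segments:
        for i, ch in enumerate(text):
            if entity_type is None:
                tag = "O"                      # outside any entity
            elif i == 0:
                tag = f"B-{entity_type}"       # first character of the entity
            elif i == len(text) - 1:
                tag = f"E-{entity_type}"       # last character of the entity
            else:
                tag = f"M-{entity_type}"       # interior character
            tagged.append((ch, tag))
    return tagged


# Illustrative sample only; the labels are not taken from the patent.
sample = [("苹果", "BRAND"), ("256G", None), ("手机", "PRODUCT")]
training_sequence = bmeo_tags(sample)
```

Each element of `training_sequence` pairs one character with its tag, which is the shape a character-level sequence labeling model typically consumes.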

Embodiment 2

[0124] Based on the named entity recognition model trained in Embodiment 1, the embodiment of the present invention also provides a named entity recognition method. After the named entity recognition model is deployed as a service, the method can quickly call the online named entity recognition model to perform named entity recognition on the text to be labeled.

[0125] Referring to Figure 2, the embodiment of the present invention provides a named entity recognition method, the method comprising:

[0126] Step 201, preprocessing the text to be labeled to obtain a sequence of characters to be labeled.

[0127] Specifically, word segmentation processing is performed on the text to be labeled to obtain multiple segmented words, and all individual characters are decomposed from the segmented words to obtain a sequence of characters to be labeled.

[0128] In this embodiment, an open source word segmentation tool (such as the Jieba wo...
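
The decomposition in Step 201 can be sketched as follows. The sketch assumes the segmented words are already available as a list (for example, from a segmentation tool) and additionally records which word each character came from; keeping that alignment is an assumption made here for illustration, and `to_char_sequence` is a hypothetical name.

```python
from typing import List, Tuple


def to_char_sequence(words: List[str]) -> Tuple[List[str], List[int]]:
    """Flatten segmented words into characters.

    Returns the character sequence and, for each character, the index of the
    word it was decomposed from (an illustrative extra, not from the patent).
    """
    chars: List[str] = []
    word_index: List[int] = []
    for w_idx, word in enumerate(words):
        for ch in word:
            chars.append(ch)
            word_index.append(w_idx)
    return chars, word_index


# Hypothetical pre-segmented input.
words = ["红富士", "苹果"]
chars, word_index = to_char_sequence(words)
```

With this alignment, character-level and word-level features for the same position can be looked up side by side.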

Embodiment 3

[0152] Figure 3 is a schematic structural diagram of a named entity recognition model training device provided by an embodiment of the present invention. As shown in Figure 3, the device includes:

[0153] The training data acquisition module 31 is used to preprocess the corpus samples to obtain character sequence samples, and mark the character sequence samples with named entity labels to obtain training character sequences;

[0154] The first pre-training module 32 is used to pre-train the training character sequence based on the preset first bidirectional language model and the first self-attention mechanism model, and obtain a character feature vector and a character weight vector corresponding to the training character sequence;

[0155] The second pre-training module 33 is used to pre-train the training character sequence based on the preset second bidirectional language model and the second self-attention mechanism model, and obtain a word feature vector and a word weight vector c...
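
The excerpt does not specify how the weight vectors are computed or how they are fused with the feature vectors. As one possible reading, the sketch below uses plain scaled dot-product self-attention without learned projections and fuses by a weighted sum; this is a swapped-in, generic technique rather than the patent's stated method, and `attention_weights`, `fuse`, and the toy feature values are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(features: List[Vector]) -> List[Vector]:
    """One weight vector per position: softmax of scaled dot products."""
    d = len(features[0])
    weights = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        weights.append(softmax(scores))
    return weights


def fuse(features: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Fuse by weighted sum: each context vector mixes all feature vectors."""
    d = len(features[0])
    return [
        [sum(w * f[j] for w, f in zip(row, features)) for j in range(d)]
        for row in weights
    ]


# Toy feature vectors standing in for language model outputs.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights = attention_weights(features)
context = fuse(features, weights)
```

The same two functions could be applied separately to character-level and word-level features to produce two context vectors per position.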

Abstract

The invention discloses a named entity recognition model training method, a named entity recognition method, and a named entity recognition device. The training method comprises the following steps: preprocessing a corpus sample to obtain a character sequence sample, and labeling the character sequence sample with named entity labels to obtain a training character sequence; pre-training the training character sequence based on a first bidirectional language model and a first self-attention mechanism model to obtain a character feature vector and a character weight vector, and fusing the character feature vector and the character weight vector to obtain a first context vector; pre-training the training character sequence based on a second bidirectional language model and a second self-attention mechanism model to obtain a word feature vector and a word weight vector, and fusing the word feature vector and the word weight vector to obtain a second context vector; and training a bidirectional neural network and a conditional random field, connected in sequence, using the first context vector and the second context vector to obtain a named entity recognition model. The method effectively improves the training effect of the named entity recognition model and the accuracy of named entity recognition.

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a named entity recognition model training method, a named entity recognition method, and a device.

Background technique

[0002] Named Entity Recognition (NER) is a basic task of Natural Language Processing (NLP). Its purpose is to recognize the names of people, places, organizations, or other named entities in the input text according to specific needs. For example, "Apple" has different meanings in "Red Fuji Apple" and "Apple 256G memory phone": the former "Apple" is a product, and the latter "Apple" is a brand.

[0003] At present, named entity recognition technology can be roughly divided into rule-based and statistics-based methods, both of which have certain defects. Specifically, rule-based named entity recognition has great limitations in dealing with new words, and is not suitable for commodity label recognition with large-scale new words and pol...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295
CPC: Y02T10/40
Inventor: 黎旭关超伟左赛赵楠徐详朕
Owner: SUNING CLOUD COMPUTING CO LTD