Chinese named entity identification method based on BERT and SemiCRF

Applied in named entity recognition and Chinese language processing, using neural learning methods and biological neural network models. It addresses the limited learning quality of word embedding representations, the inability to resolve polysemy, and the neglect of word-level information, with the effect of ensuring recognition accuracy.

Pending Publication Date: 2020-08-21
SOUTH CHINA UNIV OF TECH
Cites: 1; Cited by: 18

AI Technical Summary

Problems solved by technology

The present invention can solve the problems that word embedding representations have limited learning quality and cannot resolve polysemy…



Examples


Embodiment

[0042] Figure 1 shows the flow chart of the Chinese named entity recognition method based on BERT and SemiCRF. The method constructs the named entity recognition model shown in Figure 2, which comprises a BERT language model, a bidirectional LSTM, and a joint CRF and SemiCRF module. The method comprises the following steps:

[0043] S1. Obtain a pre-trained BERT model;

[0044] Specifically, the pre-trained model can be obtained in either of two ways: download Google's open-source BERT source code and use it, with existing pre-training techniques, to pre-train a BERT language model on a massive unlabeled Chinese text corpus; or directly download Google's officially released pre-trained Chinese BERT language model, chinese_L-12_H-768_A-12.
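Google's checkpoint name encodes the model size: L is the number of Transformer layers, H the hidden size, and A the number of self-attention heads, so chinese_L-12_H-768_A-12 produces a 768-dimensional vector for each input character. The helper below is a hypothetical sketch, not part of the BERT release, that decodes such a name, for instance to size the input layer of the downstream Bi-LSTM. It assumes the `<prefix>_L-<n>_H-<n>_A-<n>` naming pattern.

```python
import re
from typing import Dict, Union

# Hypothetical helper (not part of Google's BERT code): decode a checkpoint
# name such as "chinese_L-12_H-768_A-12" into architecture hyperparameters.
_NAME_PATTERN = re.compile(r"^(?P<prefix>.+?)_L-(?P<L>\d+)_H-(?P<H>\d+)_A-(?P<A>\d+)$")


def parse_bert_checkpoint_name(name: str) -> Dict[str, Union[str, int]]:
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name!r}")
    layers = int(match["L"])
    hidden = int(match["H"])
    heads = int(match["A"])
    if hidden % heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return {
        "prefix": match["prefix"],
        "num_layers": layers,         # L: Transformer encoder layers
        "hidden_size": hidden,        # H: size of each character's output vector
        "num_heads": heads,           # A: self-attention heads per layer
        "head_dim": hidden // heads,  # dimension of each attention head
    }


if __name__ == "__main__":
    print(parse_bert_checkpoint_name("chinese_L-12_H-768_A-12"))
```

Under this reading, a Bi-LSTM stacked on chinese_L-12_H-768_A-12 would take 768-dimensional inputs per character.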

[0045] S2. Preprocess the original named entity recognition corpus and construct a named entity recognition training set, comprising the following steps:

[0046] S21. Perform routine data preprocessing on the original cor...
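The sub-steps of S2 are cut off on this page, so the sketch below does not reproduce them. It only illustrates, under assumed conventions, what a training set for this architecture plausibly needs: per-character BIO tags for the CRF branch and whole-entity segments for the SemiCRF branch, both derived from the same span annotations. The span format, the BIO scheme, and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical annotation format: (start, end, type), end exclusive, counted in
# characters because Chinese NER is usually done at the character level.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Per-character BIO tags: the label format a linear-chain CRF is trained on."""
    tags = ["O"] * len(text)
    for start, end, label in spans:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"span out of range: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping span: {(start, end, label)}")
        tags[start] = "B-" + label
        for k in range(start + 1, end):
            tags[k] = "I-" + label
    return tags


def spans_to_segments(text: str, spans: List[Span]) -> List[Span]:
    """A full segmentation of the sentence: the label format a SemiCRF is trained on.

    Entity spans keep their type; every other character becomes its own
    one-character "O" segment.
    """
    spans_to_bio(text, spans)  # reuse its range and overlap checks
    by_start = {start: (start, end, label) for start, end, label in spans}
    segments: List[Span] = []
    i = 0
    while i < len(text):
        if i in by_start:
            segments.append(by_start[i])
            i = by_start[i][1]
        else:
            segments.append((i, i + 1, "O"))
            i += 1
    return segments


if __name__ == "__main__":
    text = "华南理工大学位于广州"  # "South China University of Technology is located in Guangzhou"
    spans = [(0, 6, "ORG"), (8, 10, "LOC")]
    print(spans_to_bio(text, spans))
    print(spans_to_segments(text, spans))
```

Splitting non-entity text into one-character "O" segments is only one common convention; the page does not say which convention the patent uses.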



Abstract

The invention discloses a Chinese named entity identification method based on BERT and SemiCRF, which constructs a named entity identification model. The method comprises the following steps: obtaining a pre-trained BERT model; preprocessing the original named entity identification corpus and constructing a named entity identification training set; inputting the training set into the pre-trained BERT language model; feeding the output of the BERT language model sequentially into a bidirectional LSTM neural network and a joint CRF and SemiCRF module, and training the bidirectional LSTM and the joint module over multiple iterations; and performing named entity identification on Chinese text with the trained, complete named entity identification model. The method solves the problem that traditional word2vec cannot distinguish the senses of polysemous words. Through the introduced SemiCRF-based approach, it also combines word-level and character-level information, which the traditional CRF method often ignores, and thereby improves Chinese named entity identification to a certain extent.
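The abstract attributes the word-level gain to the SemiCRF: a CRF labels one character at a time, whereas a semi-Markov CRF labels whole segments, so a multi-character word such as 广州 (Guangzhou) can be scored as a unit. The sketch below is the standard semi-CRF Viterbi recursion in plain Python. The segment scorer, the transition table, and `max_len` are assumptions for illustration; in the patent's model the scores would presumably be computed from the Bi-LSTM outputs, and this page does not show how the CRF and SemiCRF outputs are combined in the joint module.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (start, end_exclusive, label)


def semicrf_viterbi(
    n: int,
    labels: Sequence[str],
    segment_score: Callable[[int, int, str], float],
    transition: Dict[Tuple[str, str], float],
    max_len: int,
) -> Tuple[float, List[Segment]]:
    """Best segmentation of characters 0..n-1 under a semi-Markov CRF.

    Each candidate segment [i, j) is scored as a whole by segment_score, so
    multi-character (word-level) evidence enters the score directly.
    transition[(a, b)] scores a segment labelled a followed by one labelled b;
    missing pairs score 0.
    """
    if n == 0:
        return 0.0, []
    neg_inf = -math.inf
    # best[j][y]: best score of a segmentation of [0, j) whose last segment has label y
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    # back[j][y]: (start of that last segment, label of the segment before it)
    back: List[Dict[str, Tuple[int, Optional[str]]]] = [{} for _ in range(n + 1)]
    for j in range(1, n + 1):
        for y in labels:
            for length in range(1, min(max_len, j) + 1):
                i = j - length
                seg = segment_score(i, j, y)
                if i == 0:
                    candidates = [(seg, None)]
                else:
                    candidates = [
                        (best[i][p] + transition.get((p, y), 0.0) + seg, p)
                        for p in labels
                        if best[i][p] > neg_inf
                    ]
                for score, prev in candidates:
                    if score > best[j][y]:
                        best[j][y] = score
                        back[j][y] = (i, prev)
    # Choose the best label for the final segment, then follow back-pointers.
    y = max(labels, key=lambda label: best[n][label])
    total = best[n][y]
    segments: List[Segment] = []
    j = n
    while j > 0:
        i, prev = back[j][y]
        segments.append((i, j, y))
        j, y = i, prev
    segments.reverse()
    return total, segments


def toy_score(i: int, j: int, label: str) -> float:
    # Made-up scores for "我在广州": reward the two-character segment 广州 as LOC.
    if label == "LOC":
        return 3.0 if (i, j) == (2, 4) else -2.0
    return 1.0 if j - i == 1 else -1.0  # prefer one-character "O" segments


if __name__ == "__main__":
    text = "我在广州"
    score, segments = semicrf_viterbi(len(text), ["O", "LOC"], toy_score, {}, max_len=4)
    print(score, [(text[i:j], y) for i, j, y in segments])
    # 5.0 [('我', 'O'), ('在', 'O'), ('广州', 'LOC')]
```

Decoding costs O(n · max_len · |labels|²), which is why semi-CRFs typically cap the segment length.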

Description

Technical field

[0001] The present invention relates to the technical field of named entity recognition, and in particular to a Chinese named entity recognition method based on BERT and SemiCRF.

Background art

[0002] Named Entity Recognition (NER) is a task in Natural Language Processing (NLP) that aims to identify entities in text and classify them into predefined entity types, such as names of people, places, and organizations. Named entity recognition can serve as a standalone information extraction tool, and it also plays an important role in other NLP tasks and applications, such as information retrieval, automatic text summarization, question answering, machine translation, and knowledge base construction.

[0003] The existing mainstream method for named entity recognition is Bi-LSTM+CRF, in which the Bi-LSTM (bidirectional long short-term memory network) is a very popular deep neural network in...
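In the Bi-LSTM+CRF approach named in [0003], the Bi-LSTM produces a score for every tag at every character, and the CRF layer picks the jointly best tag sequence rather than the best tag at each position, which lets it rule out invalid transitions such as O followed by I-LOC. The sketch below is standard linear-chain CRF Viterbi decoding in plain Python; the emission and transition numbers are made up for illustration, whereas a real model learns them and usually adds start and end transitions, omitted here.

```python
from typing import Dict, List, Sequence, Tuple


def crf_viterbi(
    emissions: Sequence[Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> Tuple[float, List[str]]:
    """Best tag sequence for a linear-chain CRF.

    emissions[t][y] is the score of tag y at character t (in Bi-LSTM+CRF these
    come from the Bi-LSTM); transition[(a, b)] scores tag a followed by tag b,
    and missing pairs score 0.
    """
    if not emissions:
        return 0.0, []
    # score[y]: best score of any tag sequence for the prefix that ends in tag y
    score = {y: emissions[0][y] for y in labels}
    backpointers: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            best_prev = max(labels, key=lambda p: score[p] + transition.get((p, y), 0.0))
            new_score[y] = score[best_prev] + transition.get((best_prev, y), 0.0) + emissions[t][y]
            pointers[y] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Pick the best final tag, then walk the back-pointers to recover the path.
    last = max(labels, key=lambda y: score[y])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


if __name__ == "__main__":
    labels = ["O", "B-LOC", "I-LOC"]
    # Made-up per-character scores for "广州"; position-wise argmax would give O, I-LOC.
    emissions = [
        {"O": 2.0, "B-LOC": 1.5, "I-LOC": 0.0},
        {"O": 0.0, "B-LOC": 0.0, "I-LOC": 2.0},
    ]
    transition = {("O", "I-LOC"): -10.0}  # heavily penalize the invalid O -> I-LOC step
    print(crf_viterbi(emissions, transition, labels))  # (3.5, ['B-LOC', 'I-LOC'])
```

Because the CRF scores one tag per character, it has no direct way to score a multi-character word as a unit; that is the gap the abstract says the SemiCRF component is meant to fill.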


Application Information

IPC (8): G06F40/295; G06F40/284; G06N3/04; G06N3/08
CPC: G06F40/295; G06F40/284; G06N3/08; G06N3/045; G06N3/044
Inventors: 蔡毅 (Cai Yi); 郑煜佳 (Zheng Yujia)
Owner: SOUTH CHINA UNIV OF TECH