Named entity identification method and system for legal instrument multi-strategy fusion

A named entity recognition and multi-strategy technology, applied in the fields of instruments, electrical digital data processing, and computing. It addresses the difficulty of meeting the requirements of named entity recognition, with the effects of improving the accuracy rate and recall rate, reducing dependence, and reducing burden.

Active Publication Date: 2020-02-18
SOUTH CHINA NORMAL UNIVERSITY

AI Technical Summary

Problems solved by technology

When few labeled datasets are available, it is still difficult to meet the requirements of named entity recognition.



Examples


Embodiment

[0049] As shown in Figure 1, this embodiment provides a named entity recognition method based on multi-strategy fusion for legal documents, comprising the following steps:

[0050] S1: Establish a source data corpus. Obtain news data or social data from the People's Daily (data with large-scale labels; the closer it is to the target data type, the better), and perform part-of-speech tagging and sequence tagging on the source data corpus. This data is used for model pre-training: this embodiment first pre-trains on a dataset different from the one used in formal training, producing a similar model;

[0051] S2: Train the BiLSTM-Attention-CRF model on the large labeled People's Daily dataset to obtain the trained first model. The main implementation of the BiLSTM-Attention-CRF training in this embodiment is as follows: first, the text data is vectorized, that is, converted into a matrix, and then the matrix is input into the B...
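The sequence tagging in S1 and the vectorization at the start of S2 can be sketched together as follows. This is a minimal illustration, not the patent's implementation: it assumes a BIO tagging scheme and character-level tokens, neither of which is stated in the text above, and it uses randomly initialized embeddings as a stand-in for whatever embedding the actual model uses. The names `bio_tags`, `build_vocab`, `random_embeddings`, and `vectorize`, the example sentence, and the labels `PER` and `LOC` are all hypothetical.

```python
import random
from typing import Dict, List, Tuple


def bio_tags(chars: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn entity spans (start, end_exclusive, label) into BIO tags (assumed scheme)."""
    tags = ["O"] * len(chars)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every character; ids 0 and 1 are reserved."""
    vocab = {"<PAD>": 0, "<UNK>": 1}
    for sentence in sentences:
        for ch in sentence:
            vocab.setdefault(ch, len(vocab))
    return vocab


def random_embeddings(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Placeholder embedding table; a real model would learn or load these vectors."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def vectorize(chars: List[str], vocab: Dict[str, int],
              table: List[List[float]]) -> List[List[float]]:
    """Convert a character sequence into a (len(chars) x dim) matrix."""
    unk = vocab["<UNK>"]
    return [table[vocab.get(ch, unk)] for ch in chars]


# Hypothetical example sentence with two labeled entities.
chars = list("张三在广州工作")
tags = bio_tags(chars, [(0, 2, "PER"), (3, 5, "LOC")])

vocab = build_vocab([chars])
table = random_embeddings(len(vocab), dim=4)
matrix = vectorize(chars, vocab, table)
```

Here `matrix` has one row per character, which is the kind of input a sequence model consumes, and `tags` is the aligned label sequence that the model is trained to predict.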



Abstract

The invention discloses a named entity recognition method and system based on multi-strategy fusion for legal documents. The method comprises the following steps: building a source data corpus, performing part-of-speech tagging and sequence tagging on the source data corpus, and pre-training the model; training on the labeled data with a BiLSTM-Attention-CRF (Bidirectional Long Short-Term Memory, Attention, Conditional Random Field) model to obtain a trained first model; improving the trained first model; establishing a target data corpus, randomly extracting data from the legal document target data, and generating a plurality of training sets; applying transfer learning by training the improved first model on each of the training sets, yielding one trained model per training set; and integrating these models with a voting mechanism from ensemble learning to obtain a second model, which performs named entity recognition on legal documents to produce the final recognition result. The method improves the accuracy and recall of named entity recognition when annotated corpora are insufficient.
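The random extraction of several training sets and the final voting step might look like the sketch below. It is an illustration under stated assumptions rather than the patented procedure: it assumes sampling with replacement and a per-token majority vote, with ties going to the label predicted by the earliest model in the list. The abstract does not specify any of these details, and the function names `make_training_sets` and `vote` are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_training_sets(data: Sequence[T], n_sets: int, set_size: int,
                       seed: int = 0) -> List[List[T]]:
    """Draw n_sets random training sets from data (assumption: with replacement)."""
    rng = random.Random(seed)
    return [[rng.choice(data) for _ in range(set_size)] for _ in range(n_sets)]


def vote(predictions: List[List[str]]) -> List[str]:
    """Combine per-model tag sequences by per-token majority vote.

    predictions[m][i] is the tag that model m assigns to token i.
    On a tie, Counter.most_common keeps first-seen order, so the label
    from the earliest model in the list wins.
    """
    voted = []
    for token_preds in zip(*predictions):
        voted.append(Counter(token_preds).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three models on the same three-token sentence.
model_outputs = [
    ["B-PER", "I-PER", "O"],
    ["B-PER", "O", "O"],
    ["B-ORG", "I-PER", "O"],
]
final_tags = vote(model_outputs)

# Hypothetical target data: ten sentence ids split into three training sets.
training_sets = make_training_sets(list(range(10)), n_sets=3, set_size=5)
```

Each entry of `training_sets` would be used to fine-tune its own copy of the improved first model, and `vote` then merges the predictions of those copies into one tag sequence.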

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a named entity recognition method and system based on multi-strategy fusion for legal documents.

Background

[0002] Named entities are person names, organization names, place names, and all other entities identified by names. They are the basic information elements of a text, an important carrier of information, and the basis for correctly understanding and processing text. Chinese named entity recognition is one of the basic tasks in natural language processing. Its main task is to identify and classify the named entities and meaningful phrases appearing in a text, mainly person names, place names, organization names, time expressions, dates, and numeric expressions. The accuracy and recall of named entity recognition directly determine the performance of the whole process of language unders...
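The two metrics named in the background can be made concrete with a short sketch. It assumes that "accuracy" refers to entity-level precision, that tags follow a BIO scheme, and that an entity counts as correct only when both its boundaries and its label match exactly; the source states none of these conventions. The function names `extract_entities` and `precision_recall` are hypothetical.

```python
from typing import List, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, label)


def extract_entities(tags: List[str]) -> Set[Entity]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    entities: Set[Entity] = set()
    start, label = None, None
    # A trailing "O" sentinel closes any entity that runs to the end.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities


def precision_recall(gold: List[str], pred: List[str]) -> Tuple[float, float]:
    """Entity-level precision and recall with exact span and label matching."""
    gold_entities = extract_entities(gold)
    pred_entities = extract_entities(pred)
    correct = len(gold_entities & pred_entities)
    precision = correct / len(pred_entities) if pred_entities else 0.0
    recall = correct / len(gold_entities) if gold_entities else 0.0
    return precision, recall


# Hypothetical example: the second predicted entity has a wrong boundary.
gold = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-LOC", "O", "O"]
scores = precision_recall(gold, pred)
```

In this example one of the two predicted entities matches one of the two gold entities, so both values come out to 0.5.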


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/253, G06N20/20
CPC: G06N20/20, Y02D10/00
Inventor: 陈振洲, 高磊
Owner: SOUTH CHINA NORMAL UNIVERSITY