Method for automatically extracting case information from court electronic files

An automatic information extraction technology, applied in the field of artificial intelligence text information extraction, that addresses problems such as poor generalization ability, large differences in recognition performance across named entities, and the strong influence of the semantic granularity of the language representation on model training.

Active Publication Date: 2021-02-26
TAIJI COMP
4 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0003] Information extraction is the process of obtaining structured data from unstructured text. Structured data mainly comprises four data types: entities, relationships, attributes, and events. For a court case file, the case information is the semantic network formed by these four elements, so extracting information from electronic files is also the process of constructing a case knowledge ontology. This process usually includes four stages: symbolic annotation, syntactic analysis, extraction, and alignment and merging. Research abroad on text information extraction methods is mainly divided into three categories, among them methods based on rule templates and methods based on statistical machine learning. Rule-template methods rely on analysis and other natural language processing technologies to write rule templates and extract the information that matches their patterns. Later, researchers used unsupervised methods to automatically generate new rules and expand the template library, with some progress. Because writing rules depends heavily on domain knowledge, rule-template methods achieve high extraction accuracy on a specific corpus but often cannot be ported across domains and generalize poorly.
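
The rule-template approach described in [0003] can be illustrated with a minimal sketch. The field names, the patterns, and the sample sentence below are hypothetical and are not taken from the patent; they only show how hand-written templates map text to typed records, and why templates written for one kind of document return nothing on another.

```python
import re
from dataclasses import dataclass


@dataclass
class Extraction:
    kind: str   # one of "entity", "relationship", "attribute", "event"
    label: str  # hypothetical field name
    value: str  # text captured by the template


# Hypothetical rule templates: (data type, field name, pattern with one group).
RULES = [
    ("entity", "plaintiff", re.compile(r"Plaintiff:\s*([^,.;]+)")),
    ("entity", "defendant", re.compile(r"Defendant:\s*([^,.;]+)")),
    ("attribute", "case_number", re.compile(r"Case No\.\s*([\w-]+)")),
]


def extract(text):
    """Apply every template to the text and collect the matches."""
    results = []
    for kind, label, pattern in RULES:
        for match in pattern.finditer(text):
            results.append(Extraction(kind, label, match.group(1).strip()))
    return results


sample = "Case No. 2020-123; Plaintiff: Zhang San, Defendant: Li Si."
records = extract(sample)

# The same templates find nothing in text from another domain.
other_domain = extract("The lessee shall pay rent monthly.")
```

On the sample sentence the templates recover the two parties and the case number, while the out-of-domain sentence yields an empty list, which is the portability limit the paragraph describes.
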
[0004] Methods based on statistical machine learning treat information extraction as a sequence labeling problem. They are mainly divided into traditional machine learning methods and deep learning methods. The commonly used traditional methods are the hidden Markov model (HMM), the maximum entropy Markov model (MEMM), and the conditional random field (CRF). All three solve for the optimal label sequence Y given a known observation sequence X; they differ in that their dependency assumptions and constraints are relaxed in turn. CRF is the most widely used for sequence labeling, but CRF training requires feature values to be set from experience, and it relies entirely on the training lexicon to judge entities. Compared with these traditional methods, deep neural network models generalize better and depend less on hand-crafted features, and they have been widely used for named entity recognition in various fields. For named entity recognition in legal documents, however, the main problems are as follows. Named entities differ greatly in length, so the semantic granularity of the language representation strongly affects model training, and recognition performance varies widely across named entities. In the case files of different case types, the contextual characteristics of named entities differ significantly, so a model is not robust when applied to different types of case files. Training deep learning models requires a large labeled corpus, but annotated corpora of legal documents are currently insufficient.
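
As a minimal sketch of "solving for the optimal label sequence Y given the observation sequence X", the following shows Viterbi decoding for the simplest of the three models, the HMM. The tag set (B, I, O), the three-token sentence, and every probability are invented for illustration and do not come from the patent.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations (HMM)."""
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(obs, 0.0), p)
                for p in states
            )
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    path = [last]
    for t in range(len(best) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path


# Toy model: B = entity start, I = entity continuation, O = outside any entity.
states = ["B", "I", "O"]
start_p = {"B": 0.5, "I": 0.0, "O": 0.5}
trans_p = {
    "B": {"B": 0.1, "I": 0.7, "O": 0.2},
    "I": {"B": 0.1, "I": 0.3, "O": 0.6},
    "O": {"B": 0.3, "I": 0.0, "O": 0.7},
}
emit_p = {
    "B": {"Zhang": 0.8, "San": 0.1, "sued": 0.1},
    "I": {"Zhang": 0.1, "San": 0.8, "sued": 0.1},
    "O": {"Zhang": 0.1, "San": 0.1, "sued": 0.8},
}

tags = viterbi(["Zhang", "San", "sued"], states, start_p, trans_p, emit_p)
```

MEMM and CRF decode with the same dynamic program; what changes is how the per-step scores are defined, which is where the relaxed dependency assumptions enter.
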



Examples


Embodiment Construction

[0075] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention belong to the protection scope of the present invention.

[0076] As shown in Figures 1-4, the method for automatically extracting case information from court electronic files according to an embodiment of the present invention includes:

[0077] Step 1: Create a case information extraction framework;

[0078] Step 2: Build and train a legal-document NER model based on multi-granularity semantics.

[0079] In a specific embodiment of the present invention,

[0080] Step 1: Create a case information extraction framework; ...



Abstract

The invention discloses a method for automatically extracting case information from court electronic files. The method comprises creating a case information extraction framework: performing statistical analysis on the case elements in the electronic files of various cases and constructing a basic case ontology; customizing a basic case information extraction strategy according to the case ontology and the layout characteristics of the electronic files; and constructing the case information extraction framework on the XML-based cross-domain information extraction tool TZIE. The method further comprises building and training an NER model based on multi-granularity semantic legal documents: in the multi-granularity semantic unit combination mode, a Word2vec Skip-gram model and an LDA topic model are each trained to obtain a judicial-domain word vector model; domain entity recognition uses a BiLSTM-Attention-CRF model, which adds an Attention mechanism to the baseline BiLSTM-CRF model and predicts the probability that each semantic unit belongs to each label; and in the auxiliary-task-optimized training mode, training the model on the manually annotated corpus is the main task, and training models on the manually and automatically annotated corpora are auxiliary tasks.
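
To make "predicting the probability that each semantic unit belongs to each label" concrete, here is a minimal sketch of dot-product self-attention followed by a softmax over label scores. It is an illustration under stated assumptions, not the patented model: the two-dimensional vectors stand in for BiLSTM outputs, `label_weights` is a hypothetical projection for two labels, and the CRF layer that the abstract places on top is omitted.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(hidden):
    """Replace each vector with an attention-weighted average of all vectors."""
    attended = []
    for query in hidden:
        weights = softmax([dot(query, key) for key in hidden])
        attended.append([
            sum(w * vec[d] for w, vec in zip(weights, hidden))
            for d in range(len(query))
        ])
    return attended


def label_probabilities(hidden, label_weights):
    """One probability distribution over labels per semantic unit."""
    return [softmax([dot(vec, w) for w in label_weights])
            for vec in self_attention(hidden)]


# Stand-in values (assumptions): three semantic units, two labels.
hidden = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
label_weights = [[2.0, 0.0], [0.0, 2.0]]  # hypothetical rows for labels "B-PER" and "O"
probs = label_probabilities(hidden, label_weights)
```

In a BiLSTM-CRF-style tagger, per-unit scores of this kind are normally passed to the CRF layer, which picks the best label sequence as a whole instead of the best label for each unit independently.
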

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence text information extraction, in particular to a method for automatically extracting case information from court electronic files.

Background

[0002] At present, courts at all levels have accumulated large-scale electronic case files in the process of informatization. For the unstructured text information recorded in these massive electronic files, there is an urgent need for more efficient methods of structured, knowledge-based processing, which lays the groundwork for the in-depth application of court electronic files.

[0003] Information extraction is the process of obtaining structured data from unstructured text. Structured data mainly comprises four data types: entities, relationships, attributes, and events. For a court case file, the case information is the semantic network formed by these four elements, the process of extracting information from electronic file...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/295, G06F40/216, G06F40/242, G06F40/30, G06F16/335, G06Q50/18
CPC: G06F40/295, G06F40/216, G06F40/242, G06F40/30, G06F16/335, G06Q50/18, Y02D10/00
Inventor: 万玉晴王霄
Owner: TAIJI COMP