Judgment document information extraction method

An information extraction technology for judgment documents, applicable to metadata text retrieval, text database clustering/classification, unstructured text data retrieval, and similar problems, that achieves high efficiency and improves model performance.

Active Publication Date: 2020-01-14
杭州费尔斯通科技有限公司

AI Technical Summary

Problems solved by technology

To further mine and use this public judgment document information, each core field of a case must be structured. This is usually done manually, and manual processing falls short in both cost and efficiency.
[0003] CN201910263217 uses a neural network model to perform named entity recognition on legal documents and extract their key information, but it cannot identify the semantic relationships between entities. For example, when a document involves multiple defendants and multiple crimes, this method cannot determine which crimes apply to a particular defendant. CN201910145396 first performs TF-IDF word frequency statistics on unstructured texts to obtain feature sets for different crimes and causes of action, and then compares entities. Although it addresses the extraction of semantic relations between entities, its method of generating candidate entities depends heavily on the corpus, and entities must be paired two at a time to generate samples. Because judgment documents are relatively long and contain many entities, a large number of samples is generated and efficiency is low.
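The efficiency concern with pairwise sample generation can be made concrete with a small sketch. It assumes that each unordered pair of entities produces one candidate sample, so n entities yield n(n-1)/2 samples; the text does not say whether the cited method uses ordered or unordered pairs, and the function name `pair_samples` and the entity names are hypothetical.

```python
from itertools import combinations


def pair_samples(entities):
    """Return every unordered pair of entities (assumed: one sample per pair)."""
    return list(combinations(entities, 2))


# Hypothetical document containing 20 recognized entities.
entities = [f"entity_{i}" for i in range(20)]
print(len(pair_samples(entities)))  # 190
```

Under this assumption, doubling the number of entities roughly quadruples the number of samples, which is why long documents with many entities become costly.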

Examples

Embodiment Construction

[0019] The specific embodiments of the present invention will be described in further detail below in conjunction with the accompanying drawings.

[0020] As shown in Figures 1-2, the present invention provides an event-based method for extracting judgment document information, comprising the following steps:

[0021] (1) Obtain the entire HTML of the judgment document, parse it with the Python module BeautifulSoup, and extract the unformatted text from the HTML;

[0022] (2) Label the extracted unformatted text. In the labeling task for each event, a label is defined as either an event type or an entity type: if a label has a relationship with other labels, it is defined as an event type, while the other labels are defined as entity types. The event structure in the judgment document is defined as: event type-entity type-...-entity type, and the event type and its entity type corresponding to each event are marked from the unfo...
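Steps (1) and (2) can be illustrated with a minimal sketch. It is not the patent's implementation: the sample HTML, the label names `crime` and `defendant`, and the names `extract_plain_text`, `Span`, `Event`, and `find_span` are all hypothetical. The text is extracted with BeautifulSoup when it is installed, with a standard-library fallback otherwise, and one annotated event is stored as an event-type span followed by its entity-type spans.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List

try:
    from bs4 import BeautifulSoup  # the module named in step (1)
except ImportError:
    BeautifulSoup = None  # use the standard-library fallback below


class _TextCollector(HTMLParser):
    """Fallback parser: keeps text nodes that are outside <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def extract_plain_text(html: str) -> str:
    """Step (1): return the page text, one non-empty text node per line."""
    if BeautifulSoup is not None:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style"]):
            tag.decompose()  # drop non-content elements
        return soup.get_text(separator="\n", strip=True)
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return "\n".join(collector.parts)


@dataclass
class Span:
    label: str  # event-type or entity-type name (hypothetical labels)
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset


@dataclass
class Event:
    """Step (2): one event type followed by its related entity types."""
    event: Span
    entities: List[Span] = field(default_factory=list)

    def structure(self) -> str:
        return "-".join([self.event.label] + [e.label for e in self.entities])


def find_span(text: str, label: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase and wrap it as a labeled span."""
    start = text.index(phrase)
    return Span(label, start, start + len(phrase))


# Hypothetical input, not taken from a real judgment document.
sample_html = (
    "<html><head><style>p{}</style></head><body>"
    "<h1>Criminal Judgment</h1>"
    "<p>Defendant Zhang San committed theft.</p></body></html>"
)
text = extract_plain_text(sample_html)
event = Event(
    find_span(text, "crime", "theft"),
    [find_span(text, "defendant", "Zhang San")],
)
print(event.structure())  # crime-defendant
```

Storing each event as its own record keeps a crime tied to its defendant, which is the relationship that entity recognition alone does not capture when a document names several defendants.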

Abstract

The invention discloses a judgment document information extraction method comprising the following steps: first, extracting the unformatted text from the full HTML of a judgment document, labeling it, and defining the event structure in the judgment document as event type-entity type-...-entity type; segmenting the unformatted text word by word to obtain an array x, thereby obtaining a complete sample (x, y); processing the sample (x, y) to obtain a sample (x1, y1) for the event type extraction model, and training a BERT model as the event type extraction model; processing the events in the label y1 to obtain a sample ([x1, x2], y2) for the entity type extraction model; training a self-attention network as the entity type extraction model; and obtaining the characters corresponding to each event type and its entity types according to y1 and y2. The method requires few samples and helps improve model performance.
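How a sample (x, y) might be assembled from labeled text can be sketched as follows. The abstract does not state the segmentation unit or the label encoding, so this sketch assumes character-level segmentation and a BIO tagging scheme; the function name `build_sample`, the label names, and the example sentence are hypothetical. The later derivation of (x1, y1) and ([x1, x2], y2) is not described in enough detail here to reproduce.

```python
def build_sample(text, spans):
    """Build (x, y) from text and labeled spans.

    spans: list of (start, end, label) with end exclusive.
    Assumption: x is the character sequence and y holds one BIO tag per character.
    """
    x = list(text)            # character-level segmentation (assumed)
    y = ["O"] * len(x)        # "O" marks characters outside any span
    for start, end, label in spans:
        y[start] = f"B-{label}"           # first character of the span
        for i in range(start + 1, end):
            y[i] = f"I-{label}"           # remaining characters of the span
    return x, y


# Hypothetical sentence: "Zhang San committed the crime of theft".
x, y = build_sample("张三犯盗窃罪", [(0, 2, "defendant"), (3, 6, "crime")])
print(list(zip(x, y)))
```

Because x and y have the same length, the characters belonging to a predicted label can be read back by position, which matches the final step of recovering the characters for each event type and entity type.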

Description

Technical field

[0001] The invention relates to the field of text information extraction, in particular to a method for extracting judgment document information.

Background technique

[0002] Judgment documents are legally binding written conclusions made by the judge on the substantive and procedural issues of a case, based on the facts of the case and the legal provisions, after the trial is over. Judgment documents faithfully record the adjudication process of the case, so they contain a lot of valuable information. Although judgment documents follow a certain format, they are still organized as large blocks of text. The main information fields, such as the plaintiff, the defendant, the court, and the judgment date, are embedded in the document as natural-language text. If it is necessary to further excavate and utilize these public judgment document information, it is necessary to carry out structure...

Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F40/295, G06F40/30, G06F16/35, G06F16/38, G06K9/62, G06Q50/18
CPC: G06F16/35, G06F16/38, G06Q50/18, G06F18/24, G06F18/214
Inventor: 金霞, 杨红飞, 程东, 张庭正
Owner: 杭州费尔斯通科技有限公司