Information extraction method based on legal instruments

An information extraction technology for legal documents. It addresses inaccurate classification of extracted entity data, failure to account for new words and terms, and poor learning performance. The stated effects are improved learning, avoidance of manual labeling, and stronger semantic comprehension.

Active Publication Date: 2022-02-11
湖南工商大学

AI Technical Summary

Problems solved by technology

[0009] The present invention provides an information extraction method based on legal documents. It aims to solve the following problems: traditional methods do not consider neologisms in legal document data, supervised (labeled) legal document data is often scarce, the learning effect is poor, and the extracted entity data is classified inaccurately.


Embodiment Construction

[0083] To make the technical problems to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description follows with reference to the drawings and specific embodiments.

[0084] Existing methods do not consider neologisms in legal document data, supervised labeled legal document data is often scarce, the learning effect is poor, and the extracted entity data is classified inaccurately. To address these problems, the present invention provides an information extraction method based on legal documents.

[0085] As shown in Figures 1 to 6, an embodiment of the present invention provides an information extraction method based on legal documents, including: step 1, obtaining unsupervised data of legal documents, performing data preprocessing and data cleaning on the unsupervised data of legal documents, and removing noise from the unsupervised data of legal documents...
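Step 1 (preprocessing and cleaning unlabeled legal documents into a corpus) could look roughly like the sketch below. The source does not say which kinds of noise are removed, so the choices here (markup tags, HTML entities, invisible control characters, irregular whitespace, duplicate documents) are assumptions, and the function names `clean_document` and `build_corpus` are hypothetical.

```python
import html
import re
import unicodedata

# Assumed noise patterns; the source does not enumerate them.
_TAG_RE = re.compile(r"<[^>]+>")
_SPACE_RE = re.compile(r"[ \t\u00a0\u3000]+")


def clean_document(raw: str) -> str:
    """Return one cleaned document (hypothetical helper)."""
    text = _TAG_RE.sub(" ", raw)          # drop markup tags
    text = html.unescape(text)            # decode entities such as &nbsp;
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = _SPACE_RE.sub(" ", text)       # collapse runs of spaces and tabs
    # Remove control and format characters (e.g. zero-width spaces), keep newlines.
    text = "".join(
        ch for ch in text
        if ch == "\n" or unicodedata.category(ch)[0] != "C"
    )
    lines = [line.strip() for line in text.split("\n")]
    return "\n".join(line for line in lines if line)


def build_corpus(raw_documents):
    """Clean every document, then drop empty and duplicate ones."""
    seen = set()
    corpus = []
    for raw in raw_documents:
        doc = clean_document(raw)
        if doc and doc not in seen:
            seen.add(doc)
            corpus.append(doc)
    return corpus
```

Calling `build_corpus` on a list of raw strings returns the cleaned, de-duplicated documents in their original order. A real pipeline would likely add rules specific to court-document formats.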


Abstract

The invention provides an information extraction method based on legal instruments. The method comprises the following steps: step 1, acquiring unsupervised data of legal instruments, performing data preprocessing and data cleaning on it to remove noise, and forming a corpus based on the legal instruments; and step 2, applying a new word discovery algorithm to the corpus to obtain a preliminary candidate set for a legal instrument domain dictionary, then screening the candidate set by denoising, removing general words, and combining high-frequency words to obtain the final legal instrument domain dictionary. The method is stated to offer strong semantic comprehension, reduce manual annotation, improve the learning of general vocabulary, and achieve high precision. It also provides a reference for information extraction in other downstream tasks on legal instruments or in other fields, lets the model generalize better, and improves the model's entity classification.
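The abstract does not name the new word discovery algorithm used in step 2. A common unsupervised approach scores character n-grams by internal cohesion (pointwise mutual information) and boundary freedom (left and right neighbor entropy), and the sketch below uses that technique purely as an illustration. The function names, thresholds, and the reading of "combining high-frequency words" as folding a fragment into a longer candidate that is at least as frequent are all assumptions.

```python
import math
from collections import Counter, defaultdict


def discover_new_words(corpus, max_len=4, min_count=2,
                       min_pmi=1.0, min_entropy=0.5):
    """Return {candidate: count} using PMI and neighbor entropy (illustrative)."""
    counts = Counter()
    left = defaultdict(Counter)
    right = defaultdict(Counter)
    total = 0
    for sent in corpus:
        total += len(sent)
        for i in range(len(sent)):
            for n in range(1, max_len + 1):
                if i + n > len(sent):
                    break
                gram = sent[i:i + n]
                counts[gram] += 1
                if n >= 2:
                    if i > 0:
                        left[gram][sent[i - 1]] += 1
                    if i + n < len(sent):
                        right[gram][sent[i + n]] += 1

    def prob(gram):
        return counts[gram] / total

    def entropy(neighbors):
        size = sum(neighbors.values())
        if size == 0:
            return 0.0
        return -sum(c / size * math.log(c / size) for c in neighbors.values())

    candidates = {}
    for gram, count in counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        # Cohesion: the weakest PMI over every way of splitting the n-gram in two.
        pmi = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        # Freedom: the less varied side of the n-gram's context.
        freedom = min(entropy(left.get(gram, Counter())),
                      entropy(right.get(gram, Counter())))
        if pmi >= min_pmi and freedom >= min_entropy:
            candidates[gram] = count
    return candidates


def screen_candidates(candidates, general_words):
    """Denoise, drop general words, and fold fragments into longer words."""
    kept = {
        w: c for w, c in candidates.items()
        if w not in general_words
        and not any(ch.isdigit() or ch.isspace() for ch in w)
    }
    final = [
        w for w in kept
        if not any(w != o and w in o and kept[o] >= kept[w] for o in kept)
    ]
    return sorted(final, key=lambda w: (-kept[w], w))
```

`general_words` stands in for a general-purpose vocabulary list, which the source mentions but does not specify. The thresholds would need tuning on a real legal corpus.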

Description

Technical field

[0001] The invention relates to the technical field of information extraction, in particular to an information extraction method based on legal documents.

Background art

[0002] Models for named entity extraction designed specifically for legal document data are still scarce, and high-quality labeled legal document data is very scarce as well. By contrast, open legal document data without manual labeling is generally available in huge quantities and easy to obtain, and large amounts of new data are generated over time. However, the data obtained is usually raw plain text, which supervised learning models cannot make use of. In addition, legal documents often contain many proper nouns and technical terms, that is, they have strong domain characteristics, and general-purpose models generally struggle to account for the domain nature of the data.

[0003] Information extraction of legal documents is an em...

Application Information

IPC(8): G06F40/242, G06F40/216, G06F40/295, G06K9/62, G06F16/35
CPC: G06F40/242, G06F40/295, G06F40/216, G06F16/35, G06F18/214
Inventor: 毛星亮, 施鹤远, 李琳, 曹文治, 宁肯
Owner: 湖南工商大学