Rule and model combination-based legal instrument information extraction method and system

A technology relating to information extraction and rules, applied in the fields of instruments, electrical digital data processing, data processing applications, etc. It addresses the problems that not all rules can be enumerated, that a large number of rules cannot be obtained, and that rules are difficult to maintain. It improves the extraction effect, offers strong portability, and avoids the cold-start problem.

Active Publication Date: 2020-07-31
同方赛威讯信息技术有限公司
Cites: 29; Cited by: 8

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to provide a legal document information extraction method and system based on a combination of rules and models. It addresses two problems in the prior art. First, when relying on a single trained model, the large amount of training data required cannot be obtained. Second, when rules are used to extract data, the rules are difficult to maintain and scale poorly, because it is impossible to enumerate all the rules.



Examples


Embodiment 1

[0029] As shown in Figure 1, a legal document information extraction method based on a combination of rules and models proceeds as follows.

[0030] First, collect legal industry terminology, business terminology, and similar vocabulary to create a domain dictionary.
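The patent does not say how the domain dictionary is applied to text. The sketch below makes an assumption about that: the dictionary maps each term to an entity type and is matched against the text by forward maximum matching. This is a common technique for Chinese text, chosen here as a stand-in. The entries in `DOMAIN_DICT` and the function name `dict_match` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical domain dictionary: term -> entity type (illustrative entries only).
DOMAIN_DICT: Dict[str, str] = {
    "人民法院": "COURT_TYPE",
    "原告": "ROLE",
    "被告": "ROLE",
    "判决书": "DOC_TYPE",
    "民事判决书": "DOC_TYPE",
}


def dict_match(text: str, dictionary: Dict[str, str]) -> List[Tuple[int, int, str, str]]:
    """Forward maximum matching: at each position, take the longest dictionary term.

    Returns (start, end, term, entity_type) tuples; `end` is exclusive.
    """
    max_len = max((len(term) for term in dictionary), default=0)
    matches = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                matches.append((i, i + length, candidate, dictionary[candidate]))
                i += length
                break
        else:
            i += 1  # no dictionary term starts here; advance one character
    return matches
```

Because matching is longest-first, "民事判决书" is returned as one term rather than as the shorter "判决书". For large dictionaries, a trie or an Aho-Corasick automaton would avoid rescanning each candidate length.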

[0031] Second, identify the entities to be extracted according to business requirements, and then configure document entity extraction rules according to the writing conventions of legal documents.
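One way to configure rules from writing conventions is a table of regular expressions keyed by entity type. This is a minimal sketch under that assumption. The two patterns cover a case number such as （2019）京0105民初12345号 and a date such as 2019年8月1日. They are hypothetical illustrations of such conventions, not rules taken from the patent.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule configuration: entity type -> regular expression.
RULES: Dict[str, str] = {
    # Case number, e.g. （2019）京0105民初12345号 (full- or half-width brackets)
    "CASE_NUMBER": r"[（(]\d{4}[）)][\u4e00-\u9fa5]{1,2}\d{0,4}[\u4e00-\u9fa5]{1,4}\d+号",
    # Date written with Arabic numerals, e.g. 2019年8月1日
    "DATE": r"\d{4}年\d{1,2}月\d{1,2}日",
}


def compile_rules(rules: Dict[str, str]) -> Dict[str, re.Pattern]:
    """Compile each configured pattern once so it can be reused across documents."""
    return {name: re.compile(pattern) for name, pattern in rules.items()}


def rule_extract(text: str, compiled: Dict[str, re.Pattern]) -> List[Tuple[int, int, str, str]]:
    """Apply every rule to the text; return (start, end, matched_text, entity_type) sorted by position."""
    results = []
    for name, pattern in compiled.items():
        for match in pattern.finditer(text):
            results.append((match.start(), match.end(), match.group(), name))
    return sorted(results)
```

The patterns live in a configuration table rather than in code. That lets the rules be adjusted after each evaluation round without changing the extractor itself.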

[0032] Then, a rule-based method is used to extract legal document entities. The extracted entities are evaluated using accuracy (precision) and recall as indicators, and the rules and dictionaries are modified and adjusted according to the evaluation results. The extraction results are sent to the data labeling module as initial labeling data for confirmation and correction, after which the model is trained and released. Rule-based text paragraph classification processing, subject recognition processing, and rule-based element extraction proce...
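The evaluation step can be sketched as exact-match scoring. This rests on two assumptions. "Accuracy" is taken to mean precision, as is usual for entity extraction. An extracted entity counts as correct only when both its span and its type match the gold annotation. The `evaluate` function and the tuple format are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Entity = Tuple[int, int, str]  # (start, end, entity_type); end is exclusive


def evaluate(predicted: Iterable[Entity], gold: Iterable[Entity]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)  # span and type must both match
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

Per-type scores, for example recall on case numbers alone, point more directly at the rule or dictionary entry to adjust. To get them, filter both sets by entity type before calling `evaluate`.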

Embodiment 2

[0037] As shown in Figure 2, a legal document information extraction system based on a combination of rules and models includes the following modules.

[0038] The data acquisition module acquires business data from the business application systems and acquires legal document data. The collected data is used by the active learning text labeling module and the information extraction module. The data acquisition module collects data from three sources: public Internet data obtained with crawlers, data obtained from third parties, and data obtained from business systems.
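The patent does not describe how the active learning text labeling module decides which documents to label. One common strategy is least-confidence sampling, which sends the documents the current model is least confident about to annotators first. It is sketched below as an assumption. `select_for_labeling` and the confidence format are hypothetical.

```python
from typing import Dict, Iterable, List


def select_for_labeling(confidences: Dict[str, float], budget: int,
                        already_labeled: Iterable[str] = ()) -> List[str]:
    """Least-confidence sampling.

    `confidences` maps a document id to the model's confidence in its own
    prediction for that document (e.g. the probability of its best labeling).
    Returns up to `budget` not-yet-labeled ids, least confident first.
    """
    labeled = set(already_labeled)
    candidates = [doc_id for doc_id in confidences if doc_id not in labeled]
    # Ascending confidence; ties broken by id so the order is deterministic.
    candidates.sort(key=lambda doc_id: (confidences[doc_id], doc_id))
    return candidates[:max(budget, 0)]
```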

[0039] The information extraction module mainly comprises information extraction technology based on part-of-speech tagging rules and model-based information extraction technology, and it provides technical support for the legal document extraction business. The processed result data is used by the active learning text labeling tool in the upper-layer business application a...
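The patent names "part-of-speech tagging rules" without listing any. Below is a minimal sketch of one such rule. It assumes the text has already been segmented and tagged with a PKU/ICTCLAS-style tag set, where `nr` marks person names and `w` marks punctuation. The rule says that a party-role word followed, after optional punctuation, by a person name yields a (role, name) pair. The role list and the function name are hypothetical.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag); PKU/ICTCLAS-style tags assumed

# Hypothetical set of party-role words that may precede a party's name.
PARTY_ROLES = {"原告", "被告", "上诉人", "被上诉人"}


def extract_parties(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Rule: party-role word, optional punctuation ('w'), then a person name ('nr')."""
    parties = []
    for i, (word, _) in enumerate(tokens):
        if word not in PARTY_ROLES:
            continue
        j = i + 1
        # Skip punctuation such as "：" or "，" between the role and the name.
        while j < len(tokens) and tokens[j][1] == "w":
            j += 1
        if j < len(tokens) and tokens[j][1] == "nr":
            parties.append((word, tokens[j][0]))
    return parties
```

The rule matches on tags rather than on surface strings, so it generalizes to names it has never seen. This is the main advantage of part-of-speech rules over pure dictionary lookup.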



Abstract

The invention discloses a legal document information extraction method based on a combination of rules and models. The method comprises the following steps:
- creating a dictionary and document entity extraction rules, and extracting legal document entities with a rule-based extraction method;
- using the extraction results as initial annotation data for data annotation, then training and publishing a model;
- extracting legal document entities with a method that combines rules and models, and selecting the result with the higher evaluation score as the output;
- evaluating the result, and ending if the requirement is met or otherwise continuing to iterate.

The invention further discloses a legal document information extraction system comprising a data acquisition module, an information extraction module, a data labeling module, a data set management module, and an evaluation module. The method combines rules and models so that they complement each other. This improves legal document information extraction and gives the system higher extensibility and portability.
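The iterative procedure in the abstract can be sketched as a loop with three steps. Each round scores the rule extractor and a freshly trained model on a development set, keeps the higher-scoring one, and stops once a target score is reached. Two assumptions apply. Selection is made on mean per-document F1 over a held-out set. `train_model` retrains on whatever labels exist in a given round. Both callables and all names are hypothetical stand-ins, not the patent's interfaces.

```python
from typing import Callable, List, Set, Tuple

Entity = Tuple[int, int, str]            # (start, end, entity_type); end is exclusive
Extractor = Callable[[str], Set[Entity]]
DevSet = List[Tuple[str, Set[Entity]]]   # (document text, gold entities)


def f1_score(pred: Set[Entity], gold: Set[Entity]) -> float:
    if not pred and not gold:
        return 1.0  # nothing to find and nothing predicted
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def mean_f1(extract: Extractor, dev: DevSet) -> float:
    return sum(f1_score(extract(text), gold) for text, gold in dev) / len(dev)


def iterate_rules_and_model(rule_extractor: Extractor,
                            train_model: Callable[[int], Extractor],
                            dev: DevSet,
                            target_f1: float,
                            max_rounds: int = 5) -> Tuple[str, float, int]:
    """Return (chosen extractor name, its dev-set F1, rounds used)."""
    if not dev or max_rounds < 1:
        raise ValueError("need a non-empty dev set and at least one round")
    for round_no in range(1, max_rounds + 1):
        candidates = {
            "rules": rule_extractor,
            # Hypothetical: retrain on the labels accumulated up to this round.
            "model": train_model(round_no),
        }
        scores = {name: mean_f1(fn, dev) for name, fn in candidates.items()}
        best = max(scores, key=scores.get)  # ties go to "rules" (first key)
        if scores[best] >= target_f1:
            return best, scores[best], round_no  # requirement met: stop
    return best, scores[best], max_rounds      # round budget exhausted
```

With the rule extractor always in the candidate set, the output is never worse than rules alone. This sketch's reading is that this property is what lets the approach sidestep the cold-start problem before enough labels exist to train a good model.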

Description

Technical Field

[0001] The invention relates to the technical field of information extraction, and specifically to a legal document information extraction method and system based on a combination of rules and models.

Background

[0002] Information extraction is a basic application technology of natural language processing. With the development of deep learning, it has advanced rapidly in recent years and is widely used in vertical fields, including the political and legal industry. Applying AI technology lets machines help handle offline activities and assist judicial personnel in handling cases. Most of the data required for handling a case comes from the case's documents. Information extraction technology is therefore needed to convert the unstructured data in these documents into the structured data required for case handling. At present, information extraction technology has achieved certai...


Application Information

Patent type & authority: Application (China)
IPC(8): G06F40/295; G06F40/247; G06Q50/18
CPC: G06Q50/18
Inventors: 李丹, 魏明欣, 张兵, 蒋翱, 钟夫
Owner: 同方赛威讯信息技术有限公司