An ensemble learning method and system for legal text information mining

An integrated (ensemble) learning technique applied to legal text information mining. It addresses the limited applicability and reduced accuracy of existing approaches, and achieves improved prediction accuracy and a stronger non-linear dividing ability.

Inactive Publication Date: 2019-02-01
JINAN INSPUR HIGH TECH TECH DEV CO LTD
Cites: 5 · Cited by: 6

AI Technical Summary

Problems solved by technology

Therefore, it is often difficult to have wide applicability ...



Examples


Embodiment 1

[0040] With reference to Figure 1, this embodiment proposes an integrated learning method for legal text information mining. First, legal texts processed by professional legal staff are collected as a data source, and the data source is preprocessed. Second, the preprocessing results are used to train different feature engineering models, and a linear SVM classifier learns the text vectors produced by these models. The linear SVM classifier then predicts on the preprocessed data source according to what it has learned. The prediction results are integrated through the Stacking method and used to train the integrated learning model. The trained integrated learning model outputs more comprehensive and accurate prediction results for the legal texts to be processed.
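The flow in [0040] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two feature models (token counts and token bigrams), the toy English corpus, the binary +1/-1 labels, and all function names are hypothetical, and a small Pegasos-style subgradient trainer stands in for a library linear SVM so the sketch needs only the standard library. Base classifiers are scored out-of-fold, and those scores become the training input of the second-level model, which is the core of Stacking.

```python
import random

def word_features(text):
    """Feature model A (assumed): counts of whitespace-separated tokens."""
    feats = {}
    for tok in text.split():
        feats["w:" + tok] = feats.get("w:" + tok, 0.0) + 1.0
    return feats

def bigram_features(text):
    """Feature model B (assumed): counts of adjacent token pairs."""
    toks = text.split()
    feats = {}
    for a, b in zip(toks, toks[1:]):
        key = "b:" + a + "_" + b
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats

FEATURE_MODELS = [word_features, bigram_features]

def score(w, x):
    """Linear score w . x for a sparse feature dict x."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())

def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the hinge loss; labels are +1/-1."""
    rng = random.Random(seed)
    w, step = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violated = y[i] * score(w, X[i]) < 1.0
            for k in w:                       # L2 shrinkage
                w[k] *= 1.0 - 1.0 / step
            if violated:                      # hinge-loss subgradient step
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w

def fit_stacking(texts, labels, k=3):
    """Train base SVMs per feature model, then a meta SVM on out-of-fold scores."""
    n = len(texts)
    meta_X = [dict() for _ in range(n)]
    for m, featurize in enumerate(FEATURE_MODELS):
        X = [featurize(t) for t in texts]
        for fold in range(k):
            train = [i for i in range(n) if i % k != fold]
            w = train_linear_svm([X[i] for i in train], [labels[i] for i in train])
            for i in range(fold, n, k):       # score held-out texts only
                meta_X[i]["m%d" % m] = score(w, X[i])
    base = [train_linear_svm([f(t) for t in texts], labels) for f in FEATURE_MODELS]
    meta = train_linear_svm(meta_X, labels)
    return base, meta

def predict(model, text):
    """Return +1 or -1 from the second-level model."""
    base, meta = model
    x = {"m%d" % m: score(w, f(text))
         for m, (w, f) in enumerate(zip(base, FEATURE_MODELS))}
    return 1 if score(meta, x) > 0 else -1

# Hypothetical toy corpus: +1 = theft, -1 = assault.
TEXTS = ["stole wallet from store", "punched victim in bar",
         "stole wallet from market", "punched victim in street",
         "stole phone from store", "punched guard in bar"]
LABELS = [1, -1, 1, -1, 1, -1]

model = fit_stacking(TEXTS, LABELS)
print(predict(model, "stole laptop from office"))
print(predict(model, "punched driver in road"))
```

A real system would replace the whitespace tokenizer with a proper segmenter, use multi-class labels (for example, one per charge), and hold out separate data to evaluate the stacked model.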

[0041] The operations involved in preprocessing the data source include: using jieba or thulac tools to build a thesaurus, and performing word segmentation and removing...

Embodiment 2

[0046] With reference to Figure 2, this embodiment proposes an integrated learning system for legal text information mining. Its structure includes:

[0047] Collection module 1, used to collect legal texts processed by professional legal staff as a data source;

[0048] Preprocessing module 2, used to preprocess the legal texts in the data source;

[0049] Feature extraction module 3, used to extract different features from all legal texts in the data source;

[0050] Training and construction module 4, used to train and construct different feature engineering models according to the different extracted features;

[0051] Linear SVM classifier module 5, used to learn the text vectors obtained from the different feature engineering models and to predict on the preprocessed data source according to the learning results;

[0052] Integration module 6, used to integrate the prediction results of the linear SVM classifier module by the Stacking method;

[0053] Learning and training module 7, used to ...
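One way to picture how such modules chain together is a list of stage functions that pass a shared state along. The sketch below covers only modules 1 to 3 with stub behavior; the function names, the hard-coded sample records, and the whitespace tokenizer are all assumptions made for illustration, and the later modules would be appended to the same list in order.

```python
from collections import Counter

def collection_module(state):
    # Module 1 (stub): stands in for loading texts labeled by legal staff.
    state["records"] = [("stole wallet from store", "theft"),
                        ("punched victim in bar", "assault")]
    return state

def preprocessing_module(state):
    # Module 2 (stub): whitespace split as a stand-in for real word segmentation.
    state["tokens"] = [text.split() for text, _ in state["records"]]
    return state

def feature_extraction_module(state):
    # Module 3 (stub): two different feature views of every text.
    state["features"] = {
        "unigrams": [Counter(toks) for toks in state["tokens"]],
        "bigrams": [Counter(zip(toks, toks[1:])) for toks in state["tokens"]],
    }
    return state

# Modules run in list order; later modules would be appended here.
MODULES = [
    ("collection", collection_module),
    ("preprocessing", preprocessing_module),
    ("feature_extraction", feature_extraction_module),
]

def run_pipeline(modules):
    """Run each module in order, recording which ones have completed."""
    state = {"completed": []}
    for name, module in modules:
        state = module(state)
        state["completed"].append(name)
    return state

state = run_pipeline(MODULES)
print(state["completed"])
```

Keeping each module as a plain function over a shared state makes it easy to swap one feature extractor or classifier for another without touching the rest of the chain.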


Abstract

The invention discloses an ensemble learning method for legal text information mining, relating to the fields of information mining and ensemble learning. Different features are extracted from the preprocessed legal texts and corresponding feature engineering models are built. A linear SVM classifier learns the text vectors produced by the different feature engineering models, and the trained linear SVM classifier is then used to predict on the preprocessed legal texts. The Stacking method integrates the prediction results, and an ensemble learning model is trained and constructed to output more comprehensive and more accurate prediction results for the legal texts to be processed. The method can better synthesize the existing information and discover contextual relevance within it, thereby forming a stronger non-linear division ability, reducing the generalization error, and predicting charges, laws, sentences and other content more accurately than a single model. In addition, the invention discloses an integrated learning system for legal text information mining.

Description

Technical field

[0001] The invention relates to the technical field of information mining and integrated learning, in particular to an integrated learning method and system for legal text information mining.

Background technique

[0002] In the field of machine learning, ensemble learning is not itself a separate machine learning algorithm. It completes learning tasks by building multiple learners and combining them into a strong learner. The points that need attention in this process are the choice and form of the weak classifier models and the way the weak classifiers are combined into a strong classifier.

[0003] Integrated learning includes well-known homogeneous ensemble methods such as Adaboost and Bagging, in which multiple homogeneous models are combined by averaging, by majority vote, or by assigning different weights over multiple rounds of training. In addition, there is Stacking, a heterogeneous integrated learning method. It divides the trainin...
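The combination rules mentioned for homogeneous ensembles can be illustrated with a short sketch. The function names and sample values are hypothetical, and the weighted average is a generic illustration rather than the exact Adaboost update.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base models.

    On a tie, Counter.most_common keeps first-seen order, so the label
    that appears first in `predictions` wins.
    """
    return Counter(predictions).most_common(1)[0][0]

def weighted_average(scores, weights):
    """Combine numeric model scores, giving each model its own weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Three hypothetical base models vote on a charge label.
label = majority_vote(["theft", "assault", "theft"])

# Two hypothetical base models output scores; the second is trusted more.
combined = weighted_average([0.25, 0.75], [1, 3])

print(label, combined)
```

Both rules use fixed combination logic. Stacking differs in that the combiner is itself a model trained on the base models' predictions.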


Application Information

IPC(8): G06K9/62, G06F16/332, G06F16/335
CPC: G06F18/2411, G06F18/214
Inventor: 段强, 李锐, 于治楼
Owner: JINAN INSPUR HIGH TECH TECH DEV CO LTD