Hierarchical multi-label categorization method suitable for legal identification

A multi-label, category-label technology applied in character and pattern recognition, special data processing applications, instruments, etc. It addresses the problems that existing methods do not exploit the hierarchical structure of the label space, that samples misclassified at an upper layer cannot reach the lower layers, and that model intelligibility suffers.

Active Publication Date: 2018-01-12
NANJING UNIV

AI Technical Summary

Problems solved by technology

[0004] The local algorithm has two disadvantages. First, multiple classifiers must be trained, which makes the model more complex and harder to interpret. Second, the prediction process suffers from a blocking problem: samples that are misclassified at an upper layer cannot reach the lower layers. Three strategies have been proposed to deal with blocking in local algorithms (lowering the threshold, restricted voting, and extended multiplicative thresholds), but the prediction accuracy of local algorithms is often still not ideal.
General hierarchical multi-label classification algorithms often cannot guarantee that their predictions satisfy the hierarchical constraints, or fail to achieve the best learning effect because they do not exploit the hierarchical structure of the label space.
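
To make the hierarchical constraint concrete, the following minimal Python sketch checks whether a predicted label set is consistent with a tree-shaped label hierarchy. It assumes the usual reading of the constraint, namely that a label may only be predicted together with all of its ancestors. The `PARENT` map, the label names, and the function names are hypothetical and are not taken from the patent.

```python
# Hypothetical tree-shaped label hierarchy: child -> parent
# (None marks a top-level label). Not taken from the patent.
PARENT = {
    "criminal_law": None,
    "chapter_5": "criminal_law",
    "article_264": "chapter_5",
    "article_266": "chapter_5",
}


def ancestors(label, parent=PARENT):
    """Return every ancestor of `label`, nearest first."""
    result = []
    current = parent[label]
    while current is not None:
        result.append(current)
        current = parent[current]
    return result


def satisfies_hierarchy(predicted, parent=PARENT):
    """True if every predicted label comes with all of its ancestors."""
    predicted = set(predicted)
    return all(a in predicted
               for label in predicted
               for a in ancestors(label, parent))


def violations(predicted, parent=PARENT):
    """Map each predicted label to the ancestors missing from the prediction."""
    predicted = set(predicted)
    missing = {}
    for label in predicted:
        absent = [a for a in ancestors(label, parent) if a not in predicted]
        if absent:
            missing[label] = absent
    return missing
```

A prediction such as `{"article_264"}` alone fails the check, because its ancestors are absent; a classifier that scores each label independently can produce exactly this kind of inconsistent output.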

Examples

Embodiment

[0132] As shown in Figure 1, the steps of the present invention are:

[0133] Step 1: Use a jsoup-based crawler to collect the required raw text data set of judgment documents from the Internet, and randomly divide it into a training set and a test set at a ratio of 7:3. Then preprocess the judgment documents, mainly completing the following tasks:

[0134] According to the text structure of the judgment document, extract the case facts and the applicable legal provisions. The former are used to generate the feature vector of the case sample, and the latter represent the category labels of the case sample. The raw text data set is thereby converted into semi-structured multi-label training and test sets;

[0135] Correct errors and format inconsistencies in applicable legal provisions of the case;

[0136] Use Harbin Institute of Technology's language technology platform LTP to perform word segmentation and part-of-speech tagging for the descript...
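
The 7:3 random split described in Step 1 can be sketched as follows. This is a minimal Python illustration, not the patent's implementation, which the text describes only as jsoup-based crawling followed by a random split. The function name `split_documents`, the fixed seed, the rounding rule, and the placeholder document IDs are all assumptions.

```python
import random


def split_documents(documents, train_ratio=0.7, seed=0):
    """Shuffle a copy of `documents` and split it into (train, test).

    `seed` is fixed only to make the sketch reproducible; the source
    text does not say how the random split is seeded.
    """
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for crawled judgment documents.
docs = [f"judgment_{i}" for i in range(10)]
train, test = split_documents(docs)
```

Shuffling a copy keeps the original order of the crawled collection intact, so the same raw data set can be split again with a different seed if needed.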

Abstract

The invention discloses a hierarchical multi-label categorization method suitable for legal recognition. The method comprises the following steps: step 1, extracting the facts of a case and the applicable legal provisions from a preprocessed judgment document; step 2, based on the hierarchical structure of the label space, expanding the legal provisions corresponding to the facts of the case, so that the categorization labels of the case sample are a subset of the label space; step 3, performing word segmentation and part-of-speech tagging on the text of the facts of the case, performing feature selection on the word segmentation results to select features that fully represent the facts of the case, and establishing a feature vector; step 4, establishing a prediction model: finding the set N(x) of k neighbor samples of a new instance x in the expanded multi-label training set, setting a weight for each neighbor sample, calculating the confidence of the new instance for each category according to the categorization weights of the k neighbor samples for each category, and finally predicting the category label set of the new instance.
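
Steps 2 and 4 can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the patented method: the abstract does not say how neighbor weights or the decision threshold are chosen, so the inverse Euclidean distance weights and the 0.5 cut-off below are arbitrary choices, and the `PARENT` map, label names, and toy feature vectors are hypothetical.

```python
import math

# Hypothetical label tree: child -> parent (None marks a top-level label).
PARENT = {
    "law": None,
    "chapter_5": "law",
    "article_264": "chapter_5",
    "article_266": "chapter_5",
}


def expand_labels(labels, parent=PARENT):
    """Step 2: add every ancestor of each label, closing the set upward."""
    expanded = set()
    for label in labels:
        while label is not None and label not in expanded:
            expanded.add(label)
            label = parent[label]
    return expanded


def predict(x, training_set, k=3, threshold=0.5):
    """Step 4: weighted k-nearest-neighbor vote over expanded label sets.

    training_set: list of (feature_vector, label_set) pairs.
    Inverse-distance weights and the threshold are assumptions; the
    abstract does not specify either.
    """
    neighbors = sorted(training_set, key=lambda s: math.dist(x, s[0]))[:k]
    weights = [1.0 / (math.dist(x, features) + 1e-9)
               for features, _ in neighbors]
    total = sum(weights)
    confidence = {}
    for weight, (_, labels) in zip(weights, neighbors):
        for label in labels:
            confidence[label] = confidence.get(label, 0.0) + weight / total
    predicted = {label for label, c in confidence.items() if c >= threshold}
    return predicted, confidence


# Toy two-dimensional feature vectors standing in for case-fact features.
raw_training = [
    ([1.0, 0.0], {"article_264"}),
    ([0.9, 0.1], {"article_264"}),
    ([0.0, 1.0], {"article_266"}),
    ([0.1, 0.9], {"article_266"}),
]
training = [(features, expand_labels(labels))
            for features, labels in raw_training]
predicted, confidence = predict([0.95, 0.05], training, k=3)
```

For this toy data the predicted set is `article_264` together with its ancestors `chapter_5` and `law`. In this sketch, because every training label set is expanded upward before voting, an ancestor's confidence is never lower than that of any of its descendants, so thresholding yields a label set that respects the hierarchy.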

Description

Technical field

[0001] The invention belongs to the field of computer data analysis and mining, and relates to a hierarchical multi-label classification method suitable for legal identification.

Background technique

[0002] Hierarchical multi-label classification is a special case of multi-label classification. Unlike general multi-label classification, in hierarchical multi-label classification problems each sample can have multiple class labels, and the sample label space is organized as a tree or directed acyclic graph hierarchy. In a directed acyclic graph a node may have multiple parent nodes, which is more complex than the tree structure and makes algorithm design more difficult. Therefore, current research on hierarchical multi-label classification mainly focuses on tree-shaped category label structures. Hierarchical multi-label classification algorithms can be divided into local algorithms and global algorithms according to the differen...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27, G06K9/62
Inventor: 柏文阳, 陈朋薇, 张剡, 周嵩
Owner: NANJING UNIV