A classification method to deal with category imbalance

A classification method applied to crime classification with unbalanced categories. It addresses the situation in which positive samples are few and negative samples are many, which lowers the classification accuracy of the binary classifiers and in turn degrades the multi-class results.

Inactive Publication Date: 2019-03-15
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP
Cites: 3 | Cited by: 5

AI Technical Summary

Problems solved by technology

However, the effectiveness of this fully supervised classification method depends heavily on the quality of the human-labeled corpus, and each binary classifier often faces a large number of negative samples and only a small number of positive samples.
This category imbalance greatly reduces the classification accuracy of the binary classifiers, which ultimately affects the multi-class results.




Embodiment Construction

[0023] As shown in Figure 1, a crime classification method for dealing with category imbalance includes the following steps:

[0024] (1) Obtain the collected corpus and preprocess the case description corpus to obtain the case description corpus related to each crime;

[0025] (2) Take the case description corpus related to a given crime as the positive example corpus and the case description corpus unrelated to that crime as the negative example corpus, and divide them into a training corpus and a test corpus;

[0026] (3) As shown in Figure 2, use an undersampling algorithm to independently and randomly extract several subsets from the negative example corpus, and combine each subset with the positive example samples to form a training corpus subset; use these training corpus subsets to train multiple LSTM-based base classifiers;

[0027] (4) Combine the classification results of the multiple base classifiers to classify the crimes of the new c...
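
Steps (2) to (4) can be illustrated with a minimal Python sketch. It is not the patented implementation: the base classifier below is a simple token-frequency scorer standing in for the LSTM, the combination rule is assumed to be a majority vote (the source text is truncated before it states the rule), and all names (undersample_subsets, KeywordClassifier, train_ensemble, predict_ensemble) and the toy case descriptions are hypothetical.

```python
import random
from collections import Counter


def undersample_subsets(negatives, subset_size, n_subsets, seed=0):
    """Independently draw n_subsets random subsets from the negative corpus."""
    rng = random.Random(seed)
    return [rng.sample(negatives, subset_size) for _ in range(n_subsets)]


class KeywordClassifier:
    """Stand-in base classifier (NOT an LSTM): compares token frequencies."""

    def fit(self, positives, negatives):
        self.pos = Counter(tok for doc in positives for tok in doc.split())
        self.neg = Counter(tok for doc in negatives for tok in doc.split())
        self.pos_total = sum(self.pos.values()) or 1
        self.neg_total = sum(self.neg.values()) or 1
        return self

    def predict(self, doc):
        tokens = doc.split()
        pos_score = sum(self.pos[t] / self.pos_total for t in tokens)
        neg_score = sum(self.neg[t] / self.neg_total for t in tokens)
        return 1 if pos_score > neg_score else 0


def train_ensemble(positives, negatives, n_subsets=5, seed=0):
    """Train one base classifier per balanced (positives + negative subset) corpus."""
    subset_size = min(len(positives), len(negatives))
    subsets = undersample_subsets(negatives, subset_size, n_subsets, seed)
    return [KeywordClassifier().fit(positives, subset) for subset in subsets]


def predict_ensemble(classifiers, doc):
    """Assumed combination rule: strict majority vote over the base classifiers."""
    votes = sum(clf.predict(doc) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0


# Toy corpus: few positive (theft) descriptions, many negative ones.
positives = ["suspect stole a wallet", "stole cash from shop"]
negatives = [
    "driver hit a pedestrian",
    "forged a contract signature",
    "set fire to warehouse",
    "sold fake medicine online",
    "driver fled the scene",
    "assaulted a guard",
    "bribed a local official",
    "smuggled goods across border",
]

classifiers = train_ensemble(positives, negatives, n_subsets=5, seed=0)
theft_label = predict_ensemble(classifiers, "man stole a phone")
other_label = predict_ensemble(classifiers, "driver fled the scene")
```

Each base classifier sees all positive samples but only a small random slice of the negatives, so every training subset is balanced while the ensemble as a whole still covers much of the negative corpus.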


Abstract

The invention discloses a crime classification method for dealing with category imbalance, which comprises the following steps: acquiring the collected corpus and preprocessing the case description corpus to obtain the case description corpus related to each crime; taking the case description corpus related to a given crime as the positive example corpus and the case description corpus irrelevant to that crime as the negative example corpus, and dividing them into a training corpus and a test corpus; using an undersampling algorithm to independently and randomly extract a number of subsets from the negative example corpus, and combining each subset with the positive example samples to form a training corpus subset; training several LSTM-based base classifiers on these training corpus subsets; and combining the classification results of the base classifiers to classify a new case description. The invention can train base classifiers with high classification accuracy when the number of positive samples is small and the number of negative samples is large, addresses the high classification error rate that arises under category imbalance, and realizes automatic inference of the charges for a case description.

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, in particular to a crime classification method for dealing with category imbalance.

Background

[0002] With the development of artificial intelligence technology, more and more traditional industries have introduced it to reduce manpower, save costs and improve efficiency. In the daily work of the courts, the charge must be inferred from each case description for the convenience of recording and retrieval. The traditional approach is mostly manual entry, which requires personnel with some legal knowledge and some understanding of each crime: after reading the case description, they manually enter the crimes related to the case into the system. There are many types of crimes, and after working for a long time the staff cannot complete the entry of crimes efficiently. Therefore, there is an urgent need for an automatic charge...


Application Information

IPC(8): G06F16/35, G06Q50/26
CPC: G06Q50/26
Inventor 杨权梁栋后弘毅
Owner THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP