Topic model-based judgment document similarity analysis method

A similarity analysis and topic model technology, applied to semantic analysis, text database clustering/classification, unstructured text data retrieval, etc., which can solve problems such as the inability to support distributed computing

Status: Inactive
Publication Date: 2017-10-24
NANJING UNIV
6 Cites, 31 Cited by

AI Technical Summary

Problems solved by technology

In comparison, the variational EM method has a faster training speed than the Gibbs Sampling method, but the result obtained by the variational EM method is a local optimum, not necessar



Examples


Embodiment Construction

[0055] In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and specific examples.

[0056] The invention aims to analyze the similarity of judgment documents. The analysis results can be applied to scenarios such as similarity-based classification of judgment documents, recommendation of similar judgment documents, evaluation of judge workload based on judgment document similarity, and prediction of the legal provisions applicable to a case. The method uses TF-IDF and LDA (Latent Dirichlet Allocation), and additionally applies special processing and measurement tailored to the characteristics of judgment documents. The specific steps are as follows:

[0057] (1) In the set of judgment documents, a certain attribute (such as the cause of action, case type, etc.) is used as a screening condition to extract a subset of target documents as the ta...
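The screening in step (1), together with the TF-IDF weighting mentioned in [0056], might look roughly like the sketch below. This is an illustration under assumptions, not the patent's procedure: the `JudgmentDoc` record, its field names, and the helper functions are hypothetical, the documents are assumed to be tokenized already (Chinese judgment text would need word segmentation first), and the excerpt does not show how the method actually combines TF-IDF with LDA, so the weights are simply compared by cosine similarity.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class JudgmentDoc:
    # Hypothetical record layout; the field names are assumptions.
    doc_id: str
    cause_of_action: str
    tokens: list


def select_target_corpus(docs, cause_of_action):
    """Keep only the documents whose cause of action matches the screening value."""
    return [d for d in docs if d.cause_of_action == cause_of_action]


def tfidf_vectors(token_lists):
    """Return one {term: weight} dict per document, with idf = log(N / df)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus with made-up tokens, for illustration only.
corpus = [
    JudgmentDoc("A", "private lending", ["loan", "contract", "interest", "repayment"]),
    JudgmentDoc("B", "private lending", ["loan", "interest", "default", "repayment"]),
    JudgmentDoc("C", "traffic accident", ["traffic", "accident", "injury", "compensation"]),
    JudgmentDoc("D", "private lending", ["guarantee", "mortgage", "property", "dispute"]),
]

targets = select_target_corpus(corpus, "private lending")  # documents A, B, D
vecs = tfidf_vectors([d.tokens for d in targets])
sim_ab = cosine(vecs[0], vecs[1])  # A and B share several terms
sim_ad = cosine(vecs[0], vecs[2])  # A and D share no terms
```

With this unsmoothed idf, a term that occurs in every document of the target subset gets weight zero, which is one reason the subset is selected before the weights are computed.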


Abstract

The invention discloses a topic model-based judgment document similarity analysis method. A semantic-based semi-automatic and universal similarity analysis method is proposed for judgment documents by adopting an LDA (Latent Dirichlet Allocation) topic model in machine learning. The method mainly comprises the steps of selecting corpora; establishing a similarity tag; performing text preprocessing; performing input selection; performing parameter setting; performing iterative training; generating a model; applying the model; and the like. Based on a general similarity analysis method, the characteristics of rich specialized vocabularies and complex semantics in contents of the judgment documents are fully considered, and the semi-structured characteristics of the judgment documents are utilized, so that the accuracy and applicability of judgment document similarity analysis are improved.
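As a rough illustration of the "iterative training", "generating a model" and "applying the model" steps, the sketch below trains a very small LDA model and compares documents by their topic distributions. Several things here are assumptions rather than details from the patent: collapsed Gibbs sampling is used only as one common way to train LDA, the hyperparameters `alpha` and `beta`, the topic count, and the function names are hypothetical, and Hellinger-based similarity is just one possible measure between topic distributions.

```python
import math
import random


def train_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative, not optimized).

    docs: list of token lists. Returns theta, one topic distribution per document.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    vocab_size = len(vocab)
    ids = [[vocab[w] for w in d] for d in docs]

    doc_topic = [[0] * num_topics for _ in ids]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(ids):
        row = []
        for w in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(row)

    for _ in range(iters):
        for d, doc in enumerate(ids):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic distributions.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(ids)
    ]


def hellinger_similarity(p, q):
    """1 minus the Hellinger distance between two discrete distributions."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    return 1.0 - math.sqrt(max(0.0, 1.0 - bc))


# Toy, pre-tokenized documents with made-up tokens.
docs = [
    ["loan", "contract", "interest", "repayment", "loan"],
    ["loan", "interest", "default", "repayment"],
    ["traffic", "accident", "injury", "compensation", "accident"],
]
theta = train_lda(docs, num_topics=2)
similarity_01 = hellinger_similarity(theta[0], theta[1])
similarity_02 = hellinger_similarity(theta[0], theta[2])
```

Because Gibbs sampling is stochastic, the topic distributions depend on the seed; on a corpus this small the similarity values are not meaningful and only show the shape of the computation.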

Description

Technical field

[0001] The invention is a text similarity classification method, aimed at the court's internal judgment documents, and belongs to the technical fields of machine learning and text mining.

Background technique

[0002] The China Judgment Documents Network started construction in 2013. As of May 14, 2017, it has accumulated more than 29 million documents and has gradually grown into the world's largest website for sharing judicial documents. Based on these data, a series of judicial big data research and analysis work has been carried out one after another. While achieving remarkable results, there are still many problems and challenges. Some of the problems focus on the inadequacy of court data mining and analysis capabilities and related research.

[0003] Judgment documents, as an important part of the court's work, record the process and results of the people's court's trial. It is not only the carrier of the results of the court's litigation activities, ...


Application Information

IPC(8): G06F17/27, G06F17/30
CPC: G06F16/35, G06F40/289, G06F40/30
Inventor: 周业茂, 葛季栋, 王悦, 李传艺, 李忠金, 周筱羽, 骆斌
Owner: NANJING UNIV