
Method for extracting crime process key information in legal document based on TextRank algorithm

Key-information and legal technology, applied in computing, unstructured text data retrieval, text database browsing/visualization, etc. It addresses the large number of professional legal expressions in the documents and the neglect of text structure and of sentence position and semantic information, and keeps the extracted sentences coherent.

Pending Publication Date: 2021-05-14
江苏网进科技股份有限公司
Cites: 1; Cited by: 1

AI Technical Summary

Problems solved by technology

However, the existing TextRank-based method only considers the similarity between sentence nodes in the text: when it constructs the edges between nodes in the graph model, it judges how closely two sentences are related by directly counting the words they have in common, while ignoring the discourse structure of the text and the position and semantic information of sentences within it.
[0005] Moreover, legal documents differ from texts in other fields: the suspect's criminal process is concentrated within the document and is described with many professional expressions, so it cannot be extracted directly with existing text information extraction methods.
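To make the criticism concrete, here is a minimal Python sketch of the shared-word similarity commonly used as the edge weight in the original TextRank sentence-extraction formulation: the count of words two sentences share, divided by log|Si| + log|Sj|. The function name and sample tokens are illustrative assumptions, not taken from the patent. It shows the two weaknesses described above: reordering a sentence does not change its score, and sentences with similar meaning but different words score zero.

```python
import math


def overlap_similarity(s1, s2):
    """Classic TextRank sentence similarity: the number of distinct words two
    tokenised sentences share, normalised by log(|s1|) + log(|s2|).

    Only surface word identity counts; sentence position and word meaning
    play no role.
    """
    if not s1 or not s2:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    if denom == 0:  # both sentences have a single token
        return 0.0
    return len(set(s1) & set(s2)) / denom


a = ["suspect", "entered", "shop", "stole", "phone"]
b = ["phone", "stole", "shop", "entered", "suspect"]  # same words, new order
c = ["defendant", "took", "mobile", "store"]          # similar meaning, other words

print(overlap_similarity(a, b) == overlap_similarity(a, a))  # True: order is ignored
print(overlap_similarity(a, c))                              # 0.0: synonyms not matched
```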

Embodiment Construction

[0036] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0037] Referring to Figures 1 to 3, the present invention provides a method for extracting key information about the criminal process in legal documents based on the TextRank algorithm. The method comprises the following steps:

[0038] Step A: Preprocess the relevant text of the legal document and tag the specified words or parts of speech. Preprocessing mainly comprises word segmentation, stop-word removal and part-of-speech tagging; then obtain th...
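Step A can be pictured with a short sketch. It assumes the text has already been segmented and POS-tagged by some external tool (the excerpt names none), and the stop-word list and kept POS tags below are hypothetical. The TF*IDF value that the abstract says is computed for each word in the subject-word set is shown here in one common smoothed variant; the patent does not state which formula it uses, and English tokens stand in for segmented Chinese text.

```python
import math
from collections import Counter

# Hypothetical stop-word list and POS whitelist; the patent excerpt specifies neither.
STOP_WORDS = {"the", "a", "an", "of", "in", "and"}
KEEP_POS = {"n", "v"}  # keep nouns and verbs (assumed tag set)


def preprocess(tagged_doc):
    """Drop stop words and tokens whose POS tag is not whitelisted.

    `tagged_doc` is a list of (word, pos) pairs from an external
    segmenter / POS tagger, which is assumed here.
    """
    return [w for w, pos in tagged_doc if w not in STOP_WORDS and pos in KEEP_POS]


def tf_idf(docs):
    """Return one {word: TF*IDF} dict per preprocessed document.

    TF is the word's relative frequency in its document; IDF uses the
    smoothed form log((1 + N) / (1 + df)) + 1, one common variant.
    """
    n_docs = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    result = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        result.append({
            w: (c / total) * (math.log((1 + n_docs) / (1 + df[w])) + 1)
            for w, c in counts.items()
        })
    return result


corpus = [
    [("suspect", "n"), ("entered", "v"), ("the", "det"), ("shop", "n"),
     ("stole", "v"), ("phone", "n")],
    [("suspect", "n"), ("fled", "v"), ("the", "det"), ("scene", "n")],
]
scores = tf_idf([preprocess(d) for d in corpus])
```

A word such as "suspect" that occurs in every document receives a lower weight than "stole", which is specific to the first document; this is the property that makes TF*IDF useful for picking subject words.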

Abstract

The invention discloses a method for extracting key information about the criminal process in a legal document based on the TextRank algorithm. The method comprises the following steps: preprocessing the relevant text of the legal document, tagging the specified words or parts of speech, and obtaining the TF*IDF value of each word in a subject-word set; converting the preprocessed subject-word set w into a vector representation to obtain a new process-text subject-word set wc; adding word position information and merging semantically similar words to obtain the final keyword ranking; splitting the legal document to be extracted into sentences; in step F, constructing the graph model of the TextRank algorithm and iterating with the obtained word vector representations and a set initial value until convergence; and ranking the vertex scores of all sentences, taking the K highest-scoring sentences as the extracted key information about the criminal process, arranging the K sentences in sequence, and removing redundant information from them, so that the finally retained sentences are more coherent.
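The ranking stage summarised in the abstract (graph construction in step F, iteration to convergence, top-K selection, reordering and redundancy removal) can be sketched as follows. Every concrete choice here is an assumption, not the patent's method: sentence vectors are the mean of toy word vectors, edge weights are non-negative cosine similarities, the update is the standard weighted TextRank rule with damping d = 0.85 and initial score 1.0, and a sentence counts as redundant when its cosine similarity to an earlier kept sentence exceeds a hypothetical `redundancy_threshold`. The keyword-position weighting of the earlier steps is omitted.

```python
import math


def mean_vector(words, vectors):
    """Average the vectors of the words that have one; None if none do."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return None
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def textrank_sentences(sentences, vectors, k=3, d=0.85, init=1.0,
                       tol=1e-6, max_iter=200, redundancy_threshold=0.95):
    """Rank tokenised sentences with weighted TextRank and return the kept ones.

    The top-k sentences are restored to document order, then a sentence is
    dropped if it is nearly identical to one already kept.
    """
    n = len(sentences)
    if n == 0:
        return []
    vecs = [mean_vector(s, vectors) for s in sentences]

    # Undirected weighted graph without self-loops.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and vecs[i] is not None and vecs[j] is not None:
                w[i][j] = max(0.0, cosine(vecs[i], vecs[j]))
    out = [sum(row) for row in w]

    # WS(Vi) = (1 - d) + d * sum_j w_ji / out_j * WS(Vj), iterated to convergence.
    scores = [init] * n
    for _ in range(max_iter):
        new = [(1 - d) + d * sum(w[j][i] / out[j] * scores[j]
                                 for j in range(n) if out[j] > 0)
               for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    kept = []
    for i in sorted(top):  # restore original document order
        if vecs[i] is not None and any(
            vecs[j] is not None and cosine(vecs[i], vecs[j]) > redundancy_threshold
            for j in kept
        ):
            continue  # redundant with an earlier kept sentence
        kept.append(i)
    return [sentences[i] for i in kept]
```

Restoring document order before the redundancy check follows the order of operations in the abstract, so fewer than K sentences may survive; a greedy variant that refills the list up to K is possible but is not what the abstract describes.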

Description

Technical field

[0001] The invention relates to the technical field of information content extraction methods, and in particular to a method for extracting key information about the criminal process in legal documents based on the TextRank algorithm.

Background art

[0002] In recent years, criminal methods have kept changing, so the criminal processes of suspects described in legal documents take many different forms. Extracting key information about a suspect's criminal process from legal documents is a prerequisite for downstream applications such as document matching and sentencing prediction. However, existing text information extraction methods have the following deficiencies:

[0003] Text information extraction using neural network algorithms requires a large document corpus and suffers from long training times and slow extraction of key information about the criminal process, which makes it unsuitable for practical ap...

Application Information

IPC(8): G06F16/34; G06Q50/18; G06F40/216; G06F40/30
CPC: G06F16/345; G06Q50/18; G06F40/216; G06F40/30
Inventor: 李参宏
Owner: 江苏网进科技股份有限公司