Keyword extraction method and system for judicial text data

A technology relating to text data and extraction methods, applied in the fields of unstructured text data retrieval, web data retrieval, and data processing applications.

Active Publication Date: 2019-07-26
ENJOYOR COMPANY LIMITED
Citations: 8; cited by: 25

AI Technical Summary

Problems solved by technology

[0004] However, the keyword extraction methods described above have certain limitations, particularly when applied to judicial text data.


Examples


Embodiment

[0048] Embodiment: A keyword extraction system for judicial text data comprises a data collection module, a data processing module, a word segmentation processing module, a weight calculation module, and a keyword extraction module. The data collection module collects judicial text data and keyword search vocabulary. The data processing module performs structuring and de-duplication preprocessing on the collected judicial text data and keyword search vocabulary. The word segmentation processing module segments the judicial text data, removes stop words, and counts word frequency and word positions. The weight calculation module calculates the various weight values of each word. The keyword extraction module fuses these weight values into a final weight value for each word and extracts the keywords accordingly.
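The module layout in [0048] can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions and not the patented implementation. Tokens are assumed to be already segmented; a real system would segment Chinese text with the judicial user dictionary. The two weights computed here (relative frequency and a first-position weight) and the linear fusion coefficients are hypothetical stand-ins, because the source does not specify the weight values or the fusion rule. All function names are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class WordStats:
    """Frequency and token positions of one word in a document."""
    freq: int = 0
    positions: list = field(default_factory=list)


def deduplicate(docs):
    """Data processing module (sketch): drop exact-duplicate documents,
    keeping the first occurrence of each."""
    seen, unique = set(), []
    for doc in docs:
        key = tuple(doc)
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


def segment_and_count(tokens, stop_words):
    """Word segmentation module (sketch). `tokens` is assumed to be already
    segmented; stop words are dropped, and each remaining word's frequency
    and positions (indices in the original token list) are recorded."""
    stats = {}
    for pos, tok in enumerate(tokens):
        if tok in stop_words:
            continue
        entry = stats.setdefault(tok, WordStats())
        entry.freq += 1
        entry.positions.append(pos)
    return stats


def compute_weights(stats, n_tokens):
    """Weight calculation module (hypothetical weights): relative frequency,
    and a position weight that favors words first seen early in the text."""
    freq_weight = {w: s.freq / n_tokens for w, s in stats.items()}
    position_weight = {w: 1.0 - s.positions[0] / n_tokens for w, s in stats.items()}
    return [freq_weight, position_weight]


def fuse_and_extract(weight_maps, coefficients, k):
    """Keyword extraction module (sketch): linear fusion of the weight maps
    into one final weight per word, then the top-k words by final weight."""
    final = {}
    for wmap, coef in zip(weight_maps, coefficients):
        for word, value in wmap.items():
            final[word] = final.get(word, 0.0) + coef * value
    ranked = sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]], final


if __name__ == "__main__":
    tokens = ["the", "plaintiff", "sued", "the", "defendant",
              "over", "a", "loan", "dispute", "loan"]
    stats = segment_and_count(tokens, {"the", "a", "over", "sued"})
    weights = compute_weights(stats, len(tokens))
    keywords, _ = fuse_and_extract(weights, (0.7, 0.3), k=2)
    print(keywords)
```

With the coefficients (0.7, 0.3), the demo prints `['plaintiff', 'defendant']`. The first-position weight lets early words outrank the repeated word "loan", which shows why the choice of weights and fusion coefficients matters.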

[0049] The verification data in this embodiment is the Shanghai People’s Mediation Agreement and the nationwide civil judg...



Abstract

The invention relates to a keyword extraction method and system for judicial text data. The method comprises the following steps. First, legal names and judicial professional vocabulary are introduced for word segmentation, and the results are manually re-checked to construct a judicial professional vocabulary annotation table. Second, a judicial professional vocabulary labeling dictionary and a large-scale user dictionary are constructed and used to segment the text and remove stop words, which yields the words of each document. Keyword search vocabularies for the various types of disputes and causes of action are collected and counted to form candidate keywords. Next, a title-word weight and a global word weight are added to correct the TF-IDF weight of each candidate keyword. If the document to be extracted contains none of the candidate keywords, the normalized TF-IDF value of each word in the document is used as its initial weight in the TextRank algorithm to obtain the final word weight. The method matches judicial text data well, achieves high matching performance, and is applicable to most judicial text data. It also increases extraction speed and achieves high extraction accuracy.
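The scoring logic in the abstract can be sketched as follows. This is a minimal sketch and not the patented implementation. The abstract gives no formulas, so the smoothed IDF, the multiplicative title factor (`title_bonus`), and the caller-supplied `global_weight` table are all hypothetical choices. The abstract says the normalized TF-IDF values serve as TextRank's initial weights. The standard TextRank iteration, however, converges to the same fixed point whatever its starting scores. This sketch therefore also uses those values as the teleport (prior) vector, so that they actually affect the result. That is an interpretation, not something the source states.

```python
import math
from collections import Counter, defaultdict


def tf_idf(doc, corpus):
    """TF-IDF of every word in `doc` (a token list) against `corpus`
    (a list of token lists), using a smoothed IDF: log(N / (1 + df)) + 1."""
    doc_sets = [set(d) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for word, count in Counter(doc).items():
        df = sum(1 for s in doc_sets if word in s)
        scores[word] = (count / len(doc)) * (math.log(n_docs / (1 + df)) + 1.0)
    return scores


def corrected_candidate_weights(doc, title, corpus, candidates,
                                global_weight, title_bonus=0.5):
    """Hypothetical correction: scale each candidate keyword's TF-IDF by a
    title factor (1 + title_bonus if the word appears in the title) and by a
    caller-supplied global word weight (default 1.0)."""
    title_words = set(title)
    corrected = {}
    for word, base in tf_idf(doc, corpus).items():
        if word not in candidates:
            continue
        title_factor = 1.0 + (title_bonus if word in title_words else 0.0)
        corrected[word] = base * title_factor * global_weight.get(word, 1.0)
    return corrected


def textrank(doc, init, window=2, damping=0.85, iters=50):
    """TextRank over a word co-occurrence graph. The normalized `init`
    weights serve as the starting scores and as the teleport (prior) vector."""
    neighbors = defaultdict(set)
    for i, word in enumerate(doc):
        for j in range(i + 1, min(i + window, len(doc))):
            if doc[j] != word:
                neighbors[word].add(doc[j])
                neighbors[doc[j]].add(word)
    words = list(dict.fromkeys(doc))  # unique words, first-seen order
    total = sum(init.get(w, 0.0) for w in words) or 1.0
    prior = {w: init.get(w, 0.0) / total for w in words}
    score = dict(prior)
    for _ in range(iters):
        # Synchronous update: every new score is computed from the old scores.
        score = {
            w: (1 - damping) * prior[w]
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[w])
            for w in words
        }
    return score


def extract_keywords(doc, title, corpus, candidates, global_weight, k=3):
    """Use corrected TF-IDF when the document contains candidate keywords;
    otherwise fall back to TextRank seeded with normalized TF-IDF."""
    if candidates & set(doc):
        weights = corrected_candidate_weights(doc, title, corpus,
                                              candidates, global_weight)
    else:
        weights = textrank(doc, tf_idf(doc, corpus))
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]
```

In the candidate branch, only candidate keywords can be returned, so the size of the candidate vocabulary bounds what the method can extract. The TextRank fallback ranks every word in the document instead.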

Description

Technical field
[0001] The invention relates to the technical field of natural language processing, and in particular to a method and system for extracting keywords from judicial text data.
Background art
[0002] With the rapid development of computer and Internet technology, one industry after another has moved toward informatization, and the judicial field has followed suit. According to statistics, more than 50 million of China's judgment documents can be searched online, and that number grows by about 30,000 documents every day. In addition, the people's mediation systems of the major judicial offices record cases of all kinds of conflicts and disputes. Faced with this ever-growing mass of judicial text data, users must spend a great deal of time reading case information and picking out keywords. For example, the “634-page court verdict of the first instance” reported in the December 2018 n...


Application Information

Patent type and authority: Application (China)
IPC (8): G06F17/27, G06F16/36, G06F16/951, G06Q50/18
CPC: G06F16/367, G06F16/951, G06Q50/18, G06F40/242, G06F40/216, G06F40/289, Y02D10/00
Inventors: 张云云, 王开红, 丁锴, 陈涛, 蒋立靓, 胡慷, 沈晓宇, 陈寅峰
Owner ENJOYOR COMPANY LIMITED