Rule-based adaptive text information extraction method and software memory

A text information extraction technology, applied in the field of information processing, that makes extraction work more convenient and gives comprehensive coverage of the target information

Pending Publication Date: 2019-07-09
WUHAN INSTITUTE OF TECHNOLOGY +1

AI Technical Summary

Problems solved by technology

Existing text extraction methods are not suited to text objects such as court trial transcripts, which have no fixed form, are long, and require extracting larger units of information such as paragraphs and sentences.


Embodiment Construction

[0041] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0042] The rule-based adaptive text information extraction method of this embodiment of the present invention comprises the following steps:

[0043] Statistically compare, analyze and summarize text objects in a professional field, and construct rules for text information extraction;

[0044] The rules are organized hierarchically in tree order to form an adaptive text template. Templates are divided into multiple categories according to professional field, and each category of template corresponds to a category of text object. The templates a...
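The tree-ordered, per-category template described in [0044] can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions: the `RuleNode` class, the `TEMPLATES` registry, the category name `court_transcript` and the regular expressions are all hypothetical and are not taken from the patent text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple


@dataclass
class RuleNode:
    """One extraction rule in the template tree (hypothetical structure)."""
    name: str
    pattern: str  # regular expression this rule would match (illustrative)
    children: List["RuleNode"] = field(default_factory=list)


def walk(node: RuleNode, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, rule name) pairs in tree (pre-order) order."""
    yield depth, node.name
    for child in node.children:
        yield from walk(child, depth + 1)


# One template per category of text object; names and patterns are invented.
TEMPLATES: Dict[str, RuleNode] = {
    "court_transcript": RuleNode("header", r"^Court hearing record", [
        RuleNode("parties", r"^(Plaintiff|Defendant):", [
            RuleNode("party_name", r":\s*(\w+)"),
        ]),
    ]),
}


def select_template(category: str) -> RuleNode:
    """Pick the template whose category matches the text object."""
    return TEMPLATES[category]
```

Calling `list(walk(select_template("court_transcript")))` visits the rules from the outermost level inward, which is one way to read "hierarchically processed in a tree order".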

Abstract

The invention discloses a rule-based adaptive text information extraction method and a software memory. The method comprises the following steps: constructing text information extraction rules for text objects in a professional field and summarizing the rules in a template, wherein the template rules are arranged hierarchically in tree order to form a text template, and each template has a four-layer structure comprising segments, lines, sentences and words; conducting statistical analysis on the text objects to be extracted and presetting representative keywords, wherein the keywords are composed of related words and irrelevant words; extracting information from the text to be extracted by using the constructed template, with text matching carried out through the keywords in the order of the template's four layers; for each layer of the template, when multiple matching results appear, filtering them by using the keywords to accurately locate the target information; and outputting a text extraction result containing the keyword. The method can automatically adapt to changes in text content and structure, and target text information can be extracted efficiently and accurately.
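The layer-by-layer matching and keyword filtering summarized in the abstract can be sketched as follows. This is a rough illustration, not the patented implementation: the splitting rules for each layer, the substring-based matching, the function names and the reading of "irrelevant words" as an exclusion list applied only when several candidates match are all assumptions.

```python
import re

# Splitters for the four template layers; the splitting rules are assumptions.
LAYERS = [
    ("segment", lambda t: re.split(r"\n\s*\n", t)),        # blank-line separated
    ("line", lambda t: t.splitlines()),
    ("sentence", lambda t: re.split(r"(?<=[.!?])\s+", t)),
    ("word", lambda t: t.split()),
]


def contains_any(text, words):
    """True if any of the given words occurs in the text (substring match)."""
    return any(w in text for w in words)


def extract(text, related, irrelevant):
    """Descend segment -> line -> sentence -> word, keeping matching pieces.

    At each layer, pieces containing a related word are kept. If more than
    one piece matches, pieces containing an irrelevant word are filtered
    out (unless that would discard every match).
    """
    found = {}
    candidates = [text]
    for name, split in LAYERS:
        pieces = [p.strip() for c in candidates for p in split(c) if p.strip()]
        matches = [p for p in pieces if contains_any(p, related)]
        if len(matches) > 1:
            kept = [m for m in matches if not contains_any(m, irrelevant)]
            matches = kept or matches
        found[name] = matches
        candidates = matches
    return found
```

For example, with a related word `"damages"` and an irrelevant word `"draft"`, a line that mentions a "draft estimate" of the damages is dropped only because another line also matched; a single match is never filtered away.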

Description

Technical field

[0001] The invention relates to the technical field of information processing, in particular to a rule-based adaptive text information extraction method and a software memory.

Background technique

[0002] At present, texts in various professional fields contain a large amount of valuable information, for example the court hearing transcripts, ruling transcripts, and mediation transcripts that record court hearings in detail. However, manually sorting out and extracting the content of concern from legal documents, especially when a large number of documents is involved, consumes a lot of manpower and material resources and is inefficient.

[0003] Current text extraction technology is mainly aimed at fixed-structure text, text keyword extraction, topic discovery, or adaptive information extraction from short text. These methods are not suitable for dealing with text objects such as court trial transcripts, which are not fixed in form, need to e...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/24, G06F16/33
CPC: G06F40/186
Inventor: 李晓林, 李道庆, 张彦铎, 田英明, 刘玮, 姚峰, 范佳莹
Owner: WUHAN INSTITUTE OF TECHNOLOGY