A method and device for extracting text paragraphs

A text paragraph extraction technology, applied in the field of text information extraction, that addresses the increased workload and reduced efficiency of manual paragraph identification, thereby improving efficiency and reducing workload.

Active Publication Date: 2021-05-28
ZHONGKE DINGFU BEIJING TECH DEV
8 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] This application provides a method and device for extracting text paragraphs. It addresses a problem in the prior art, which relies on manually reading a text, dividing it into paragraphs according to its specific content, and then summarizing a title for each paragraph. That manual process both increases workload and reduces efficiency.

Examples

Embodiment Construction

[0032] Referring to Figure 1, in a first aspect, the application provides a method for extracting text paragraphs, including:

[0033] Step 11: Obtain the text.

[0034] Taking legal judgment documents as an example, several natural paragraphs may convey the same textual meaning. Extracting natural paragraphs that share the same meaning as a single paragraph can greatly improve the accuracy of text structuring.

[0035] Step 12: Create a model tree corresponding to the text. The model tree includes at least several nodes and an extraction expression set corresponding to each node, and each extraction expression set includes at least one extraction expression.
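The model tree described in Step 12 can be pictured with a small data structure. The following is only a minimal sketch under stated assumptions: the source does not say what form an extraction expression takes, so regular expressions are assumed here, and the class name `ModelNode`, the child node names, and the patterns are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ModelNode:
    """One node of the model tree, with its extraction expression set."""
    name: str
    # Assumption: each extraction expression is a regular expression.
    expressions: list[str]
    children: list["ModelNode"] = field(default_factory=list)

    def __post_init__(self):
        # The source requires at least one extraction expression per set.
        if not self.expressions:
            raise ValueError(
                f"node {self.name!r} needs at least one extraction expression"
            )

    def matches(self, line: str) -> bool:
        """Return True if any expression in this node's set matches the line."""
        return any(re.search(expr, line) for expr in self.expressions)


# Hypothetical model tree for a legal judgment document.
tree = ModelNode(
    name="judgment",
    expressions=[r"judgment"],
    children=[
        ModelNode("claims", [r"^The plaintiff (claims|requests)"]),
        ModelNode("findings", [r"^The court (finds|found)"]),
    ],
)
```

Keeping the expression set on the node itself means a node can be tested against a line of text without consulting the rest of the tree, which is one plausible reading of "an extraction expression set corresponding to each node".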

[0036] Model trees can be created according to sampling requirements. For the sake of clarity, the parent node, child node, descendant node and ancestor node in the model tree are first introduced below. In the model tree shown in Figure 2, take the node "judgment" as...
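The node relationships named in this paragraph can be illustrated with a small tree. Because the source's own definitions are truncated, this is a sketch under assumptions: an ancestor is taken to be any node on the path from a node's parent up to the root, and a descendant any node below it. Only the node name "judgment" comes from the source; the class `Node` and the other node names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    # Excluded from repr/compare to avoid following parent/child cycles.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a child node, link it to this node, and return it."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def ancestors(self) -> list[str]:
        """Names of the parent, grandparent, and so on up to the root."""
        names, node = [], self.parent
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def descendants(self) -> list[str]:
        """Names of all children, grandchildren, and so on, depth-first."""
        names = []
        for child in self.children:
            names.append(child.name)
            names.extend(child.descendants())
        return names


# Hypothetical tree rooted at the "judgment" node.
root = Node("judgment")
facts = root.add_child("facts")
evidence = facts.add_child("evidence")
root.add_child("ruling")

print(evidence.ancestors())   # ['facts', 'judgment']
print(root.descendants())     # ['facts', 'evidence', 'ruling']
```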

Abstract

The present application provides a method and device for extracting text paragraphs. The method includes: obtaining a text; creating a model tree corresponding to the text; performing extraction to generate a positioning node set; matching the extraction expression set corresponding to each pre-positioning node and the extraction expression set corresponding to each post-positioning node against the text according to preset matching rules to obtain start information and end information, wherein the matching rules include a farthest-match rule and/or a nearest-match rule; determining a paragraph of the text according to the start information and end information; and extracting the text information of the paragraph. The application can automatically extract the corresponding paragraphs from a text according to the user's needs, without requiring the user to read the text content piece by piece, which improves the efficiency of text structuring and reduces workload.
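The matching step in the abstract can be sketched in a few lines. This is an illustration under assumptions, not the patented procedure: extraction expressions are assumed to be regular expressions, "nearest" is assumed to mean the match closest to the search origin and "farthest" the match furthest from it, and the start and end information are assumed to be character offsets. The function names `locate` and `extract_paragraph`, the patterns, and the sample text are hypothetical.

```python
import re
from typing import Optional


def locate(text: str, expressions: list[str], rule: str,
           start: int = 0) -> Optional[re.Match]:
    """Pick one match of any expression at or after `start`.

    Assumed semantics: "nearest" picks the match closest to `start`,
    "farthest" picks the match furthest from it.
    """
    if rule not in ("nearest", "farthest"):
        raise ValueError(f"unknown rule: {rule!r}")
    matches = [
        m
        for expr in expressions
        for m in re.compile(expr, re.MULTILINE).finditer(text, start)
    ]
    if not matches:
        return None
    pick = min if rule == "nearest" else max
    return pick(matches, key=lambda m: m.start())


def extract_paragraph(text: str, start_exprs: list[str], end_exprs: list[str],
                      start_rule: str = "nearest",
                      end_rule: str = "nearest") -> Optional[str]:
    """Return the text between the chosen start match and end match."""
    begin = locate(text, start_exprs, start_rule)
    if begin is None:
        return None
    # Search for the end boundary only after the start match.
    end = locate(text, end_exprs, end_rule, start=begin.end())
    stop = end.start() if end else len(text)
    return text[begin.start():stop].strip()


# Hypothetical judgment text and expressions.
text = (
    "The plaintiff claims damages.\n"
    "The plaintiff also claims costs.\n"
    "The court finds the claim justified.\n"
    "The court finds no contributory fault.\n"
    "It is ordered as follows.\n"
)
starts = [r"^The plaintiff"]
ends = [r"^The court finds"]

print(extract_paragraph(text, starts, ends))
print(extract_paragraph(text, starts, ends, end_rule="farthest"))
```

With the nearest rule for the end boundary, the extracted paragraph stops at the first "The court finds" line; with the farthest rule it runs up to the last one, so the two rules yield paragraphs of different extent from the same expressions.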

Description

Technical field

[0001] The present application relates to the technical field of text information extraction, in particular to a method and device for extracting text paragraphs.

Background

[0002] A text is the expression of written language. From a literary point of view, a text is usually a combination of one or more sentences with complete and systematic meaning. Text structure is a kind of semantic information in natural text that helps readers understand the hierarchy of a text. Text writers usually combine visual and semantic means to design text structures. Visual means include font style and page layout. Semantic means include using multi-level headings, distinguishing between headings and body text, and ordering paragraphs.

[0003] In terms of the semantic means of text content, the text structure generally includes the text title, paragraph titles, paragraph subtitles and body paragraphs. Obtaining the text structure is very helpful for many text ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/279, G06F40/205, G06F40/30
CPC: G06F40/205, G06F40/279, G06F40/30
Inventor: 李德彦, 晋耀红, 席丽娜
Owner: ZHONGKE DINGFU BEIJING TECH DEV