Method for identifying contract elements of PDF file

A method for identifying contract elements in PDF files, applied in the field of information technology, that addresses the long processing time, low efficiency, and high resource consumption of manual marking.

Active Publication Date: 2020-12-11
TIANGU INFORMATION SCI TECH HANGZHOU
4 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

At present, each item is marked manually, which is inefficient, slow, and resource-intensive.
[0003] A variety of solutions for automatically identifying contract content already exist, but they are essentially limited to Word files, and a method for automatically identifying contract elements in PDF files is still lacking.

Examples

Detailed description of the embodiments

[0068] The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, but not all of the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the protection scope of the present invention.

[0069] It should be noted that the embodiments of the present invention and the features of the embodiments may be combined with each other under the condition of no conflict.

[0070] The present invention will be further described below with reference to the accompanying drawings and specific embodiments, but it is not intended to limit the present invention.

[0071] The present invention includes a method for identifying contract elements ...

Abstract

The invention provides a method for identifying contract elements of a PDF file. The method comprises the following steps: reading the text blocks of a PDF file according to a preset reading mode and storing the key information of each text block, where the key information comprises a page number, text content, and coordinates; grouping the text blocks on the same page into rows according to their coordinates, and dividing the text blocks in the same row and in adjacent rows into statements; identifying each statement according to term characteristics and title characteristics to obtain the corresponding terms and titles, and forming the contract content from the identified statements; and matching the contract content against at least one contract template, then identifying the contract content according to the matched contract template so as to identify the contract elements. The beneficial effects are that scattered and complex PDF text blocks are assembled into natural statements, and that identifying the contract content according to the matched contract template improves the accuracy of contract element identification.
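The first three steps of the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the `TextBlock` record, the row tolerance `y_tol`, the English title pattern ("Article 1", "Section 2") and numbered-term pattern ("1.1 ..."), and the rule that a title occupies a single row. Reading blocks from an actual PDF and matching against a contract template are left out.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TextBlock:
    """Key information stored per text block (hypothetical record layout)."""
    page: int
    text: str
    x: float  # left edge
    y: float  # top edge, increasing down the page


# Assumed title and term characteristics; real contracts need richer rules.
TITLE_RE = re.compile(r"^(Chapter|Article|Section)\s+\d+\b")
TERM_RE = re.compile(r"^\d+(\.\d+)+\s")
END_RE = re.compile(r"[.;:!?]$")


def group_rows(blocks, y_tol=2.0):
    """Merge blocks on the same page with nearly equal y into one row string."""
    pages = defaultdict(list)
    for block in blocks:
        pages[block.page].append(block)
    rows = []
    for page in sorted(pages):
        current = []
        for block in sorted(pages[page], key=lambda b: (b.y, b.x)):
            # Start a new row once the block drifts past the tolerance.
            if current and abs(block.y - current[0].y) > y_tol:
                rows.append(current)
                current = []
            current.append(block)
        if current:
            rows.append(current)
    return [" ".join(b.text for b in sorted(r, key=lambda b: b.x)) for r in rows]


def split_statements(rows):
    """Join a row to the previous one when that row looks unfinished."""
    statements = []
    for row in rows:
        starts_new = bool(TITLE_RE.match(row) or TERM_RE.match(row))
        prev_open = (
            bool(statements)
            and not TITLE_RE.match(statements[-1])  # assumed: titles fit one row
            and not END_RE.search(statements[-1])   # no closing punctuation yet
        )
        if prev_open and not starts_new:
            statements[-1] += " " + row
        else:
            statements.append(row)
    return statements


def classify(statement):
    """Label a statement as 'title', 'term', or plain 'text'."""
    if TITLE_RE.match(statement):
        return "title"
    if TERM_RE.match(statement):
        return "term"
    return "text"


if __name__ == "__main__":
    sample = [
        TextBlock(1, "Payment", 120, 100.4),
        TextBlock(1, "Article 1", 50, 100.0),
        TextBlock(1, "1.1 The buyer shall pay", 50, 120.0),
        TextBlock(1, "within 30 days.", 50, 134.0),
        TextBlock(2, "1.2 Late payment accrues interest.", 50, 60.0),
    ]
    for statement in split_statements(group_rows(sample)):
        print(classify(statement), "|", statement)
```

Comparing each block against the first block of the current row keeps the grouping simple; a production version would likely also take block height and font size into account when deciding whether two blocks share a row.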

Description

Technical field

[0001] The invention relates to the field of information technology, and in particular to a method for identifying contract elements of a PDF file.

Background

[0002] Many contracts are chaotically formatted and have no hierarchical structure: the content appears as continuous text with no structured presentation. Businesses need to break contracts down and identify the different levels of titles, contract declarations, and contract terms. Currently, each item is marked manually, which is inefficient, slow, and resource-intensive.

[0003] A variety of solutions for automatically identifying contract content have therefore been proposed, but these solutions are essentially limited to Word files, and a method for automatically identifying contract elements in PDF files is still lacking.

Summary of the invention

[0004] In view of the above problems ex...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/33, G06F40/295, G06F40/205
CPC: G06F16/3344, G06F40/205, G06F40/295
Inventor: 石伟坚, 金宏洲, 程亮
Owner: TIANGU INFORMATION SCI TECH HANGZHOU