Method for formatted extraction of printed content

A method for the formatted extraction of printed content, applied in character and pattern recognition, instruments, computer components, and related fields. It addresses the floating position of extracted information, the difficulty of extracting complex content, and the difficulty of determining the number of form rows, and it reduces design difficulty while improving computational efficiency.

Active Publication Date: 2019-09-06
石家庄捷弘科技有限公司
AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a method for the formatted extraction of printed content that solves the difficulty of extracting complex content described in the background above. The main problems are: the number of rows in an extracted form is uncertain and cannot be determined accurately before extraction; differing form row heights affect how rows are divided and extracted; form data may be displayed across pages and must be extracted across them; and interfering information must be removed from the extracted content.

Examples

Detailed Description of Embodiments

[0039] The technical solution of this patent will be further described in detail below in conjunction with specific embodiments.

[0040] Referring to Figure 1, in an embodiment of the present invention, a method for the formatted extraction of printed content comprises the following steps:

[0041] S1. Convert the printed content of the print document into print elements (each comprising the text content, the x and y coordinates of its upper-left corner relative to the page, and the height and width at which the text is displayed), and generate a print element set (comprising the name of the print document, the total number of printed pages, the index number of each page, the height and width of each page, the print elements contained in each page, and a standalone image of each page);

[0042] S2. Design extraction elements based on the sampled print element set (mainly including extraction element types, keywords, extraction range (extract x, y coordin...
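
The print element set described in step S1 could be modelled roughly as follows. This is a minimal sketch, not the patent's implementation: the class names (`PrintElement`, `PrintPage`, `PrintElementSet`), the field names, and the sample values are all assumptions chosen for illustration; only the list of stored attributes comes from paragraph [0041].

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PrintElement:
    """One piece of printed text (hypothetical name and fields)."""
    text: str      # text content
    x: float       # upper-left corner, relative to the page
    y: float
    width: float   # width at which the text is displayed
    height: float  # height at which the text is displayed


@dataclass
class PrintPage:
    """One printed page with its elements (hypothetical name and fields)."""
    index: int
    width: float
    height: float
    elements: List[PrintElement] = field(default_factory=list)
    image_path: Optional[str] = None  # standalone image of the page, if saved


@dataclass
class PrintElementSet:
    """All pages of one print document (hypothetical name and fields)."""
    document_name: str
    pages: List[PrintPage] = field(default_factory=list)

    @property
    def total_pages(self) -> int:
        # Derived from the page list rather than stored separately.
        return len(self.pages)


# Illustrative data only; the coordinates and file name are made up.
page = PrintPage(index=1, width=595.0, height=842.0)
page.elements.append(
    PrintElement("Invoice No.", x=40.0, y=30.0, width=60.0, height=12.0)
)
doc = PrintElementSet(document_name="invoice.pdf", pages=[page])
```

Deriving the page count from the page list is a design choice of this sketch; the patent text lists the total number of pages as a stored attribute.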

Abstract

The invention relates to the technical field of document printing, in particular to a method for the formatted extraction of printed content, which comprises the following steps: S1, intercepting the printed content of a print document and converting it into print elements to generate a print element set; S2, designing extraction elements according to the sampled print element set, and generating an extraction template; and S3, inputting the print element set and the extraction template, running an extraction engine on them, and generating a formatted extraction result. The method overcomes the shortcomings of plain-text content extraction and can extract the content of complex forms flexibly, efficiently, and accurately. It also effectively supplements and optimizes the OCR approach. It improves on exact-coordinate extraction, and container extraction elements are embedded into combinations of basic extraction elements, so that complex forms can be handled effectively. The visual template design interface greatly reduces design difficulty and improves design efficiency.
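
Step S3 can be pictured as a small function that takes the print elements and a template and returns named values. The sketch below is an illustration under stated assumptions, not the patented engine: `Element`, `ExtractionRule`, and `extract` are hypothetical names, and the rule of "find a keyword, then collect text inside a rectangle offset from it" is just one plausible reading of an extraction element that has a keyword and an extraction range.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Element:
    """A print element: text plus its upper-left corner and size."""
    text: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class ExtractionRule:
    """Hypothetical basic extraction element of a template."""
    name: str
    keyword: Optional[str]  # anchor text; None means absolute coordinates
    dx: float               # region offset from the anchor (or absolute x)
    dy: float               # region offset from the anchor (or absolute y)
    width: float
    height: float


def extract(elements: List[Element],
            template: List[ExtractionRule]) -> Dict[str, str]:
    """Apply every rule of the template and return name -> extracted text."""
    result: Dict[str, str] = {}
    for rule in template:
        ox, oy = 0.0, 0.0
        if rule.keyword is not None:
            # Anchor on the first element whose text contains the keyword.
            anchor = next((e for e in elements if rule.keyword in e.text), None)
            if anchor is None:
                result[rule.name] = ""
                continue
            ox, oy = anchor.x, anchor.y
        left, top = ox + rule.dx, oy + rule.dy
        right, bottom = left + rule.width, top + rule.height
        # Keep elements whose upper-left corner falls inside the region.
        hits = [e for e in elements
                if left <= e.x < right and top <= e.y < bottom]
        hits.sort(key=lambda e: (e.y, e.x))  # reading order
        result[rule.name] = " ".join(e.text for e in hits)
    return result


# Illustrative data only.
elements = [
    Element("Invoice No.", 40, 30, 60, 12),
    Element("A-1001", 110, 30, 40, 12),
    Element("Total", 40, 60, 30, 12),
    Element("128.50", 110, 60, 40, 12),
]
template = [
    ExtractionRule("invoice_no", "Invoice No.", dx=60, dy=-2, width=100, height=16),
    ExtractionRule("total", "Total", dx=60, dy=-2, width=100, height=16),
]
extracted = extract(elements, template)
```

Because each region is positioned relative to its keyword, the rule still finds the value if the whole block shifts on the page, which is the kind of floating position that fixed coordinates alone cannot handle.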

Description

Technical field

[0001] The invention relates to the technical field of document printing, in particular to a method for the formatted extraction of printed content.

Background

[0002] At present, printed output is an indispensable way of presenting content in all walks of life, but printed content is suitable only for people to view and read. It cannot be effectively reformatted, which hinders secondary processing of the data. In the current era of big data, there is an urgent need for a way to reformat the printed output of other systems, so that publicly available, valid data can be reused at low cost and with high efficiency without data interface authorization, providing a basic data acquisition solution for big data computing, artificial intelligence, and other applications.

[0003] There are three main ways to extract content. The first one is to obtain the printed content of plain text, and perform text s...

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06K9/00
CPC: G06V30/412
Inventor: 夏莫戛, 张文静, 甘玉涛, 樊利红
Owner: 石家庄捷弘科技有限公司