Method and device for identifying explanatory text in portable document format file

A method and device in the field of Portable Document Format (PDF) file technology, applicable to special-purpose data processing, instruments and electrical digital data processing. It addresses problems such as a heavy editing workload and picture legends that are easily missed or wrongly labeled, with the effect of improving labeling accuracy and work efficiency.

Status: Inactive; Publication date: 2014-11-19
CHINA SOUTH PUBLISHING & MEDIA GROUP
Cites: 3; Cited by: 5

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method and device for identifying legends in PDF files, so as to solve the technical problems that, because legends in existing PDF files cannot be recognized automatically, the editing workload is heavy and picture legends are easily missed or wrongly labeled.

Examples

Embodiment 1

[0048] Referring to Figure 1, a preferred embodiment of the present invention provides a method for identifying legends in PDF files, the method comprising:

[0049] Step S101: parsing the current page of the PDF file and identifying its text block objects and picture block objects;

[0050] Optionally, in this embodiment, parsing the PDF file first includes parsing according to the PDF file format specification; the format used in this embodiment is Adobe's PDF specification, version 1.5. Second, the content of the PDF file is parsed, and data such as text paragraphs, pictures, tables and formulas are extracted from it. Open-source tools such as Xpdf and PoDoFo can be used to analyze the content of a PDF document; preferably, this embodiment uses the open-source MuPDF library to identify the content of the current page of the PDF file. When parsing the content of a PDF document, the pictures and text paragraphs in the PDF document are surrounded by rectangul...
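As a hedged illustration of Step S101, the sketch below assumes the page has already been parsed into the dictionary layout that PyMuPDF (the Python bindings for MuPDF) returns from `page.get_text("dict")`, in which each block has a `type` (0 for text, 1 for image) and a `bbox` rectangle `(x0, y0, x1, y1)`, and text blocks nest their characters under `lines` and `spans`. The `Block` class and the `split_blocks` function are hypothetical names chosen for illustration; they are not part of the patent or of MuPDF.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Block:
    """A page object enclosed by its bounding rectangle (x0, y0, x1, y1)."""
    kind: str                                  # "text" or "image"
    bbox: tuple[float, float, float, float]
    text: str = ""                             # joined span text; empty for images


def split_blocks(page_dict: dict[str, Any]) -> tuple[list[Block], list[Block]]:
    """Separate a get_text("dict")-style page into text blocks and image blocks."""
    text_blocks: list[Block] = []
    image_blocks: list[Block] = []
    for raw in page_dict.get("blocks", []):
        bbox = tuple(raw["bbox"])
        if raw.get("type") == 0:
            # Text block: concatenate the spans of each line, then join lines with spaces.
            lines = (
                "".join(span["text"] for span in line.get("spans", []))
                for line in raw.get("lines", [])
            )
            text_blocks.append(Block("text", bbox, " ".join(lines).strip()))
        elif raw.get("type") == 1:
            # Image block: only the bounding rectangle is needed for matching.
            image_blocks.append(Block("image", bbox))
    return text_blocks, image_blocks
```

With PyMuPDF installed, the input would come from calling `page.get_text("dict")` on a `Page` object; a different parser such as Xpdf or PoDoFo would need its own adapter that produces the same rectangle-based blocks.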

Abstract

The invention discloses a method and device for identifying explanatory text in Portable Document Format (PDF) files. The method comprises: parsing the current page of a PDF file and identifying its text block objects and image block objects; for an image block object to be matched, determining the text block object nearest to it in the vertical direction; judging whether that nearest text block object contains identification characters used for identifying images; if it does, determining that it is the explanatory text block object corresponding to the image block object; and matching and associating the identified explanatory text block object with the corresponding image block object. By matching the identified text block objects with the image block objects, the method and device automatically associate image block objects in a PDF file with the text block objects that serve as their explanatory text. This avoids manually adding explanatory text to images when documents are edited, which improves working efficiency as well as the accuracy with which explanatory text is added.
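The matching steps in the abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: blocks are axis-aligned rectangles in page coordinates with y increasing downward, "nearest in the vertical direction" is taken as the smallest vertical gap among text blocks that overlap the image horizontally, and the "identification characters" are assumed to be a caption prefix such as "Figure", "Fig." or "图" followed by a number. The names `Block`, `CAPTION_PATTERN`, `overlaps_horizontally`, `vertical_gap` and `match_captions` are hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed "identification characters": a caption prefix such as "Figure 3",
# "Fig. 2" or "图4" at the start of the text block.
CAPTION_PATTERN = re.compile(r"\s*(?:Figure|Fig\.|图)\s*\d+", re.IGNORECASE)


@dataclass
class Block:
    """A page object enclosed by its bounding rectangle (x0, y0, x1, y1), y downward."""
    kind: str                                  # "text" or "image"
    bbox: tuple[float, float, float, float]
    text: str = ""


def overlaps_horizontally(a: Block, b: Block) -> bool:
    """Assumed filter: only text sharing some horizontal extent with the image counts."""
    return min(a.bbox[2], b.bbox[2]) > max(a.bbox[0], b.bbox[0])


def vertical_gap(image: Block, text: Block) -> float:
    """Distance between the facing horizontal edges; 0 if the blocks overlap vertically."""
    if text.bbox[1] >= image.bbox[3]:          # text lies below the image
        return text.bbox[1] - image.bbox[3]
    if text.bbox[3] <= image.bbox[1]:          # text lies above the image
        return image.bbox[1] - text.bbox[3]
    return 0.0


def match_captions(images: list[Block], texts: list[Block]) -> list[tuple[Block, Block]]:
    """Pair each image with its vertically nearest text block if that block is a caption."""
    pairs: list[tuple[Block, Block]] = []
    for image in images:
        candidates = [t for t in texts if overlaps_horizontally(image, t)]
        if not candidates:
            continue
        # Find the most adjacent text block in the vertical direction.
        nearest = min(candidates, key=lambda t: vertical_gap(image, t))
        # Accept it as explanatory text only if it starts with identification characters.
        if CAPTION_PATTERN.match(nearest.text):
            pairs.append((image, nearest))
    return pairs
```

Each returned pair corresponds to the "matching and associating" step; how the association is then stored for editing is not described in the abstract. The horizontal-overlap filter and the caption pattern are the main design choices here: a legend set in a side column, or one introduced by a word other than the assumed prefixes, would be missed by this sketch.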

Description

Technical field
[0001] The invention relates to the field of text recognition in Portable Document Format (PDF) files, and in particular to a method and a device for recognizing legends in PDF files.
Background art
[0002] PDF is the abbreviation of Portable Document Format, an open electronic document format developed by Adobe. The advantage of the PDF file format is that it is independent of the software, hardware and operating system platform: it can be used without barriers on Windows, Unix or Apple's Mac OS and achieves the same display effect on each. Owing to these characteristics, PDF has become an ideal format for distributing electronic documents and disseminating formatted information on the Internet. Currently, most scientific papers and e-books published on the Internet are released in PDF format. However, the original intention of the PDF file format is to ac...

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/24
Inventor: 雷陆峰
Owner: CHINA SOUTH PUBLISHING & MEDIA GROUP