Document processing method, device and system, electronic equipment and storage medium

A document processing technology, applied in the fields of electronic digital data processing, special data processing applications, instruments, etc. It addresses the problems that paper documents are difficult to manage and store and that their information cannot be retrieved effectively, and it achieves the effect of improving accuracy.

Pending Publication Date: 2020-11-27
BEIJING BAIDU NETCOM SCI & TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] Paper documents are the carrier of information dissemination, but a large amount of ...

Examples

Embodiment 1

[0036] An embodiment of this application provides a document processing method which, as shown in Figure 1, includes:

[0037] S101: Acquire an image of the first historical document;

[0038] For example, the first historical document may be any one of multiple historical documents that currently need to be stored. The solution provided by the present application can be applied to each historical document for follow-up processing, so this embodiment does not describe them one by one.

[0039] In addition, the first historical document may be a book, in which case the image of the first historical document may consist of one or more images. For example, if a book is to be archived electronically, every page of the book can be scanned, and the resulting page images together form the image of the first historical document. Since the same follow-up processing is adopted no matter whether ...

Embodiment 2

[0048] On the basis of Embodiment 1, as shown in Figure 2, after the image of the first historical document is acquired, the method may further include: S100: Perform preprocessing on the image of the first historical document to obtain a preprocessed image of the first historical document.

[0049] In this embodiment, the preprocessing of the image of the first historical document may include noise removal, image binarization, tilt correction, and the like. When the first historical document is scanned, factors such as the paper quality of the document itself and the illumination during scanning generally introduce noise and defects into the scanned image. In addition, uneven paper edges, uneven placement of the paper, or poor deviation correction by the scanner can skew the scanned image. These will reduce the accuracy of the next document imag...
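The noise removal and binarization steps named in [0049] can be illustrated with a minimal sketch. The source does not say which algorithms are used, so the 3x3 median filter and Otsu's threshold below are assumptions chosen for illustration, and the function names (`median_filter`, `otsu_threshold`, `binarize`, `preprocess`) are hypothetical. The image is assumed to be a greyscale page stored as a list of rows of integers in 0..255. Tilt correction is left out.

```python
from statistics import median


def median_filter(img):
    """3x3 median filter for noise removal (assumed technique).

    Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def otsu_threshold(img):
    """Return the grey level that maximizes between-class variance (Otsu)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of grey levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(img, t):
    """Map pixels above the threshold to 1 (background) and the rest to 0."""
    return [[1 if v > t else 0 for v in row] for row in img]


def preprocess(img):
    """Denoise, then binarize with a threshold computed from the result."""
    clean = median_filter(img)
    return binarize(clean, otsu_threshold(clean))
```

Calling `preprocess(page)` on a scanned greyscale page returns a 0/1 image of the same size. A production system would more likely rely on an image library for these steps and would add skew estimation before region division.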

Embodiment 3

[0061] As shown in Figure 3, on the basis of Figure 1 in Embodiment 1, S102 in Figure 1 may specifically include:

[0062] S1021: Perform region division on the image of the first historical document to obtain at least one type of region among a table region, a text region, and a picture region.

[0063] As mentioned in the foregoing embodiments, the first historical document may correspond to one or more images, and each image can be divided into regions to obtain at least one type of region among a table region, a text region, and a picture region for that image. Because the feature extraction methods for the text region, the picture region, and the table region differ, the regions need to be separated. Specifically, the regions may be divided as follows:

[0064] The picture region and the table region are detected using a first model.

[0065] Specifically, the first model may be an M2Det model. The M2Det model is based on MLFP...

Abstract

The invention discloses a document processing method, device and system, electronic equipment and a storage medium, and relates to the fields of information management, image processing, text processing and the like. The specific implementation scheme is as follows: acquiring an image of a first historical document; performing region division on the image of the first historical document to obtain at least one type of region; performing corresponding feature extraction on each of the at least one type of region to obtain sub-feature information corresponding to each type of region; and storing the sub-feature information corresponding to the at least one type of region as features of the first historical document.
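The flow in the abstract (divide into typed regions, extract features per region type, store the sub-feature information as the document's features) can be sketched as a simple dispatch table. This is only an illustration under stated assumptions: region division is taken as already done, a region's `content` is a string standing in for cropped pixels, the three extractor functions are toy placeholders rather than the models used in the application, and `FEATURE_STORE` is a hypothetical in-memory stand-in for real storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Region:
    kind: str     # "table", "text", or "picture"
    content: str  # placeholder for the cropped region (assumption)


# Toy extractors, one per region type (hypothetical placeholders).
def extract_text_features(region: Region) -> dict:
    return {"tokens": region.content.lower().split()}


def extract_table_features(region: Region) -> dict:
    return {"cells": [c.strip() for c in region.content.split("|")]}


def extract_picture_features(region: Region) -> dict:
    return {"label": region.content}


EXTRACTORS: Dict[str, Callable[[Region], dict]] = {
    "text": extract_text_features,
    "table": extract_table_features,
    "picture": extract_picture_features,
}

# Hypothetical in-memory store: document id -> sub-features by region type.
FEATURE_STORE: Dict[str, Dict[str, List[dict]]] = {}


def extract_document_features(regions: List[Region]) -> Dict[str, List[dict]]:
    """Run the extractor matching each region's type and group the results."""
    features: Dict[str, List[dict]] = {}
    for region in regions:
        extractor = EXTRACTORS[region.kind]
        features.setdefault(region.kind, []).append(extractor(region))
    return features


def store_document(doc_id: str, regions: List[Region]) -> None:
    """Store the grouped sub-feature information as the document's features."""
    FEATURE_STORE[doc_id] = extract_document_features(regions)
```

For example, `store_document("doc-1", [Region("text", "Annual Report 1998")])` leaves a single `"text"` entry under `"doc-1"`. Keeping sub-features grouped by region type means a later retrieval step can compare like with like, for example text features against text features.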

Description

Technical field

[0001] This application relates to the field of computer technology, and in particular to the fields of information management, image processing, and text processing.

Background

[0002] Paper documents are the carrier of information dissemination, but a large amount of accumulated paper is difficult to manage and save, and information cannot be retrieved effectively. With the development of digital acquisition technology, image processing technology and storage technology, more and more information is stored in the form of document images. As the scale of document images becomes larger and larger, how to effectively store document image information so as to efficiently perform document retrieval becomes a problem that needs to be solved.

Summary of the invention

[0003] The disclosure provides a document processing method, device, system, electronic equipment and storage medium.

[0004] According to a first aspect of the...

Application Information

IPC(8): G06F16/583, G06K9/00, G06K9/46
CPC: G06F16/583, G06V30/413, G06V10/40
Inventor: 冯博豪, 庞敏辉, 谢国斌
Owner: BEIJING BAIDU NETCOM SCI & TECH CO LTD