Method and system for realizing full-text retrieval of electronic official document

The technology concerns full-text retrieval of electronic official documents. It addresses two problems: picture-type documents cannot be parsed, and manual entry of their content consumes considerable time, manpower, material, and financial resources and is inefficient. It enables recognition and parsing of picture-type documents and improves retrieval efficiency.

Pending Publication Date: 2022-01-14
SHANDONG EVAYINFO TECH CO LTD

AI Technical Summary

Problems solved by technology

[0005] In practice, electronic official documents are often not limited to text-type PDFs. Official documents scanned by printers or other scanning devices are saved in the PDF as pictures. Apache POI can only parse text-type PDFs; when a picture-type PDF appears among the electronic documents, parsing cannot be completed and full-text search cannot be realized.
[0006] In existing scenarios, more than 80% of electronic official documents contain pictures. With existing methods, only key information such as the title and time of these documents can be retrieved, and full-text search is possible for only about 20% of documents. The remainder can be made searchable only by manual input, which consumes a great deal of time, manpower, material, and financial resources and is inefficient.



Examples


Embodiment 1

[0049] According to an embodiment of the present invention, a method for realizing full-text retrieval of electronic official documents is disclosed, which specifically includes the following process:

[0050] (1) Obtain the basic information of the electronic official documents;

[0051] In this embodiment, the basic information of an official document is obtained by establishing a connection with the storage server that holds the official document attachments. The basic information includes the type of the official document, the time the document was received, the time it was signed, and its transfer status.

[0052] Specifically, the Java file-reading class File is used to connect to the document attachment storage server and read the electronic document information, and a large-file processing strategy is configured: when the document is too large, the file is split into segments, which are read segment by segment and iden...
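
The segmented-reading strategy in [0052] can be illustrated with a minimal sketch. The embodiment uses Java; Python is used here only for brevity. The function name `read_in_segments` and the 4 MiB default segment size are assumptions for illustration and do not come from the source, which is truncated before it states how segments are sized or identified. The sketch shows only the I/O pattern of reading a large file piece by piece instead of loading it whole.

```python
from pathlib import Path
from typing import Iterator


def read_in_segments(path: Path, segment_size: int = 4 * 1024 * 1024) -> Iterator[bytes]:
    """Yield the file's bytes as consecutive segments of at most segment_size.

    segment_size is a hypothetical tuning parameter, not a value from the patent.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    with path.open("rb") as handle:
        while True:
            segment = handle.read(segment_size)
            if not segment:  # an empty read means end of file
                break
            yield segment
```

Because the function is a generator, only one segment is held in memory at a time. Note that an arbitrary byte range of a PDF is generally not parsable on its own, so a real system would likely split at page or object boundaries instead.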

Embodiment 2

[0071] According to an embodiment of the present invention, a system for realizing full-text retrieval of electronic official documents is disclosed, specifically including:

[0072] The information acquisition module, which is used to obtain the basic information of electronic official documents;

[0073] The electronic official document classification module, which is used to classify electronic official documents according to whether they contain pictures;

[0074] The electronic official document preliminary recognition module, which, for picture-type electronic official documents, constructs an OCR object to scan and recognize the official document;

[0075] The picture text recognition module, which is used to apply grayscale and enlargement processing to the scanned picture; the processed picture is input into a trained picture text recognition model to obtain the text recognition result for the picture;

[0076] T...
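
The classification and picture preprocessing modules listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: `PageInfo`, `is_picture_type`, `to_grayscale`, and `enlarge` are hypothetical names; the per-page image counts are assumed to come from whatever PDF parser the system uses; the grayscale weights are the common ITU-R BT.601 luma coefficients; and nearest-neighbour scaling stands in for the unspecified enlargement method. The OCR object and the trained recognition model are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Tuple

RgbImage = List[List[Tuple[int, int, int]]]  # rows of (r, g, b) pixels, 0-255
GrayImage = List[List[int]]                  # rows of 0-255 intensities


@dataclass
class PageInfo:
    """Hypothetical per-page summary produced by an upstream PDF parser."""
    text_chars: int
    image_count: int


def is_picture_type(pages: List[PageInfo]) -> bool:
    """Classify a document as picture-type if any page contains a picture."""
    return any(page.image_count > 0 for page in pages)


def to_grayscale(image: RgbImage) -> GrayImage:
    """Convert RGB pixels to grayscale using BT.601 luma weights (an assumption)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in image
    ]


def enlarge(image: GrayImage, factor: int) -> GrayImage:
    """Enlarge by an integer factor with nearest-neighbour pixel replication."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    enlarged: GrayImage = []
    for row in image:
        wide_row = [pixel for pixel in row for _ in range(factor)]
        # Repeat each widened row `factor` times, copying so rows stay independent.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged
```

A picture-type document would pass each scanned page through `to_grayscale` and then `enlarge` before recognition, while a text-type document would skip both and go straight to text parsing.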

Embodiment 3

[0079] According to an embodiment of the present invention, a terminal device is disclosed, including a server. The server includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the program, the method for realizing full-text retrieval of electronic official documents in Embodiment 1 is implemented. For the sake of brevity, details are not repeated here.

[0080] It should be understood that in this embodiment the processor may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor.

[0081] The...



Abstract

The invention discloses a method and a system for realizing full-text retrieval of electronic official documents. The method comprises the following steps: acquiring the basic information of an electronic official document; determining whether the document is a picture-type electronic official document according to whether it contains a picture; for a picture-type document, constructing an OCR (Optical Character Recognition) object to scan and recognize the document; applying grayscale and enlargement processing to the scanned picture; inputting the processed picture into a trained picture text recognition model to obtain the text recognition result for the picture; and realizing full-text retrieval of the electronic official document based on the text recognition result. By combining OCR technology with the picture text recognition model, the text of picture-type electronic official documents can be recognized, which effectively solves the technical problem that such documents cannot be recognized and parsed. In addition, the existing recognition technology for non-picture-type electronic official documents is incorporated, so that full-text retrieval is realized across different types of electronic official documents and retrieval efficiency is improved overall.
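
The final step, full-text retrieval based on the recognized text, can be illustrated with a minimal inverted index. This is a sketch under stated assumptions: the patent text shown here does not name an indexing engine, so the class `InvertedIndex` and its methods are hypothetical, and the regular-expression tokenizer is a simplification. In particular, `\w+` treats an unbroken run of Chinese characters as a single token, so Chinese documents would need a proper word segmenter in its place.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    """Maps each token to the set of document ids whose text contains it."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    @staticmethod
    def _tokens(text: str) -> List[str]:
        # Simplified tokenizer: lowercase runs of word characters.
        return re.findall(r"\w+", text.lower())

    def add(self, doc_id: str, text: str) -> None:
        """Index text from either OCR output or ordinary text parsing."""
        for token in self._tokens(text):
            self._postings[token].add(doc_id)

    def search(self, query: str) -> Set[str]:
        """Return ids of documents containing every token in the query."""
        tokens = self._tokens(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

Because `add` accepts plain text, the same index serves both branches of the method: text recognized from picture-type documents and text parsed directly from text-type documents.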

Description

Technical field

[0001] The invention relates to the technical field of full-text retrieval, in particular to a method and system for realizing full-text retrieval of electronic official documents.

Background

[0002] Currently, full-text retrieval of documents is usually implemented by parsing the document text and then indexing the content. Most document parsing methods in enterprise applications are based on Apache POI: the storage address of the document is obtained through a database connection, the content of the document is identified through POI, and the document is tagged and indexed.

[0003] At present, POI can only identify text-type documents, and full-text retrieval cannot be realized for picture-type documents.

[0004] Electronic official documents refer to the electronic data of official documents in a standardized format formed by various regions and departments through the ...


Application Information

IPC(8): G06V30/148, G06V10/24, G06F16/583, G06F16/33
CPC: G06F16/5846, G06F16/33
Inventor: 王雷鑫, 吴士伟, 邵青峰, 闫学君, 李龙, 吴兴龙, 王姝婷
Owner SHANDONG EVAYINFO TECH CO LTD