A document processing method and device

A document processing method and technology, applied in the fields of instruments, computing, and character and pattern recognition, that addresses the low efficiency of document data processing by improving efficiency and reducing the amount of computation.

Active Publication Date: 2021-11-26
HANVON CORP

AI Technical Summary

Problems solved by technology

[0004] The embodiments of the present application provide a document processing method and device that identify and match document data using feature templates, addressing the low efficiency of document data processing.

Examples

Embodiment 1

[0073] This embodiment provides a document processing method. As shown in Figure 1, the method includes steps 100 to 120.

[0074] Step 100, acquiring a feature template for expressing style features of the target document.

[0075] The feature template includes at least business features.

[0076] The document data processed in this application comes from local chronicles, ancient books, and other documents with clear style features, and is generated by scanning and recognizing those documents. The document data usually records the text of each text block and the format of each text block, from front to back, in the order in which the text blocks appear in the document.
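The ordered text-block record described above might be modeled as follows. This is only a minimal sketch: the `TextBlock` class, the `load_document` helper, and the format keys `font_size` and `indent` are hypothetical names chosen for illustration, since the source does not specify the actual storage format.

```python
from dataclasses import dataclass, field


@dataclass
class TextBlock:
    """One recognized block: its text plus the format it was printed with."""
    text: str
    # Hypothetical format attributes; the source does not list the real fields.
    fmt: dict = field(default_factory=dict)


def load_document(records):
    """Build the block list from (text, format) records.

    Input order is preserved, because the document data lists blocks
    front to back in the order they appear in the document.
    """
    return [TextBlock(text=text, fmt=dict(fmt)) for text, fmt in records]


# Illustrative records only; the texts and format keys are made up.
records = [
    ("Chapter on Rivers", {"font_size": "large", "indent": 0}),
    ("The eastern river rises in the hills.", {"font_size": "normal", "indent": 2}),
]
blocks = load_document(records)
```

Keeping the blocks in a plain list makes the reading order explicit, which matters if later steps rely on which block precedes which.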

[0077] The style features mentioned in the embodiments of this application refer to the writing format of a document and comprise two parts: format features and business features. Format features include, for example, large characters in the top grid, reversed whit...
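One way to picture a feature template is as a set of rules, each pairing a format feature with the business feature it signals. The sketch below assumes that pairing; `TemplateRule`, `rule_matches`, the business feature name `section_title`, and the attribute keys are all hypothetical, and only the "large characters in the top grid" cue is taken from the text above.

```python
from dataclasses import dataclass


@dataclass
class TemplateRule:
    """Pairs a format feature with the business feature it is assumed to signal."""
    business_feature: str  # hypothetical name, e.g. "section_title"
    format_feature: dict   # hypothetical attribute/value pairs a block must carry


def rule_matches(rule, block_format):
    """Return True when the block has every attribute value the rule requires."""
    return all(block_format.get(key) == value
               for key, value in rule.format_feature.items())


# Assumed rule: large characters in the top grid mark a section title.
template = [
    TemplateRule("section_title", {"font_size": "large", "position": "top_grid"}),
]

title_format = {"font_size": "large", "position": "top_grid", "color": "black"}
body_format = {"font_size": "normal", "position": "indented"}

is_title = rule_matches(template[0], title_format)
is_body_title = rule_matches(template[0], body_format)
```

Matching on format attributes alone requires no semantic analysis of the text, which is the property the application relies on to reduce computation.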

Embodiment 2

[0155] Correspondingly, this application also discloses a document processing device. As shown in Figure 7, the device includes:

[0156] A feature template acquisition module 710, configured to acquire a feature template for expressing style features of the target document, where the feature template includes business features;

[0157] A text identification module 720, configured to perform text identification on the text file describing the target document according to the feature template, and to determine the feature value of the business feature of the target document;

[0158] A document information output module 730, configured to output the document information of the target document in a preset format according to the determined feature value of the business feature and the feature template.
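The three modules can be read as a small pipeline. The following sketch mirrors modules 710, 720, and 730 with one function each; the function names, the template contents, the first-match rule, and the choice of JSON as the "preset format" are assumptions made for illustration, not details taken from the application.

```python
import json


def acquire_feature_template():
    """Module 710 analogue: return a template of business features and format cues."""
    # Hypothetical template; a real one would be authored per document style.
    return {
        "section_title": {"font_size": "large"},
        "body_text": {"font_size": "normal"},
    }


def identify_text(blocks, template):
    """Module 720 analogue: collect each block's text under the feature it matches."""
    values = {name: [] for name in template}
    for text, fmt in blocks:
        for name, cue in template.items():
            if all(fmt.get(key) == value for key, value in cue.items()):
                values[name].append(text)
                break  # assumed policy: the first matching business feature wins
    return values


def output_document_info(values, template):
    """Module 730 analogue: emit feature values in template order as JSON."""
    ordered = {name: values.get(name, []) for name in template}
    return json.dumps(ordered, ensure_ascii=False)


# Illustrative (text, format) pairs only.
blocks = [
    ("Rivers", {"font_size": "large"}),
    ("The eastern river rises in the hills.", {"font_size": "normal"}),
]
template = acquire_feature_template()
result = output_document_info(identify_text(blocks, template), template)
```

Passing the template to both the identification and output steps reflects the text above, where the output module uses the determined feature values together with the feature template.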

[0159] Optionally, as shown in Figure 8, before acquiring the feature template for expressing the style features of the target document, ...

Abstract

The present application provides a document processing method, which belongs to the field of document processing and solves the problem of low efficiency of document data processing in the prior art. The method includes: acquiring a feature template for expressing the style features of the target document; performing text recognition on a text file describing the target document according to the feature template, and determining the feature value of the business feature of the target document; and outputting the document information of the target document in a preset format according to the determined feature value of the business feature and the feature template. The document processing method disclosed in the embodiments of the present application extracts document data based on a feature template and does not need to perform semantic recognition on a large amount of data, which effectively reduces the amount of computation and helps to improve the efficiency of document data extraction.

Description

Technical field

[0001] The present application relates to the field of document processing, in particular to a document processing method and device.

Background

[0002] Ancient books and documents are an important basis for studying the natural, social, political, economic, cultural, and other aspects of a certain period and/or a certain region. For example, local chronicles are documents that comprehensively record the natural, social, political, economic, cultural, and other aspects of a certain region in a certain period. To facilitate research and access to document information, structuring ancient documents is particularly important. In the process of structuring ancient documents, the usual practice is first to obtain the words in the fragmented documents through scanning and recognition, and then, through semantic recognition of those words, to classify the fragmented content or organize it into an index. ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/00, G06K9/62
CPC: G06V30/40, G06V30/418, G06V10/751
Inventor: 孟晓静, 高宝庆, 王战波
Owner: HANVON CORP