Literature processing method and device

A document processing method and technology, applied in the fields of instruments, character and pattern recognition, and computer components, that addresses the low efficiency of document data processing.

Active Publication Date: 2019-04-16
HANVON CORP

AI Technical Summary

Problems solved by technology

[0004] The embodiment of the present application provides a document processing method and device that identify and match document data through feature templates, solving the problem of low document data processing efficiency.

Examples

Embodiment 1

[0073] This embodiment provides a document processing method. As shown in Figure 1, the method includes steps 100 to 120.

[0074] Step 100, acquiring a feature template for expressing style features of the target document.

[0075] The feature template includes at least business features.

[0076] The document data processed in this application come from local chronicles, ancient books and other documents with clear style characteristics, and are generated by scanning and recognition. The document data generally record the text of each text block and the format of each text block, from front to back in the order in which the text blocks appear in the document.
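The ordered text-block data described in [0076] can be pictured with a minimal sketch. The names `TextBlock`, `fmt`, `load_blocks` and the `"text"` / `"format"` record keys are hypothetical and chosen only for illustration; the source does not specify a storage layout.

```python
from dataclasses import dataclass, field


@dataclass
class TextBlock:
    """One recognized text block (hypothetical structure, not from the source)."""
    text: str                                 # recognized text of the block
    fmt: dict = field(default_factory=dict)   # format attributes of the block


def load_blocks(records):
    """Build text blocks from scan records, preserving their order of appearance.

    `records` is assumed to be a list of dicts with a "text" key and an
    optional "format" key, listed front to back as in the document.
    """
    return [TextBlock(text=r["text"], fmt=r.get("format", {})) for r in records]
```

Keeping the blocks in a list preserves the front-to-back order, which later steps can rely on when matching blocks against a feature template.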

[0077] The style features mentioned in the embodiments of this application refer to the writing format of documents and comprise two parts: format features and business features. Format features include, for example, large characters in the top grid, reversed wh...

Embodiment 2

[0155] Correspondingly, this application also discloses a document processing device. As shown in Figure 7, the device includes:

[0156] A feature template acquisition module 710, configured to acquire a feature template for expressing style features of the target document, where the feature template includes business features;

[0157] A text identification module 720, configured to perform text identification on the text file describing the target document according to the feature template, and to determine the feature values of the business features of the target document;

[0158] A document information output module 730, configured to output the document information of the target document in a preset format according to the determined feature values of the business features and the feature template.
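The three modules above can be sketched as three functions, offered only as a rough illustration under stated assumptions. The template layout, the business feature names `entry_title` and `entry_body`, the format keys `font_size` and `indent`, the exact-match rule, and JSON as the preset output format are all hypothetical; the source does not define them.

```python
import json

# Hypothetical template: each business feature is paired with the format
# a text block must have to count as a value of that feature.
TEMPLATE = {
    "business_features": {
        "entry_title": {"font_size": "large", "indent": 0},
        "entry_body": {"font_size": "normal", "indent": 2},
    }
}


def acquire_template():
    """Module 710 (sketch): return the feature template for the target document."""
    return TEMPLATE


def identify_text(blocks, template):
    """Module 720 (sketch): assign each block to the first business feature
    whose required format attributes it matches exactly."""
    values = {name: [] for name in template["business_features"]}
    for block in blocks:
        for name, required in template["business_features"].items():
            if all(block["format"].get(k) == v for k, v in required.items()):
                values[name].append(block["text"])
                break  # a block contributes to at most one feature
    return values


def output_document_info(values, template):
    """Module 730 (sketch): emit feature values as JSON, ordered as in the template."""
    ordered = {name: values.get(name, []) for name in template["business_features"]}
    return json.dumps(ordered, ensure_ascii=False)
```

In this sketch the matching step only compares format attributes, so no semantic analysis of the text itself is involved, which is the efficiency argument the application makes for template-based extraction.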

[0159] Optionally, as shown in Figure 8, before acquiring the feature template for expressing the style features of the target document, ...

Abstract

The invention provides a literature processing method, belongs to the field of literature processing, and solves the problem of low literature data processing efficiency in the prior art. The method comprises the following steps: obtaining a feature template for expressing the style features of a target literature; performing text recognition on the text file describing the target literature according to the feature template, and determining the feature values of the business features of the target literature; and outputting literature information of the target literature in a preset format according to the determined feature values of the business features and the feature template. According to the literature processing method disclosed by the embodiment of the invention, literature data extraction is carried out based on the feature template, so semantic recognition of a large amount of data is not needed, the amount of computation is effectively reduced, and the literature data extraction efficiency is improved.

Description

Technical field

[0001] The present application relates to the field of document processing, in particular to a document processing method and device.

Background

[0002] Ancient books and documents are an important basis for studying the natural, social, political, economic, cultural and other aspects of a certain period and/or a certain region. For example, local chronicles are documents that comprehensively record the natural, social, political, economic, cultural and other aspects of a certain region in a certain period. To facilitate research and access to document information, structuring ancient documents is particularly important. In the process of structuring ancient documents, the usual practice is to first obtain the words in the fragmented documents through scanning and recognition, and then, through semantic recognition of those words, to classify or index the content of the fragmented documents. ...

Application Information

IPC(8): G06K9/00, G06K9/62
CPC: G06V30/40, G06V30/418, G06V10/751
Inventors: 孟晓静, 高宝庆, 王战波
Owner: HANVON CORP