A document analysis method and device

A document analysis method and device in the field of instruments, computing, and electrical digital data processing. It addresses the low accuracy of existing document analysis, achieving higher accuracy while reducing the manual maintenance workload.

Active Publication Date: 2019-04-12
深圳市一览网络股份有限公司

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to propose a document analysis method and device that solve the technical problem of low document analysis accuracy in the prior art.



Examples


Embodiment 1

[0054] The present invention proposes a document parsing method for parsing an original document that includes several matching relationships. Here a matching relationship means that a certain part of the content in the original document is the matching content corresponding to a certain matching item. Figure 1 is a flow chart of document parsing in Embodiment 1 of the present invention, which includes the following steps:

[0055] S1. Extracting text content from the original document;

[0056] Embodiments of the present invention do not limit the format of the original document, which may be doc, docx, wps, txt, mht, html, htm, pdf, or any other common format. Nor do they limit the format of the extracted text content, which may be HTML content, plain text, base64-encoded content, or any other common format.
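Because step S1 may yield HTML, plain text, or base64-encoded content, a later step may find it convenient to work on one uniform representation. The following is a minimal sketch of such a normalization, not something the patent text prescribes: the function `to_plain_text`, the `kind` labels, and the UTF-8 assumption for base64 payloads are all hypothetical.

```python
import base64
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(content: str, kind: str) -> str:
    """Normalize extracted content to plain text.

    `kind` is a hypothetical label: "plain", "html", or "base64".
    """
    if kind == "plain":
        return content
    if kind == "base64":
        # Assumption: the encoded payload is UTF-8 text.
        return base64.b64decode(content).decode("utf-8")
    if kind == "html":
        collector = _TextCollector()
        collector.feed(content)
        collector.close()  # flush any buffered text
        return "".join(collector.parts)
    raise ValueError(f"unsupported content kind: {kind}")
```

Tag stripping here keeps only text nodes; a real extractor would probably also want to turn block-level tags into line breaks so that the segmentation in S2 still has something to split on.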

[0057] S2. Segment the text content according to the preset segment identifier, put the segmented text co...
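The segmentation in S2 can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the function name `build_content_stack` is hypothetical, the line break is assumed as the preset segment identifier, and a plain Python list stands in for the original content stack, with one list element per stack point. Dropping empty segments is also an assumption.

```python
from typing import List


def build_content_stack(text: str, segment_identifier: str = "\n") -> List[str]:
    """Split `text` on the segment identifier; each non-empty piece is one stack point."""
    stack: List[str] = []
    for piece in text.split(segment_identifier):
        piece = piece.strip()
        if piece:  # assumption: blank segments carry no content and are skipped
            stack.append(piece)
    return stack


if __name__ == "__main__":
    sample = "Name: Zhang San\n\nGender: male\nPlace of residence: Shenzhen\n"
    for index, point in enumerate(build_content_stack(sample)):
        print(index, point)
```

The list preserves document order, so later steps can take stack points out in sequence from index 0 onward.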

Embodiment 2

[0067] The present invention also proposes a resume document parsing method for parsing an original document that includes several matching relationships. Here a matching relationship means that a certain part of the content in the original document is the matching content corresponding to a certain matching item. For example, in a resume document, "name", "gender", and "place of residence" are matching items, and "Zhang San", "male", and "Shenzhen" are the matching content corresponding to those items. The method includes the following steps:

[0068] S1. Extracting text content from the original document;

[0069] Embodiments of the present invention do not limit the format of the original document, which may be doc, docx, wps, txt, mht, html, htm, pdf, or any other common format. Nor do they limit the format of the extracted text content, which may be HTML content, plain text, base64-encoded content, or ...
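The resume matching items named in this embodiment ("name", "gender", "place of residence") suggest a keyword table that is checked against each piece of text. The sketch below shows one way such a check might look. The table `KEYWORDS`, the function `match_keyword`, and the matching condition itself (the text begins with a keyword, compared case-insensitively) are assumptions made for illustration; the visible text does not define the matching condition.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword table: each matching item maps to keywords that may introduce it.
KEYWORDS: Dict[str, List[str]] = {
    "name": ["Name"],
    "gender": ["Gender", "Sex"],
    "place_of_residence": ["Place of residence", "Residence"],
}


def match_keyword(stack_point: str) -> Optional[Tuple[str, str]]:
    """Return (matching item, content with the keyword removed), or None if nothing matches.

    Assumed matching condition: the stack point begins with a keyword (case-insensitive).
    """
    lowered = stack_point.lower()
    for item, keywords in KEYWORDS.items():
        for keyword in keywords:
            if lowered.startswith(keyword.lower()):
                # Drop the keyword and any separator (ASCII or full-width colon, spaces).
                rest = stack_point[len(keyword):].lstrip(" :：\t")
                return item, rest
    return None
```

Listing several keywords per item lets one matching item absorb format differences between resumes, for instance "Gender" in one document and "Sex" in another. A bare prefix test is deliberately crude and would also accept a line such as "Named after ...", so a production matching condition would need to be stricter.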

Embodiment 3

[0120] The present invention proposes a document parsing device for parsing an original document with a specific format, where the specific format means that the document contains matching items and the matching content corresponding to each matching item. The device includes a content extraction module, a content stacking module, and a content parsing module, wherein:

[0121] A content extraction module for extracting text content from the original document;

[0122] The content stacking module is used to segment the text content according to the segment identifier, put the segmented text content into the original content stack, and store a piece of content at a stack point;

[0123] The content parsing module is used to sequentially take out the stack point content of the original content stack as the current stack point content; if the current stack point content satisfies the matching condition of a keyword corresponding to a matching item, the current stack point is called the current matching stack poi...



Abstract

The invention discloses a document parsing method and device. The method comprises the following steps: S1, extracting text content from an original document; S2, segmenting the text content according to a preset segment identifier and putting the segmented text content into an original content stack; and S3, taking out the stack point contents of the original content stack in sequence as the current stack point content. If the current stack point content meets the matching condition of a keyword corresponding to a matching item, the current stack point is called the current matching stack point and serves as the matching starting point of that item. The matching content of the item consists of the current stack point content with the keyword removed, together with the contents of the stack points traversed downward from it, until the next matching stack point is met; the stack point preceding the next matching stack point is the matching stop point of the item. The method adapts to content parsing of documents in various formats, which improves parsing accuracy and reduces manual maintenance cost.
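The traversal in S3 can be sketched as a single pass over the stack. This is a minimal illustration under stated assumptions, not the patented implementation: the function `parse_stack` and the keyword table passed to it are hypothetical, the matching condition is reduced to a prefix test, stack points that precede the first matching stack point are ignored, and the collected contents of an item are joined with line breaks.

```python
from typing import Dict, List, Optional


def parse_stack(stack: List[str], keywords: Dict[str, str]) -> Dict[str, str]:
    """Assign stack point contents to matching items.

    `keywords` maps a keyword to its matching item (hypothetical table).
    """
    collected: Dict[str, List[str]] = {}
    current_item: Optional[str] = None
    for point in stack:
        # Assumed matching condition: the stack point starts with a keyword.
        matched = next((kw for kw in keywords if point.startswith(kw)), None)
        if matched is not None:
            # A new matching stack point: the previous item stopped at the prior stack point.
            current_item = keywords[matched]
            rest = point[len(matched):].lstrip(" :：")
            collected[current_item] = [rest] if rest else []
        elif current_item is not None:
            # Downward traversal: content belongs to the item matched most recently.
            collected[current_item].append(point)
    return {item: "\n".join(parts) for item, parts in collected.items()}


if __name__ == "__main__":
    stack = [
        "Name: Zhang San",
        "Work experience",
        "2015-2018 Engineer at A",
        "2018-2019 Lead at B",
        "Gender: male",
    ]
    table = {"Name": "name", "Work experience": "work_experience", "Gender": "gender"}
    print(parse_stack(stack, table))
```

Because an item's content runs until the next matching stack point, a multi-line section such as work experience is captured whole without knowing in advance how many stack points it spans. If the same item matches twice, this sketch keeps only the later occurrence.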

Description

Technical field

[0001] The invention relates to a document parsing method and device.

Background

[0002] Document parsing needs often arise in business activities. For example, when a recruitment website enters resumes uploaded by applicants, the lack of uniform rules for resume formats means that the traditional method requires manual entry of the resume content item by item, which seriously reduces work efficiency. Existing document parsing technology, however, is not accurate enough: a slight difference in the format of the document content can compromise the entire parsing result. A document parsing method that offers both accuracy and efficiency is therefore needed.

Contents of the invention

[0003] The purpose of the present invention is to provide a document parsing method and device that solve the technical problem of low document parsing accuracy in the prior art.

[0004] For this reason, the presen...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F16/332
Inventor: 张海东, 庄秋敏
Owner: 深圳市一览网络股份有限公司