Document processing method and device

A document processing method and device, applied in the field of electronic information technology, that address the problem that a vector representing a document cannot be obtained accurately and that yield a more accurate final representation vector.

Pending Publication Date: 2022-07-29
ALIPAY (HANGZHOU) INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, current document processing methods cannot accurately obtain a vector representing the information in a document.




Embodiment Construction

[0053] The solution provided in this specification will be described below with reference to the accompanying drawings.

[0054] Figure 1 is a flow chart of a document processing method in an embodiment of this specification. The method is executed by a document processing device. It can be understood that the method can also be performed by any apparatus, device, platform, or device cluster with computing and processing capabilities. Referring to Figure 1, the method includes:

[0055] Step 101: Extract at least two text blocks from the document to be processed.

[0056] Step 103: Take each text block as a node, and obtain at least one feature of each node.

[0057] Step 105: Obtain an initial characterization vector of each node according to at least one feature of each node.

[0058] Step 107: Obtain the final characterization vector of each node according to the initial characterization vector of each node and the positional relationship between the text block c...
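
As a rough illustration of Steps 101 to 105, the sketch below treats each text block as a node, derives a few features, and uses them as the initial characterization vector. The `TextBlock` class, the chosen features (character count, digit ratio, normalized box center), and the function names are assumptions made for illustration only; the specification as excerpted here does not say which features or encoders are used.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextBlock:
    """Hypothetical node: one extracted text block and its bounding box."""
    text: str
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed layout


def node_features(block: TextBlock, page_w: float, page_h: float) -> List[float]:
    """Step 103 (illustrative): a few hand-picked features for one node."""
    n = len(block.text)
    digit_ratio = sum(ch.isdigit() for ch in block.text) / n if n else 0.0
    x0, y0, x1, y1 = block.box
    cx = (x0 + x1) / 2 / page_w  # normalized horizontal center of the box
    cy = (y0 + y1) / 2 / page_h  # normalized vertical center of the box
    return [float(n), digit_ratio, cx, cy]


def initial_vectors(blocks: List[TextBlock], page_w: float, page_h: float) -> List[List[float]]:
    """Step 105 (illustrative): the initial vector is simply the feature list."""
    if len(blocks) < 2:
        raise ValueError("the method expects at least two text blocks")
    return [node_features(b, page_w, page_h) for b in blocks]


# Step 101 is assumed to have produced these two blocks already.
blocks = [
    TextBlock("Name", (10, 10, 50, 20)),
    TextBlock("1234", (60, 10, 100, 20)),
]
vectors = initial_vectors(blocks, page_w=200, page_h=100)
```

In practice the initial vector would more likely come from a learned encoder over the features; the plain feature list is used here only to keep the example self-contained.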


Abstract

Embodiments of the invention provide a document processing method and device. The method comprises the following steps: extracting at least two text blocks from a to-be-processed document; taking each text block as a node, and obtaining at least one feature of each node; obtaining an initial representation vector of each node according to the at least one feature of that node; and obtaining a final representation vector of each node according to the initial representation vector of each node and the positional relationship between the text block corresponding to the node and the text blocks corresponding to other nodes in the to-be-processed document. According to the embodiments of the invention, the vector representing the information in the document can be obtained more accurately.
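
One way to picture the final step is a single round of neighbor aggregation, in which each node's initial vector is mixed with the vectors of the other nodes, weighted by how close their text blocks are on the page. The sketch below is only an illustration under stated assumptions, not the claimed method: the inverse-distance weighting, the `alpha` mixing factor, and the use of block center points to express the positional relationship are all hypothetical choices.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def final_vectors(init: List[List[float]], centers: List[Point],
                  alpha: float = 0.5) -> List[List[float]]:
    """Mix each node's vector with a distance-weighted mean of the others.

    Assumed scheme: weight 1 / (1 + distance) between block centers, then
    final = alpha * own vector + (1 - alpha) * weighted neighbor mean.
    """
    n = len(init)
    if n < 2 or n != len(centers):
        raise ValueError("need at least two nodes and one center per node")
    dim = len(init[0])
    out = []
    for i in range(n):
        # A node does not contribute to its own neighbor mean (weight 0).
        weights = [
            0.0 if j == i else 1.0 / (1.0 + math.dist(centers[i], centers[j]))
            for j in range(n)
        ]
        total = sum(weights)
        agg = [
            sum(weights[j] * init[j][k] for j in range(n)) / total
            for k in range(dim)
        ]
        out.append([alpha * init[i][k] + (1 - alpha) * agg[k] for k in range(dim)])
    return out


two = final_vectors([[1.0, 0.0], [0.0, 1.0]], [(0.0, 0.0), (3.0, 4.0)])
three = final_vectors(
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)],
)
```

With three nodes, the block nearest to a node dominates that node's neighbor mean, which is the intended effect of making the final vector depend on layout. A learned graph model could replace the fixed weighting.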

Description

Technical field

[0001] One or more embodiments of this specification relate to electronic information technology, and in particular, to a document processing method and apparatus.

Background

[0002] Documents of many kinds contain a large amount of information. To make use of this information, the documents need to be structured so as to obtain vectors that can represent the information in them. For example, in both enterprises and government agencies, a large amount of information is stored in unstructured or semi-structured documents such as paper documents, emails, pictures, and PDF documents. These documents need to be converted into structured data, that is, the various vectors used to represent the information in the document (such as name, age, and ID number) need to be calculated, so that these vectors can be used for automatic computer processing in subsequent business, such as the electronic files in government agencies, insu...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/30
CPC: G06F16/30
Inventor: 施登亮, 郝嘉然, 祝慧佳, 刘思亮
Owner: ALIPAY (HANGZHOU) INFORMATION TECH CO LTD