A PDF document content text paragraph aggregation method based on a neural network

Neural network and document technology, applied in the field of neural-network-based aggregation of PDF document text paragraphs. It addresses the heavy demand on human resources and achieves the effects of saving labor costs, facilitating reuse, and improving efficiency.

Active Publication Date: 2019-06-28

Wuhan Hanwang Data Technology Co., Ltd. (武汉汉王数据技术有限公司)

Cites: 7; Cited by: 6

AI Technical Summary

Problems solved by technology

Manually aggregating the line text extracted from PDF documents into paragraphs requires a lot of human resources.

Embodiment Construction

[0020] In order to facilitate the understanding and implementation of the present invention by those of ordinary skill in the art, the present invention will be further described in detail with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here are only used to illustrate and explain the present invention and are not intended to limit it.

[0021] Referring to Figure 1, the method provided by the present invention for aggregating text paragraphs of PDF document content based on a neural network includes the following steps:

[0022] Step 1: For a number of PDF documents, extract the line text information features of each PDF document;

[0023] In this embodiment, the line text information features include line left margin, line right margin, number of characters, line maximum character height, line minimum character height, line maximum character width, line minimum character width, line maximum char...
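
The seven features fully named in [0023] can be computed from per-character bounding boxes. The sketch below is a minimal illustration under stated assumptions, not the patent's extractor: the `CharBox` record (left edge, right edge, height), the use of page coordinates with margins measured from the page edges, the known `page_width`, and the function name `line_features` are all hypothetical. The remaining features in the truncated list are not reproduced.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CharBox:
    """Hypothetical per-character box in page coordinates (x grows to the right)."""
    x0: float      # left edge
    x1: float      # right edge
    height: float  # glyph height


def line_features(chars: List[CharBox], page_width: float) -> List[float]:
    """Return [left margin, right margin, char count, max/min char height, max/min char width]."""
    if not chars:
        raise ValueError("a text line needs at least one character")
    heights = [c.height for c in chars]
    widths = [c.x1 - c.x0 for c in chars]
    return [
        min(c.x0 for c in chars),               # line left margin (distance from the left page edge)
        page_width - max(c.x1 for c in chars),  # line right margin (distance from the right page edge)
        float(len(chars)),                      # number of characters
        max(heights),                           # line maximum character height
        min(heights),                           # line minimum character height
        max(widths),                            # line maximum character width
        min(widths),                            # line minimum character width
    ]


example = [CharBox(72.0, 80.0, 12.0), CharBox(80.0, 86.0, 10.0)]
print(line_features(example, page_width=612.0))
# [72.0, 526.0, 2.0, 12.0, 10.0, 8.0, 6.0]
```

Each line thus becomes a fixed-length numeric vector, which is the form a neural network can consume.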

Abstract

The invention discloses a PDF document content text paragraph aggregation method based on a neural network. The method defines dozens of features of a text line, converts the features into multi-dimensional vectors, generates a sample data set, designs an algorithm model, trains the model continuously, and finally outputs the trained algorithm model. Given two input text lines, the algorithm model accurately determines whether they should be merged into the same paragraph. Using neural-network-based artificial intelligence technology, the developed application program automatically aggregates the text lines extracted from a PDF into paragraphs, restoring the original sentences and paragraph structure of the text and facilitating reuse of PDF content data. The aggregation efficiency of the artificial intelligence program cannot be achieved through manual processing; replacing manual work with machines saves labor costs and greatly improves efficiency.
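
The pipeline in the abstract (line vectors, a labeled sample set, a trained model that answers "merge or not" for a pair of lines, then paragraph aggregation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's model: the pair encoding (both line vectors plus their difference), the labeling rule (consecutive lines in the same paragraph are positive samples), the one-hidden-layer network trained by per-sample SGD, the greedy aggregation pass, and all names (`pair_vector`, `make_samples`, `TinyMLP`, `aggregate`) are hypothetical. Two-feature toy vectors stand in for the patent's dozens of features.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_vector(upper: Vector, lower: Vector) -> Vector:
    # Hypothetical pair encoding: both line vectors plus their element-wise difference.
    return upper + lower + [a - b for a, b in zip(upper, lower)]


def make_samples(paragraphs: Sequence[Sequence[Vector]]) -> List[Sample]:
    # Label each pair of consecutive lines: 1 inside a paragraph, 0 across a break.
    lines = [(line, p) for p, para in enumerate(paragraphs) for line in para]
    return [(pair_vector(a, b), int(pa == pb))
            for (a, pa), (b, pb) in zip(lines, lines[1:])]


class TinyMLP:
    """One tanh hidden layer and a sigmoid output estimating P(merge)."""

    def __init__(self, n_in: int, n_hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: Vector) -> Tuple[Vector, float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def predict_proba(self, x: Vector) -> float:
        return self._forward(x)[1]

    def loss(self, samples: Sequence[Sample]) -> float:
        # Mean binary cross-entropy, clamped to avoid log(0).
        eps, total = 1e-12, 0.0
        for x, y in samples:
            p = min(max(self.predict_proba(x), eps), 1 - eps)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(samples)

    def train(self, samples: Sequence[Sample], epochs: int = 300, lr: float = 0.1) -> None:
        # Per-sample SGD with manual backpropagation of the cross-entropy loss.
        for _ in range(epochs):
            for x, y in samples:
                h, p = self._forward(x)
                dz = p - y  # gradient of the loss w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    dh = dz * self.w2[j] * (1.0 - hj * hj)  # back through tanh, pre-update w2
                    self.w2[j] -= lr * dz * hj
                    self.b1[j] -= lr * dh
                    self.w1[j] = [w - lr * dh * xi for w, xi in zip(self.w1[j], x)]
                self.b2 -= lr * dz


def aggregate(lines: Sequence[Vector],
              should_merge: Callable[[Vector, Vector], bool]) -> List[List[Vector]]:
    # Greedy pass: attach each line to the current paragraph when should_merge says so.
    paragraphs: List[List[Vector]] = []
    for line in lines:
        if paragraphs and should_merge(paragraphs[-1][-1], line):
            paragraphs[-1].append(line)
        else:
            paragraphs.append([line])
    return paragraphs


# Toy line vectors [left margin, max character height]; values are illustrative only.
heading, indented, body = [1.0, 3.0], [2.0, 2.0], [1.0, 2.0]
samples = make_samples([[heading], [indented, body, body], [indented, body]])
model = TinyMLP(n_in=len(samples[0][0]))
loss_before = model.loss(samples)
model.train(samples)
loss_after = model.loss(samples)
paragraphs = aggregate([heading, indented, body, body],
                       lambda a, b: model.predict_proba(pair_vector(a, b)) >= 0.5)
```

The greedy pass only compares each line with the line directly above it, mirroring the two-line decision described in the abstract; reading order across columns or pages is assumed to be resolved beforehand.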

Description

Technical field

[0001] The invention belongs to the technical field of artificial intelligence and relates to a method for aggregating text paragraphs of PDF document content, in particular to such a method based on a neural network.

Background art

[0002] PDF (Portable Document Format) is a file format for presenting documents independently of application software, hardware, and operating systems. A PDF document is displayed with the same appearance on operating systems such as Windows, Unix, and Mac OS. PDF documents can be opened with a variety of tools and browsers and are easy to read, transfer, and store, making PDF one of the most commonly used document formats today.

[0003] Although PDF documents guarantee a consistent presentation, a published PDF document is not easy to re-edit. When the PDF document is published, because of the need t...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06K9/00; G06K9/46

CPC: Y02D10/00

Inventor: 聂昱 (Nie Yu)

Owner: Wuhan Hanwang Data Technology Co., Ltd. (武汉汉王数据技术有限公司)