Text structured information extraction method, server and storage medium

A text structured information extraction technology, applied in the field of data processing, that addresses the highly variable format and text position of documents and the difficulty of obtaining structured information conveniently, thereby avoiding manual processing.

Pending Publication Date: 2019-09-27
ONE CONNECT SMART TECH CO LTD SHENZHEN
Cites: 0; Cited by: 4

AI Technical Summary

Problems solved by technology

[0004] In view of the above, the present invention provides a method for extracting text structured information, a server and a storage medium. Its purpose is to solve the problem that, when document information is extracted, the format and text position vary widely, so structured information cannot be obtained conveniently.




Embodiment Construction

[0050] In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit it. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

[0051] FIG. 1 shows an application environment diagram of a preferred embodiment of the text structured information extraction method of the present invention. The server 1 is installed with a text structured information extraction program 10. A plurality of clients 3 are connected to the server 1 through the network 2. The network 2 may be the Internet, a cloud network, a ...



Abstract

The invention relates to data processing technology and provides a text structured information extraction method, a server and a storage medium. The method comprises the following steps: first, obtaining an original document from which structured information is to be extracted, inputting the original document into a trained first segmentation model to obtain a plurality of first-level tags of the original document, and obtaining the first-level text content corresponding to each first-level tag according to a first preset rule; then inputting each first-level text content into a trained second segmentation model to obtain a plurality of second-level tags, obtaining the second-level text content corresponding to each second-level tag according to a second preset rule, storing each obtained tag and its text content as logic pages in a preset database, and generating corresponding files that are fed back to the client. Because the segmentation models determine each first-level and second-level tag in the original document and the structured information is then extracted according to the tag content, structured information is extracted from the document automatically, conveniently and efficiently.
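The two-level pipeline described in the abstract can be illustrated with a minimal Python sketch. The text here does not specify the segmentation models or the preset rules, so the sketch rests on assumptions: each trained segmentation model is replaced by a hypothetical regex-based stand-in (`regex_segmenter`) that marks heading lines as tags; both preset rules are assumed to assign each tag the text between it and the next tag; the logic pages are assumed to be rows in an SQLite table named `logic_page`; and the file fed back to the client is assumed to be JSON. All function, table and sample names are illustrative only.

```python
import json
import re
import sqlite3
from typing import Callable, Dict, List, Tuple

# A segmenter returns (tag, start, end) spans in document order.
# Hypothetical stand-in for a trained segmentation model.
Span = Tuple[str, int, int]
Segmenter = Callable[[str], List[Span]]


def regex_segmenter(pattern: str) -> Segmenter:
    """Toy segmentation model: every line matching `pattern` is treated as a tag."""
    compiled = re.compile(pattern, re.MULTILINE)

    def segment(text: str) -> List[Span]:
        return [(m.group(0).strip(), m.start(), m.end()) for m in compiled.finditer(text)]

    return segment


def split_by_tags(text: str, spans: List[Span]) -> Dict[str, str]:
    """Assumed preset rule: a tag owns the text from its end to the start of the next tag.

    Text before the first tag is ignored, and a repeated tag overwrites the earlier one.
    """
    contents: Dict[str, str] = {}
    for i, (tag, _start, end) in enumerate(spans):
        next_start = spans[i + 1][1] if i + 1 < len(spans) else len(text)
        contents[tag] = text[end:next_start].strip()
    return contents


def extract(document: str, first: Segmenter, second: Segmenter) -> Dict[str, Dict[str, str]]:
    """Two-level extraction: first-level tags, then second-level tags inside each section."""
    structured: Dict[str, Dict[str, str]] = {}
    for tag1, body1 in split_by_tags(document, first(document)).items():
        spans2 = second(body1)
        # Keep sections without second-level tags under an empty second-level key.
        structured[tag1] = split_by_tags(body1, spans2) if spans2 else {"": body1}
    return structured


def store_logic_pages(conn: sqlite3.Connection, structured: Dict[str, Dict[str, str]]) -> None:
    """Persist each (first-level tag, second-level tag, content) triple as one row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS logic_page (level1 TEXT, level2 TEXT, content TEXT)"
    )
    rows = [(t1, t2, text) for t1, sub in structured.items() for t2, text in sub.items()]
    conn.executemany("INSERT INTO logic_page VALUES (?, ?, ?)", rows)
    conn.commit()


SAMPLE = """1. Parties
1.1 Buyer
Acme Ltd
1.2 Seller
Widget Co
2. Price
2.1 Amount
100 USD
"""

structured = extract(
    SAMPLE,
    first=regex_segmenter(r"^\d+\.\s.*$"),      # first-level tags, e.g. "1. Parties"
    second=regex_segmenter(r"^\d+\.\d+\s.*$"),  # second-level tags, e.g. "1.1 Buyer"
)
conn = sqlite3.connect(":memory:")
store_logic_pages(conn, structured)
report = json.dumps(structured, ensure_ascii=False, indent=2)  # file returned to the client
```

Replacing `regex_segmenter` with a real trained model only requires a callable that returns tag spans in document order; the splitting, storage and export steps stay the same.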

Description

Technical Field

[0001] The invention relates to the field of data processing, and in particular to a method for extracting text structured information, a server and a storage medium.

Background

[0002] Portable Document Format (PDF) is used to exchange files independently of applications, operating systems and hardware. It is a fixed-layout document format that faithfully reproduces every character, color and image of the original manuscript; however, PDF stores data in a format that does not record logical elements such as the document's logical structure and tables.

[0003] At present, optical character recognition (OCR) technology is usually used to extract information from PDF documents, but the information in PDF documents extracted by OCR is rendered as vectors, with no spaces between characters. Logically, the text formed by the extracted characte...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/00; G06F16/22
CPC: G06F16/2291; G06V30/416
Inventors: 韦峰 (Wei Feng), 徐国强 (Xu Guoqiang), 邱寒 (Qiu Han)
Owner: ONE CONNECT SMART TECH CO LTD SHENZHEN