Method for extracting table content from pictures based on computer vision and natural language processing

The technology combines natural language processing and computer vision, is applied in fields such as natural language data processing, computer components, and computing, and addresses problems such as the inability of existing methods to "understand" tables.

Active Publication Date: 2022-03-08
南京跑码地计算技术有限公司

AI Technical Summary

Problems solved by technology

[0005] The purpose of the present invention is to provide a table content extraction method based on computer vision and natural language processing. Using technologies such as border detection, OCR, and text classification, it develops modules for table border recognition, table content extraction, table content classification, and table layout reasoning, extracts data from image tables, and converts the extracted data into structured data in JSON format.

Examples

Detailed Description of the Embodiments

[0034] To further explain the structure, features, and purpose of the present invention, it is described below with reference to the accompanying drawings. The embodiments illustrated in the drawings serve only to illustrate the technical solution of the present invention and do not limit it.

[0035] As shown in Figure 1, the present invention discloses a table content extraction method based on computer vision and natural language processing, comprising five parts: table border recognition, cell character recognition, table content classification, table layout reasoning, and table data structuring. The steps are as follows:

[0036] Step 1: Input the picture containing the table into the table border recognition model and recognize the table border in the picture. Table border recognition comprises three parts: table area detection, cell area detection, and table border recognition. As shown in Figure 2, the spec...
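As a rough illustration of how cell coordinates might be derived once the border lines are known, the sketch below assumes a fully ruled table with no merged cells, whose horizontal and vertical line positions have already been detected. The names `CellBox`, `merge_close`, `grid_cells`, and the `tol` parameter are hypothetical and do not come from the patent; the recognition model itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellBox:
    row: int
    col: int
    x0: int
    y0: int
    x1: int
    y1: int


def merge_close(positions, tol=3):
    """Collapse line positions that lie within tol pixels of each other.

    A thick border is often detected as several adjacent pixel rows or
    columns; each such group is replaced by its integer mean.
    """
    merged, group = [], []
    for p in sorted(positions):
        if group and p - group[-1] > tol:
            merged.append(sum(group) // len(group))
            group = []
        group.append(p)
    if group:
        merged.append(sum(group) // len(group))
    return merged


def grid_cells(h_lines, v_lines, tol=3):
    """Build one bounding box per cell from line positions.

    Assumption: every horizontal line crosses every vertical line,
    so the table is a plain grid without merged cells.
    """
    ys = merge_close(h_lines, tol)
    xs = merge_close(v_lines, tol)
    return [
        CellBox(r, c, xs[c], ys[r], xs[c + 1], ys[r + 1])
        for r in range(len(ys) - 1)
        for c in range(len(xs) - 1)
    ]


if __name__ == "__main__":
    for box in grid_cells([0, 1, 40, 80], [0, 100, 101, 200]):
        print(box)
```

Merged cells would break the grid assumption; handling them would require checking which line segments are actually present between neighboring cells.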

Abstract

The invention discloses a method for extracting the content of a table in a picture based on computer vision and natural language processing, comprising: step 1, inputting the picture into a table border recognition model, identifying the table border, and calculating the coordinates of each cell in the table; step 2, extracting the text content of each cell according to its coordinates; step 3, labeling the extracted text content with one of three categories (key, value, or mixed value), constructing a table content classification data set, and training a cell content classification model on that data set; step 4, inferring the table layout from the table coordinates, the cell coordinates, and the category of each cell's text; step 5, organizing the data in the table in JSON format according to the table's layout information, the cell contents, and the category information. The invention introduces natural language processing to label the content category of each cell, combines this with cell position information to infer the table layout, and finally outputs the table content in structured form.
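Steps 4 and 5 can be pictured with a minimal sketch. It assumes each cell already carries a grid position, its recognized text, and a category label from the classification model. The pairing rule used here (nearest key to the left, otherwise nearest key above), the colon split for mixed cells, and the names `Cell`, `find_key`, `table_to_json`, and `_unlabeled` are illustrative assumptions, not the layout reasoning described in the patent.

```python
import json
from dataclasses import dataclass


@dataclass
class Cell:
    row: int
    col: int
    text: str
    category: str  # "key", "value", or "mixed" (label from the classifier)


def find_key(cell, by_pos):
    """Return the text of the nearest key cell to the left, else above."""
    for col in range(cell.col - 1, -1, -1):
        other = by_pos.get((cell.row, col))
        if other is not None and other.category == "key":
            return other.text
    for row in range(cell.row - 1, -1, -1):
        other = by_pos.get((row, cell.col))
        if other is not None and other.category == "key":
            return other.text
    return "_unlabeled"  # hypothetical bucket for values with no key


def table_to_json(cells):
    """Pair values with keys and serialize the result as JSON."""
    by_pos = {(c.row, c.col): c for c in cells}
    collected = {}
    for cell in sorted(cells, key=lambda c: (c.row, c.col)):
        if cell.category == "key":
            continue
        if cell.category == "mixed":
            # Assumption: a mixed cell looks like "key: value".
            text = cell.text.replace("\uff1a", ":")  # full-width colon
            key, sep, value = text.partition(":")
            if not sep:
                key, value = "_unlabeled", cell.text
        else:
            key, value = find_key(cell, by_pos), cell.text
        collected.setdefault(key.strip(), []).append(value.strip())
    # A key seen once maps to a string; a repeated key maps to a list.
    result = {k: v[0] if len(v) == 1 else v for k, v in collected.items()}
    return json.dumps(result, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    form = [
        Cell(0, 0, "Name", "key"),
        Cell(0, 1, "Alice", "value"),
        Cell(1, 0, "Phone: 123", "mixed"),
    ]
    print(table_to_json(form))
```

With this rule, a form-style table (keys to the left of values) yields one string per key, while a header-style table (keys in the first row) yields one list per column.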

Description

Technical field

[0001] The invention relates to the technical field of table data extraction, in particular to a method for extracting table content from pictures based on computer vision and natural language processing.

Background art

[0002] Information extraction based on computer vision and natural language processing is increasingly widely applied, for example recognizing text in pictures, extracting entities such as personal names, place names, and phone numbers from text, and extracting key information from invoices, insurance policies, and other forms. At the same time, major cloud vendors provide cloud-based recognition services for form data such as bills and contracts.

[0003] Existing techniques for extracting tabular data mainly focus on two aspects. First, through traditional image processing methods, such as erosion, dilation, edge detection, and contour recognition, first identify the table in t...
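To make the traditional image processing route concrete, here is a small pure-Python sketch of erosion followed by dilation (a morphological opening) with a long, thin structuring element, which is one common way to isolate ruling lines in a binarized table image. For a 1 x k horizontal element, opening is equivalent to keeping only horizontal runs of at least k foreground pixels, which is how it is implemented here. The function names and the toy image are hypothetical; production code would normally use an image processing library instead.

```python
def open_horizontal(img, k):
    """Keep only horizontal runs of 1s that are at least k pixels long.

    img is a binary image given as a list of rows of 0/1 values. The
    result equals erosion then dilation with a 1 x k structuring element.
    """
    out = [[0] * len(row) for row in img]
    for y, row in enumerate(img):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= k:
                    for i in range(start, x):
                        out[y][i] = 1
            else:
                x += 1
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def open_vertical(img, k):
    """Same operation along columns, done by transposing twice."""
    return transpose(open_horizontal(transpose(img), k))


def line_positions(img):
    """Indices of rows that still contain foreground pixels."""
    return [y for y, row in enumerate(img) if any(row)]


if __name__ == "__main__":
    # A 7 x 5 box with a few "text" pixels inside it.
    image = [
        [1, 1, 1, 1, 1, 1, 1],
        [1, 0, 0, 0, 0, 0, 1],
        [1, 0, 1, 1, 0, 0, 1],
        [1, 0, 0, 0, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1],
    ]
    print(line_positions(open_horizontal(image, 5)))
    print(line_positions(transpose(open_vertical(image, 5))))
```

Short runs such as text strokes are discarded, while long runs such as borders survive; the surviving row and column indices can then serve as candidate line positions.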

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06V30/412, G06V30/414, G06V30/19, G06K9/62, G06F16/35, G06F16/31, G06F40/289
CPC: G06F16/353, G06F16/313, G06F40/289, G06F18/217, G06F18/214
Inventor: 王国栋
Owner: 南京跑码地计算技术有限公司