Network table structure identification method and device, computer device and computer readable storage medium

A network table structure identification method, applicable to computing, digital data information retrieval, and special data processing applications. It addresses the complexity of network tables and the low accuracy of existing recognition algorithms, and it improves recognition accuracy.

Pending Publication Date: 2021-08-17
湖南四方天箭信息科技有限公司

AI Technical Summary

Problems solved by technology

[0005] However, most existing table structure recognition algorithms target simple tables in a specific field, and they usually identify the table structure in units of table rows. Network tables in real scenarios are extremely complex, and header cells and body cells may appear in the same table row. Dividing the table structure by row alone therefore cannot meet the needs of real scenarios, which results in the low accuracy of existing table structure recognition algorithms.
At the same time, existing table structure recognition algorithms can only be applied to specified fields and are difficult to migrate to other fields. When a rule-based table extraction algorithm is migrated to another field, experts in that field must re-specify the rules, so the development cycle is long and the accuracy is generally not high. When a machine-learning-based table extraction algorithm is migrated to another field, the data must be re-labeled, which requires a lot of manpower and time.
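
The limitation of row-level recognition can be illustrated with a small, purely hypothetical table. The sketch below is not taken from the patent: the labels `H` (header cell) and `B` (body cell), the sample table, and the helper names are all assumptions made for illustration. It assigns each row a single label (the majority label of its cells) and measures how many cells that row-level labeling gets right.

```python
from collections import Counter

# Hypothetical cell-level ground truth: "H" = header cell, "B" = body cell.
table = [
    ["H", "H", "H"],  # a pure header row
    ["H", "B", "B"],  # a row header followed by body cells
    ["H", "B", "B"],
]


def best_row_level_labels(rows):
    """Give every cell in a row that row's majority label."""
    return [[Counter(row).most_common(1)[0][0]] * len(row) for row in rows]


def cell_accuracy(truth, predicted):
    """Fraction of cells whose predicted label matches the ground truth."""
    total = sum(len(row) for row in truth)
    correct = sum(
        t == p
        for truth_row, pred_row in zip(truth, predicted)
        for t, p in zip(truth_row, pred_row)
    )
    return correct / total


row_level = best_row_level_labels(table)
accuracy = cell_accuracy(table, row_level)
print(f"row-level labeling gets {accuracy:.2%} of cells right")
```

In this toy case even the best single label per row mislabels the two row-header cells (7 of 9 cells correct), whereas a cell-level labeling can represent the table exactly.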

Embodiment Construction

[0046] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0047] As shown in Figure 1, the preferred embodiment of the present invention provides a method for identifying a network table structure, comprising the following steps:

[0048] Step S1: inputting an HTML file;

[0049] Step S2: preprocessing the input HTML file to obtain table-related information therein, the table-related information including cell text and cell location;

[0050] Step S3: using the trained network table structure recognition model to identify the network table structure based on the obtained table-related information;

[0051] Step S4: Outputting the id...
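
The preprocessing in Step S2 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: it assumes that "cell location" means a (row, column) index, it ignores `rowspan`, `colspan`, and nested tables, and the names `TableCellExtractor` and `extract_cells` are hypothetical. It uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class TableCellExtractor(HTMLParser):
    """Collect text and (row, col) position for every <td>/<th> cell.

    Simplifying assumptions: no rowspan/colspan handling, no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.cells = []
        self._row = -1
        self._col = -1
        self._tag = None   # tag of the cell currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = -1
        elif tag in ("td", "th"):
            self._col += 1
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._tag is not None:
            # Collapse runs of whitespace inside the cell text.
            text = " ".join("".join(self._text).split())
            self.cells.append(
                {"row": self._row, "col": self._col, "tag": self._tag, "text": text}
            )
            self._tag = None


def extract_cells(html):
    """Return a list of cell records found in the given HTML string."""
    parser = TableCellExtractor()
    parser.feed(html)
    parser.close()
    return parser.cells


sample = (
    "<table>"
    "<tr><th>Name</th><th>Age</th></tr>"
    "<tr><th>Ann</th><td> 30 </td></tr>"
    "</table>"
)
cells = extract_cells(sample)
for cell in cells:
    print(cell)
```

The resulting per-cell records (text plus position) are the kind of input that the recognition model in Step S3 would consume. How the model encodes them is not specified in the excerpt above.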

Abstract

The invention discloses a network table structure identification method and device, a computer device, and a computer-readable storage medium. The method comprises preprocessing an input HTML file to obtain the table-related information in it, namely the cell text and the position of each cell, and then using a trained network table structure recognition model to recognize the network table structure based on the obtained cell text and cell positions. Because the cell is used as the minimum recognition granularity, the accuracy of the recognition result is greatly improved compared with existing approaches that use the table row as the recognition unit, and recognition efficiency is also improved. The method can adapt to a variety of complex network table structure identification scenarios.

Description

Technical field

[0001] The present invention relates to the technical field of table information extraction, in particular to a method and device for identifying a network table structure, a computer device, and a computer-readable storage medium.

Background

[0002] As an important form of information representation, web tables exist widely in web documents and store a large amount of high-value information. However, because they lack clear semantic information and their structures are complex and diverse, it is difficult for computers to accurately analyze and understand table content. Research on information extraction from web tables is therefore of great significance. Table structure recognition is one of the research hotspots in the field of table information extraction. It refers to the analysis of the table structure and the division of the table into areas, including header area identification, table body area identification, etc. ...

Application Information

IPC(8): G06F40/279, G06F16/335, G06F16/901, G06F40/216, G06F40/242, G06F40/30
CPC: G06F40/279, G06F40/216, G06F40/242, G06F40/30, G06F16/9024, G06F16/335
Inventor: 王志斌, 段炼, 周忠诚, 彭文凯, 黄九鸣, 张圣栋
Owner: 湖南四方天箭信息科技有限公司