A table extraction method and device

Table processing technology applied to web page parsing. It addresses the problem that supervised methods cannot train a suitable model when labeled data is missing or insufficient.

Active Publication Date: 2020-06-30
CHINA MOBILE COMM GRP CO LTD

AI Technical Summary

Problems solved by technology

[0005] When labeled data is absent or insufficient, supervised methods cannot train an appropriate model, so they are not advisable.



Embodiment Construction

[0047] As knowledge bases grow in importance, there is increasing demand to convert knowledge into triples and store it in a knowledge base. A common way to build a knowledge base is to obtain knowledge from tables, that is, to perform table extraction, which includes extracting the table header and aligning the table content with attributes.
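As a minimal sketch of what converting a table into triples can look like, the example below turns each non-empty cell into a (subject, attribute, value) triple. The function name `table_to_triples`, the choice of the first column as the subject, and the toy data are assumptions made for illustration; the source does not specify a triple format.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(header: List[str], rows: List[List[str]],
                     subject_col: int = 0) -> List[Triple]:
    """Turn each non-empty cell into a (subject, attribute, value) triple.

    Assumption: one column (by default the first) names the entity that
    the remaining columns describe.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = row[subject_col]
        for col, (attribute, value) in enumerate(zip(header, row)):
            # Skip the subject column itself and any empty cells.
            if col == subject_col or not value:
                continue
            triples.append((subject, attribute, value))
    return triples


# Toy data, purely illustrative.
header = ["Name", "Type", "Color"]
rows = [["apple", "fruit", "red"], ["carrot", "vegetable", ""]]
triples = table_to_triples(header, rows)
```

This only works once the header has been identified and every content cell is aligned with the right attribute, which is why header extraction and attribute alignment come first.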

[0048] Because many knowledge tables were not originally designed with knowledge-base construction in mind, much of their content cannot be used in a knowledge base directly. For example, on platforms such as Baidu Encyclopedia and Wikipedia, which accumulate knowledge through crowdsourcing, tables are designed by individual users and vary widely in form. Nevertheless, tables are a very powerful way of expressing much important knowledge and deserve close attention. A lot of relati...


Abstract

The invention discloses a table extraction method. The method comprises: reading the content of a source table and storing at least one two-dimensional table according to that content; reading and extracting the header of the source table according to the number of header rows; determining header items from the extracted header; establishing a table processing model according to the two-dimensional tables; and aligning the table content in the table processing model with the header items by content similarity. The invention further discloses a table extraction device.
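The steps in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the abstract does not define the table processing model or the similarity measure, so the sketch stands in a per-column list of sample values for the model and a character-shape comparison using Python's `difflib.SequenceMatcher` for content similarity. All function names and the sample table are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def extract_header_items(table: List[List[str]], header_rows: int) -> List[str]:
    """Merge the first `header_rows` rows into one header item per column."""
    items = []
    for col in range(len(table[0])):
        parts: List[str] = []
        for r in range(header_rows):
            cell = table[r][col].strip()
            if cell and cell not in parts:  # drop repeats from spanned cells
                parts.append(cell)
        items.append("/".join(parts))
    return items


def build_column_model(table: List[List[str]], header_rows: int) -> Dict[str, List[str]]:
    """Assumed stand-in for the 'table processing model': samples per header item."""
    items = extract_header_items(table, header_rows)
    model: Dict[str, List[str]] = {item: [] for item in items}
    for row in table[header_rows:]:
        for item, cell in zip(items, row):
            if cell.strip():
                model[item].append(cell.strip())
    return model


def shape(value: str) -> str:
    """Abstract a value to its character shape: digits -> 9, letters -> a."""
    return "".join("9" if ch.isdigit() else "a" if ch.isalpha() else ch
                   for ch in value)


def align(value: str, model: Dict[str, List[str]]) -> str:
    """Return the header item whose samples look most similar to `value`."""
    def score(samples: List[str]) -> float:
        return max((SequenceMatcher(None, shape(value), shape(s)).ratio()
                    for s in samples), default=0.0)
    return max(model, key=lambda item: score(model[item]))


# Hypothetical two-dimensional table with a two-row header.
table = [
    ["Name", "Contact", "Contact"],
    ["Name", "Phone", "Email"],
    ["Alice", "555-0101", "alice@example.com"],
    ["Bob", "555-0102", "bob@example.com"],
]
model = build_column_model(table, header_rows=2)
phone_item = align("555-0199", model)
email_item = align("carol@example.com", model)
```

Because alignment depends only on the content already present in the table, the sketch needs no labeled training data, which matches the motivation given in [0005].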

Description

Technical field

[0001] The invention relates to web page (Web) parsing technology, in particular to a table extraction method and device.

Background

[0002] As an important form of information representation, tables are widely used in Web documents. According to statistics, about 52% of Web pages contain tables. In a table, syntactic and semantic concepts are mixed together, and a logical cell obtains its semantics from its relative position. How to make a machine accurately extract table information has therefore always been a challenging problem. Moreover, tables are an important knowledge carrier and are semi-structured, in contrast to completely unstructured data. If tables can be extracted correctly, this will contribute greatly to building structured knowledge in the future.

[0003] At present, most of the data tables on the Web are described in HTML language, which lacks a d...
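To illustrate why relative position matters for HTML tables, the following sketch expands `rowspan` and `colspan` so that every logical cell lands at an explicit (row, column) position in a two-dimensional grid. It uses only the standard-library `html.parser` module. The class name `TableGridParser` and the function `to_grid` are hypothetical, and the sketch assumes a single, non-nested table with explicit closing tags and numeric span attributes.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class TableGridParser(HTMLParser):
    """Collect (text, rowspan, colspan) for every cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[Tuple[str, int, int]]] = []
        self._row = None   # cells of the row being read
        self._cell = None  # [text parts, rowspan, colspan] of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            a = dict(attrs)
            self._cell = [[], int(a.get("rowspan") or 1), int(a.get("colspan") or 1)]

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            parts, rowspan, colspan = self._cell
            self._row.append(("".join(parts).strip(), rowspan, colspan))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def to_grid(html: str) -> List[List[str]]:
    """Expand spanned cells so each grid position holds its cell text."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    filled: Dict[Tuple[int, int], str] = {}
    for r, row in enumerate(parser.rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in filled:  # skip slots taken by earlier rowspans
                c += 1
            for dr in range(rowspan):
                for dc in range(colspan):
                    filled[(r + dr, c + dc)] = text
            c += colspan
    if not filled:
        return []
    n_rows = max(r for r, _ in filled) + 1
    n_cols = max(c for _, c in filled) + 1
    return [[filled.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


sample_html = """
<table>
<tr><th rowspan="2">Name</th><th colspan="2">Contact</th></tr>
<tr><th>Phone</th><th>Email</th></tr>
<tr><td>Alice</td><td>555-0101</td><td>alice@example.com</td></tr>
</table>
"""
grid = to_grid(sample_html)
```

Copying the text of a spanned cell into every position it covers is one design choice among several. It keeps the grid rectangular, so later steps can address any cell by row and column alone.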


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/957
CPC: G06F16/9577
Inventors: 周文辉, 冯俊兰, 黄毅, 杨文漪, 施瑶, 杨瑞兵, 邵超
Owner: CHINA MOBILE COMM GRP CO LTD