Table extracting method and device

Table, table processing technology

Active Publication Date: 2016-11-23
CHINA MOBILE COMM GRP CO LTD
4 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

[0005] When labeled data is absent or insufficient, supervised methods cannot train an appropriate model, so they are not advisable.

Embodiment Construction

[0047] As knowledge bases become increasingly important, there is a growing desire to convert knowledge into triples and store it in a knowledge base. A common way to build a knowledge base is to obtain knowledge from tables, that is, to perform table extraction, which includes extracting the table header and aligning the table content with attributes.
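As a rough illustration of what "converting a table into triples" can mean, the sketch below turns each cell of an already extracted table into a (subject, attribute, value) triple. The function name `table_to_triples`, the choice of the first column as the subject, and the toy data are assumptions made for this example; they are not taken from the patent.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(header: List[str], rows: List[List[str]],
                     subject_col: int = 0) -> List[Triple]:
    """Turn every non-empty, non-subject cell into (subject, attribute, value).

    Assumption: one column (here the first) names the entity each row describes.
    """
    triples: List[Triple] = []
    for row in rows:
        subject = row[subject_col]
        for col, value in enumerate(row):
            if col == subject_col or not value.strip():
                continue  # skip the subject itself and empty cells
            triples.append((subject, header[col], value))
    return triples


# Toy data, made up for illustration only.
header = ["Name", "Role", "Office"]
rows = [
    ["Alice", "Engineer", "Beijing"],
    ["Bob", "Analyst", ""],
]
triples = table_to_triples(header, rows)
```

The sketch presupposes that the header is already known and that each column is aligned with the right attribute, which are the two sub-problems the paragraph above names.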

[0048] Many knowledge tables were not designed with knowledge-base construction in mind, so in many respects they cannot be used directly in a knowledge base. For example, on platforms such as Baidu Baike and Wikipedia, which accumulate knowledge through "crowdsourcing", tables were designed by individual users and differ widely from one another. Yet tables are a very powerful way of expressing much important knowledge, so they deserve close attention. The knowledge in many relational databases is also presented in the form of ta...

Abstract

The invention discloses a table extraction method. The method comprises: reading the content of a source table and storing at least one two-dimensional table according to that content; reading the header of the source table and extracting it according to the number of header rows; determining header items from the extracted header; establishing a table processing model from the two-dimensional tables; and aligning the table content in the table processing model with the header items by content similarity. The invention further discloses a table extraction device.
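The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented method: the "table processing model" is approximated here as a mapping from each header item to the column values seen under it, and "content similarity" is approximated with the standard library's `difflib.SequenceMatcher`. All function names and the sample grid are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List

Grid = List[List[str]]


def split_header(grid: Grid, header_rows: int):
    """Separate the first `header_rows` rows (the header) from the body."""
    return grid[:header_rows], grid[header_rows:]


def header_items(header: Grid) -> List[str]:
    """Merge a multi-row header column by column into one item per column."""
    items = []
    for column in zip(*header):
        parts: List[str] = []
        for cell in column:
            if cell and cell not in parts:  # drop blanks and repeated labels
                parts.append(cell)
        items.append("/".join(parts))
    return items


def build_model(items: List[str], body: Grid) -> Dict[str, List[str]]:
    """Assumed stand-in for the model: header item -> values seen in its column."""
    return {item: [row[i] for row in body] for i, item in enumerate(items)}


def align_value(value: str, model: Dict[str, List[str]]) -> str:
    """Pick the header item whose known values are most similar to `value`."""
    def best_score(item: str) -> float:
        return max(SequenceMatcher(None, value, known).ratio()
                   for known in model[item])
    return max(model, key=best_score)


# Made-up two-dimensional table with a two-row header.
grid = [
    ["Contact", "Contact", "Plan"],
    ["Name", "Phone", "Plan"],
    ["Alice", "555-0101", "Basic"],
    ["Bob", "555-0102", "Premium"],
]
header, body = split_header(grid, header_rows=2)
items = header_items(header)
model = build_model(items, body)
aligned = align_value("555-0199", model)
```

With this toy grid, a new cell such as `"555-0199"` is assigned to the phone column because it shares a long prefix with the values already stored there. A real system would likely need a richer similarity measure (types, patterns, vocabulary) than raw string matching.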

Description

Technical field

[0001] The present invention relates to web page (Web) parsing technology, and in particular to a method and device for table extraction.

Background technique

[0002] As an important form of information expression, tables are widely used in Web documents. According to statistics, about 52% of Web pages contain tables. In a table, syntactic and semantic concepts are mixed together, and the logical cells obtain their semantics from their relative positions. How to get a machine to extract table information accurately has therefore always been a challenging problem. Moreover, the table is an important knowledge carrier, and compared with completely unstructured data it is semi-structured. If tables can be extracted correctly, this will contribute greatly to structuring knowledge in the future.

[0003] At present, most data tables on the Web are still described in HTML, which lacks th...
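Because cells in an HTML table take their meaning from their relative position, a common first step is to expand the markup into a rectangular two-dimensional grid. The sketch below does this with Python's standard `html.parser`, copying the text of a cell into every position covered by its `rowspan` and `colspan`. The class and function names are hypothetical, and the sketch assumes a single, well-formed, non-nested table; it is not the parsing procedure described in the patent.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect <td>/<th> cells into a {(row, col): text} mapping."""

    def __init__(self):
        super().__init__()
        self.grid = {}
        self.row = -1
        self.col = 0
        self.cell = None      # list of text chunks while inside a cell
        self.span = (1, 1)    # (rowspan, colspan) of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.span = (int(a.get("rowspan") or 1), int(a.get("colspan") or 1))
            self.cell = []

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            text = "".join(self.cell).strip()
            # Skip positions already occupied by a rowspan from a row above.
            while (self.row, self.col) in self.grid:
                self.col += 1
            rows, cols = self.span
            for dr in range(rows):
                for dc in range(cols):
                    self.grid[(self.row + dr, self.col + dc)] = text
            self.col += cols
            self.cell = None


def html_table_to_grid(html):
    """Return the table as a list of rows, padding missing cells with ''."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    if not parser.grid:
        return []
    n_rows = max(r for r, _ in parser.grid) + 1
    n_cols = max(c for _, c in parser.grid) + 1
    return [[parser.grid.get((r, c), "") for c in range(n_cols)]
            for r in range(n_rows)]


# Made-up sample markup with a merged header.
sample = """
<table>
  <tr><th rowspan="2">Name</th><th colspan="2">Contact</th></tr>
  <tr><th>Phone</th><th>Email</th></tr>
  <tr><td>Alice</td><td>555-0101</td><td>alice@example.com</td></tr>
</table>
"""
grid = html_table_to_grid(sample)
```

Repeating the text of a merged cell in every position it spans keeps each row the same length, so later steps can address cells purely by row and column index.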

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/9577
Inventor: 周文辉, 冯俊兰, 黄毅, 杨文漪, 施瑶, 杨瑞兵, 邵超
Owner: CHINA MOBILE COMM GRP CO LTD