HTML table semantic venation analysis method

A semantic analysis technology for tables, applied in the fields of instruments, computing, and electrical digital data processing. It addresses problems such as the difficulty of designing automatic methods, the complexity of semantic relations in tables, and concepts without scope, and it achieves simple identification, high query efficiency, and high recall.

Inactive Publication Date: 2011-05-04
HUAZHONG UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

One problem is that, because the semantic relationships in tables are highly complex, it is difficult to design an automatic method that identifies them completely and correctly. For example, the literature [7]

Example Embodiment

[0039] Detailed description

[0040] As shown in Figure 1, some tables commonly used on the Internet and in reference documents serve as the objects of explanation, VC++ is used as the development language, and the TableToSS system developed by the inventor is used as the basis. The principle and implementation scheme of the HTML table semantic context analysis method disclosed in the present invention are as follows:

[0041] Step 101: Create a table coordinate system.

[0042] The following definition is given with reference to relational theory:

[0043] Definition 1: Given a set of domains $D_1, D_2, \ldots, D_n$, their Cartesian product is $D_1 \times D_2 \times \cdots \times D_n = \{(d_1, d_2, \ldots, d_n) \mid d_i \in D_i,\ i = 1, 2, \ldots, n\}$. The domain name of each domain $D_i$ and the Cartesian product are written into a two-dimensional rectangular grid arranged neatly in the horizontal and vertical directions, in which every row has the same height and every column has the same width. T...
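The layout that Definition 1 describes can be illustrated with a small sketch. It is not taken from the patent or from TableToSS: the function name `cartesian_grid` and the sample domains `Size` and `Color` are hypothetical, chosen only to show domain names and the tuples of a Cartesian product written into a regular grid.

```python
from itertools import product


def cartesian_grid(domains):
    """Lay out domain names and their Cartesian product as a rectangular grid.

    `domains` maps a domain name to its list of values (illustrative data,
    not from the patent). Row 0 holds the domain names; every later row is
    one tuple (d_1, ..., d_n) of the Cartesian product.
    """
    names = list(domains)
    rows = [names]
    for combo in product(*(domains[name] for name in names)):
        rows.append(list(combo))
    return rows


# Hypothetical domains used only for illustration.
grid = cartesian_grid({"Size": ["S", "L"], "Color": ["red", "blue"]})
for row in grid:
    print(row)
```

Every row of the resulting grid has the same number of columns, which is the regularity that a table coordinate system relies on.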

Abstract

The invention discloses a semantic analysis method for hypertext markup language (HTML) tables, which is applied to the retrieval of webpage documents. The semantic context ("venation") relations of an HTML table are obtained from the geometric position relations among its cells. The cells are described by defining a table coordinate system and a table matrix, and the attributes in the table and the characteristics of their values are determined by defining a column or row assembly unit and taking it as the object of analysis. By analyzing the geometric position relations among the cells, the scope of each attribute is set and recognition rules for attributes and attribute values are established, so that the table cells can be traversed through the table matrix and the relations among all cells can be established to form a table semantic context tree that supports document retrieval. The method accords with the way people draw up and read tables; the algorithm is simple; and only a table content tree needs to be recorded, without expanding it into ontology nodes or database data, so memory space is greatly saved.
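One way to picture the table coordinate system and table matrix mentioned in the abstract is to expand every HTML cell, including those with `rowspan` or `colspan`, onto integer grid coordinates. The sketch below is an assumption about how such a matrix could be built with Python's standard `html.parser`. It is not the patented algorithm or the TableToSS implementation, and the class name `TableMatrixBuilder`, the function `table_matrix`, and the sample table are hypothetical.

```python
from html.parser import HTMLParser


class TableMatrixBuilder(HTMLParser):
    """Expand <td>/<th> cells with rowspan/colspan onto (row, col) coordinates."""

    def __init__(self):
        super().__init__()
        self.matrix = {}   # (row, col) -> cell text
        self.row = -1
        self.col = 0
        self.span = None   # (rowspan, colspan) of the cell currently being read
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.row += 1
            self.col = 0
        elif tag in ("td", "th"):
            a = dict(attrs)
            self.span = (int(a.get("rowspan", 1)), int(a.get("colspan", 1)))
            self.text = []
            # Skip positions already occupied by a cell spanning from a row above.
            while (self.row, self.col) in self.matrix:
                self.col += 1

    def handle_data(self, data):
        if self.span is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.span is not None:
            rowspan, colspan = self.span
            value = "".join(self.text).strip()
            # A spanning cell occupies every coordinate it covers.
            for r in range(self.row, self.row + rowspan):
                for c in range(self.col, self.col + colspan):
                    self.matrix[(r, c)] = value
            self.col += colspan
            self.span = None


def table_matrix(html):
    """Return the table as a rectangular list of lists of cell text."""
    builder = TableMatrixBuilder()
    builder.feed(html)
    rows = 1 + max(r for r, _ in builder.matrix)
    cols = 1 + max(c for _, c in builder.matrix)
    return [[builder.matrix.get((r, c), "") for c in range(cols)]
            for r in range(rows)]


# Hypothetical sample table with one rowspan and one colspan.
sample = """
<table>
  <tr><th rowspan="2">Name</th><th colspan="2">Score</th></tr>
  <tr><th>Math</th><th>Physics</th></tr>
  <tr><td>Ann</td><td>90</td><td>85</td></tr>
</table>
"""
matrix = table_matrix(sample)
for line in matrix:
    print(line)
```

With the matrix in this form, the geometric relation between two cells reduces to a comparison of their row and column indices. For example, the cell "90" at (2, 1) lies below "Math" at (1, 1), which in turn lies below "Score" at (0, 1).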

Description

Technical field

[0001] The invention relates to a semantic analysis method for HTML tables, which is especially applicable to the retrieval of webpage documents. Association relationships are established according to the geometric positional relationships between the cells of an HTML table, so as to establish the semantic context of the data content in the table.

Background

[0002] Tables are a common means of expressing semantic relations in a structured way. They can effectively describe specific instances of one or more classes, so they are widely used in all kinds of documents. With the development of Internet technology and the spread of its applications, tables constructed in forms such as hypertext markup language (HTML) are widely used in web pages. For human readers, tables express semantic relationships fairly clearly, but because people draw up tables with intelligence, brevity, and irregularity, many tables contain very complex semantic rela...

Application Information

IPC(8): G06F17/30
Inventor: 尹文生
Owner: HUAZHONG UNIV OF SCI & TECH