Network table column type detection method based on probabilistic graph model

A probabilistic graph model and network table technology, applied in database models, neural learning methods, biological neural network models, etc. It addresses problems such as reduced matching accuracy, failure to consider semantic similarity, and lack of robustness to dirty data, and achieves improved accuracy.

Pending Publication Date: 2022-04-29
NORTHEASTERN UNIV

AI Technical Summary

Problems solved by technology

[0005] Existing network table column type detection methods have the following limitations. (1) Traditional methods rely on cell-to-ontology matching, looking up the knowledge base entity that corresponds to each cell in the target column. However, a large proportion of the cells in the web tables of the data set have no corresponding entity in the knowledge base; moreover, these methods ignore the semantic associations between cells, which makes them not robust to dirty data. (2) Most feature-engineering-based methods require a variety of manually defined features, so they scale poorly and their computation is time-consuming and labor-intensive. In addition, because they compare only vocabulary without considering similarity at the semantic level, they are prone to ambiguity, which reduces matching accuracy. (3) Most models consider only the features and semantic relationships within cells and ignore possible associations between cells; they model the features of a single column only, without exploiting the special organizational structure of the table or the potentially complex semantic relationships between cells.

Embodiment Construction

[0040] To facilitate understanding of the present application, specific embodiments of the present invention are described in further detail below with reference to the drawings and examples. The following examples illustrate the present invention but are not intended to limit its scope.

[0041] Figure 1 is a schematic diagram of the framework of the network table column type detection method based on a probabilistic graphical model in this embodiment, and Figure 2 is a schematic flow chart of the same method. As shown in Figures 1 and 2, the network table column type detection method based on a probabilistic graphical model comprises the following steps:

[0042] Step 1: Join web tables. Web tables in the data set that have the same column headings are joined into o...
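The joining step can be sketched as grouping tables by a shared key and concatenating their rows. This is a minimal sketch under stated assumptions, not the patented procedure: each table is represented by a hypothetical `WebTable` record with `site`, `headers`, and `rows` fields, and groups are keyed on the source site plus the normalized header tuple (the abstract mentions tables of the same schema from the same website). The lowercase/strip normalization of headings is also an assumption; the visible text does not say how headings are compared.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, TypedDict


class WebTable(TypedDict):
    """Hypothetical table representation used only for this sketch."""
    site: str
    headers: List[str]
    rows: List[List[str]]


def join_web_tables(tables: List[WebTable]) -> List[WebTable]:
    """Concatenate the rows of tables that share a site and the same column headings."""
    groups: Dict[Tuple[str, Tuple[str, ...]], List[List[str]]] = defaultdict(list)
    for table in tables:
        # Assumed normalization: strip whitespace and lowercase each heading.
        headers = tuple(h.strip().lower() for h in table["headers"])
        groups[(table["site"], headers)].extend(table["rows"])
    # One joined table per (site, headings) group, in first-seen order.
    return [
        {"site": site, "headers": list(headers), "rows": rows}
        for (site, headers), rows in groups.items()
    ]
```

For example, two tables from the same site headed `Name | Year` and `name | year` are merged into a single table, while an identically headed table from another site stays separate.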

Abstract

The invention provides a network table column type detection method based on a probabilistic graph model and belongs to the field of table interpretation in the semantic web. The method comprises the following steps: splicing tables that belong to the same schema and come from the same website into a single table; performing single-column classification on the spliced table, by first dividing its columns into numeric columns and character columns and then classifying the numeric columns and the character columns separately; and, on the basis of the single-column classification results, constructing a probabilistic graph model to mine the semantic relationships implied between columns, so that the column type sequence of the whole table is detected. The method can detect the semantic types of columns in network tables, and compared with other column type detection methods, its accuracy is improved by 10% or more.
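The abstract does not specify the structure of the probabilistic graph model, so the following is only one plausible instantiation, offered as an assumption. It shows (a) a numeric/character split using a hypothetical rule (a column is numeric if at least 80% of its non-empty cells parse as numbers) and (b) joint decoding that treats the columns as a linear chain: a hypothetical single-column classifier supplies per-column type probabilities (`unary`), a hypothetical type-compatibility matrix scores adjacent column pairs (`pairwise`), and Viterbi decoding returns the most probable type sequence for the whole table.

```python
import math
from typing import List, Sequence


def is_numeric_column(cells: Sequence[str], threshold: float = 0.8) -> bool:
    """Assumed rule: numeric if at least `threshold` of non-empty cells parse as floats."""
    values = [c.strip().replace(",", "") for c in cells if c.strip()]
    if not values:
        return False
    parsed = 0
    for v in values:
        try:
            float(v)
            parsed += 1
        except ValueError:
            pass
    return parsed / len(values) >= threshold


def _safe_log(p: float) -> float:
    """Log of a probability, mapping 0 to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_column_types(
    unary: List[List[float]],     # unary[i][t]: P(type t | column i) from a single-column classifier
    pairwise: List[List[float]],  # pairwise[s][t]: compatibility of type s followed by type t
) -> List[int]:
    """Most probable type sequence over a linear chain of columns (log-space Viterbi)."""
    n_types = len(pairwise)
    score = [_safe_log(p) for p in unary[0]]
    backptrs: List[List[int]] = []
    for col_probs in unary[1:]:
        new_score, ptr = [], []
        for t in range(n_types):
            # Best type s for the previous column, given type t for this column.
            best_s = max(range(n_types), key=lambda s: score[s] + _safe_log(pairwise[s][t]))
            new_score.append(score[best_s] + _safe_log(pairwise[best_s][t]) + _safe_log(col_probs[t]))
            ptr.append(best_s)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best type of the last column.
    best = max(range(n_types), key=lambda t: score[t])
    path = [best]
    for ptr in reversed(backptrs):
        best = ptr[best]
        path.append(best)
    return path[::-1]


# Toy example with hypothetical types 0=person, 1=birth_place, 2=year.
unary = [[0.9, 0.05, 0.05], [0.5, 0.1, 0.4]]
pairwise = [[0.1, 0.3, 0.6], [0.4, 0.3, 0.3], [0.4, 0.3, 0.3]]
print(viterbi_column_types(unary, pairwise))  # [0, 2]
```

In the toy example, the single-column classifier alone would label the second column as type 0, but the assumed compatibility matrix favors a year after a person, so joint decoding switches it to type 2. The patent's actual graph structure, potentials, and training procedure are not given in this excerpt.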

Description

Technical field

[0001] The invention belongs to the field of table interpretation in the semantic web and mainly relates to a method for detecting the column types of network tables based on a probabilistic graph model.

Background art

[0002] Web tables present their content in a fixed structure and provide a compact representation of entities, described by their attributes, and of the relationships between entities. Unlike other kinds of tables, such as layout tables used mainly for formatting and matrix tables that present numeric summaries in a grid, their structure contains very valuable relational knowledge. At the same time, compared with unstructured data, studying them reduces the workload of extracting and interpreting data. For these reasons, web tables have received increasing attention from the research community. A wealth of knowledge is embedded in web tables, and there are many practical use cases that leverage web form...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/30, G06F16/28, G06F17/16, G06K9/62, G06N3/04, G06N3/08
CPC: G06F40/30, G06F16/288, G06N3/08, G06F17/16, G06N3/045, G06F18/2415
Inventors: 申德荣 (Shen Derong), 郭彤 (Guo Tong), 聂铁铮 (Nie Tiezheng), 寇月 (Kou Yue), 于戈 (Yu Ge)
Owner: NORTHEASTERN UNIV