Method and device for integrating extracted web table data

A web table data technology, applied in the field of integrating extracted web table data, that addresses problems such as table inconsistency and singleness, and achieves improved accuracy, reduced costs, and improved work efficiency.

Active Publication Date: 2016-06-15
江苏省现代企业信息化应用支撑软件工程技术研发中心
4 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

Traditional methods generally use schema mapping or data cleaning to solve this problem. In web table integration, however, simply running schema mapping and data cleaning one after the other does not work well.
Existing schema mapping methods do not explicitly assume the existence of dirty data, and most data correction and conflict resolution algorithms focus on a single inconsistent table.

Embodiment Construction

[0064] The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0065] In order to make the advantages of the technical solution of the present invention clearer, the present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0066] An embodiment of the present invention provides a method for integrating extracted Web table data and a device for integrating extracted Web table data; the device is an apparatus or piece of equipment having the function of implementing the m...

Abstract

The invention discloses a method and device for integrating extracted web table data that can improve the accuracy of a database built by integrating web table data. The method comprises: obtaining a web table corpus; obtaining candidate semantics for each table; and calculating the inconsistency between each table's candidate semantics and its data semantics. If the inconsistency is larger than a first predetermined threshold, the candidate semantics of the table are deemed incorrect: crowdsourcing is used to confirm the candidate semantics, the candidate semantics are recalculated from the table's semantic likelihood and the crowdsourcing feedback, and the inconsistency is recalculated. If the inconsistency is larger than a second predetermined threshold and smaller than the first predetermined threshold, a knowledge base and crowdsourcing are used to confirm the correctness of the data in the table, and the inconsistency is recalculated. If the inconsistency is smaller than the second predetermined threshold, the candidate semantics of the table are deemed correct and the data in the table is labeled. Once the candidate semantics of all tables are determined to be correct, schema mapping and data cleaning are performed.
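
The two-threshold decision described in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `Action`, `next_action`, and `ready_for_schema_mapping` are hypothetical, the inconsistency scores are assumed to be precomputed numbers, and the handling of scores exactly equal to a threshold (which the abstract does not specify) is an assumption made here.

```python
from enum import Enum
from typing import Iterable


class Action(Enum):
    """Hypothetical labels for the three branches named in the abstract."""
    CONFIRM_SEMANTICS = "confirm candidate semantics via crowdsourcing"
    CONFIRM_DATA = "confirm table data via knowledge base and crowdsourcing"
    LABEL_DATA = "candidate semantics accepted; label the table data"


def next_action(inconsistency: float,
                first_threshold: float,
                second_threshold: float) -> Action:
    """Pick the branch for one table from its inconsistency score.

    Assumption: a score exactly equal to a threshold falls into the
    lower band; the abstract leaves this case open.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be below the first threshold")
    if inconsistency > first_threshold:
        # Candidate semantics are treated as incorrect.
        return Action.CONFIRM_SEMANTICS
    if inconsistency > second_threshold:
        # Semantics are plausible, but the data needs checking.
        return Action.CONFIRM_DATA
    # Candidate semantics are treated as correct.
    return Action.LABEL_DATA


def ready_for_schema_mapping(scores: Iterable[float],
                             first_threshold: float,
                             second_threshold: float) -> bool:
    """True once every table's score falls in the 'accepted' band."""
    return all(
        next_action(s, first_threshold, second_threshold) is Action.LABEL_DATA
        for s in scores
    )
```

In a full pipeline, the first two branches would each be followed by recalculating the inconsistency, and schema mapping and data cleaning would start only after `ready_for_schema_mapping` returns `True` for the whole corpus.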

Description

Technical field

[0001] The invention belongs to the field of Web data processing, and in particular relates to a method and device for integrating extracted Web table data.

Background

[0002] With the rapid development of the Internet, the Web has become a huge information warehouse. Extracting information from the Web helps people find information quickly and accurately and improves work efficiency. For example, restaurant names, types of dishes, and prices can be extracted from different websites. However, integrating the extracted Web table data into a unified database raises many quality problems. For example, a table seldom describes its own semantics clearly, and a header row exists only in a few cases; even when there is a header row, its column names are sometimes meaningless or unreliable. At the same time, web tables often contain erroneous and inconsistent d...

Application Information

IPC(8): G06F17/30
CPC: G06F16/212, G06F16/217
Inventors: 鲜学丰, 赵朋朋, 崔志明
Owner: 江苏省现代企业信息化应用支撑软件工程技术研发中心