Method for extracting data of DeepWeb response webpage

A technique for extracting data from Deep Web response pages, applicable to electrical digital data processing, special data processing applications, instruments, etc. It addresses the low recognition efficiency and low extraction accuracy of existing approaches, and offers high precision, improved efficiency and strong applicability.

Inactive Publication Date: 2009-11-18
NORTHEASTERN UNIV

AI Technical Summary

Problems solved by technology

Its recognition efficiency is lower than the recognition method of directly analyzing page do


Abstract

The invention provides a method for extracting data from a DeepWeb response page and belongs to the field of Deep Web data management. The method comprises the following steps: (1) obtaining a DeepWeb response page Page by entering a keyword Key on the query page and submitting the query; (2) extracting the page template information: finding the parent node P with the maximum number Wn of child nodes containing the keyword, converting the labeled token block sequences into labeled token character sequences, processing the labeled token character sequences of two records with the LCS (longest common subsequence) algorithm, and separating and filtering the common token character sequence to obtain the template information; (3) extracting the data; (4) merging the token blocks; and (5) clustering the data table. The method offers strong applicability, high precision and greatly improved efficiency.
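Steps (1) to (3) can be illustrated with a minimal Python sketch. It is not the patented implementation: the sample page, the whitespace tokenizer, and the helper names `find_record_parent`, `tokenize`, `lcs` and `strip_template` are all hypothetical. The sketch also drops the query keyword from the common sequence, since the keyword appears in every record by construction; that is this sketch's own reading of "separating and filtering", not something the abstract specifies.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed response page for the keyword "python".
PAGE = """<html><body><h1>Results for python</h1><ul>
<li><b>Title:</b> Learning python <i>Price:</i> 30</li>
<li><b>Title:</b> python Cookbook <i>Price:</i> 45</li>
<li><b>Title:</b> Fluent python <i>Price:</i> 50</li>
</ul></body></html>"""


def find_record_parent(root, keyword):
    """Return (P, Wn): the node with the most direct children containing keyword."""
    best, best_count = None, 0
    for parent in root.iter():
        count = sum(1 for child in parent
                    if keyword in "".join(child.itertext()))
        if count > best_count:
            best, best_count = parent, count
    return best, best_count


def tokenize(elem):
    """Flatten one record into tag tokens and whitespace-separated text tokens."""
    tokens = [f"<{elem.tag}>"]
    tokens += (elem.text or "").split()
    for child in elem:
        tokens += tokenize(child)
        tokens += (child.tail or "").split()
    return tokens


def lcs(a, b):
    """Longest common subsequence of two token lists (dynamic programming)."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    out, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def strip_template(tokens, template):
    """Remove the template tokens (matched in order) and keep the rest as data."""
    data, k = [], 0
    for tok in tokens:
        if k < len(template) and tok == template[k]:
            k += 1
        else:
            data.append(tok)
    return data


keyword = "python"
parent, wn = find_record_parent(ET.fromstring(PAGE), keyword)
records = [tokenize(child) for child in parent]
# Assumption of this sketch: the keyword is filtered out of the common sequence.
template = [t for t in lcs(records[0], records[1]) if t != keyword]
rows = [strip_template(r, template) for r in records]
print(parent.tag, wn)
print(template)
print(rows)
```

The template is derived from only two records, as the abstract describes, and then applied to every record under P. Steps (4) and (5), merging token blocks and clustering the data table, are not sketched because the abstract gives no detail on them.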

Description

Technical field

[0001] The invention belongs to the field of Deep Web data management, and in particular relates to a method for extracting data from DeepWeb response pages.

Background

[0002] With the development of the Web, the amount of information on it has exploded. According to the depth of the information it contains, the Web can be divided into the Surface Web and the Deep Web. The Surface Web is the collection of pages that traditional search engines can index by following hyperlinks; the Deep Web is the part of the Web that traditional search engines cannot index, whose content can only be reached by dynamically submitting queries through a query interface. As the number of Deep Web data sources grows, their importance is becoming more and more prominent, because these data sources contain a large amount of high-quality structured information. However, these data sources can only be accessed through their query interfaces, and finall...

Application Information

IPC(8): G06F17/30
Inventor 申德荣, 于戈, 孙高尚, 聂铁铮, 寇月, 王振华
Owner NORTHEASTERN UNIV