Automatic extraction method oriented to data of deep web pages
The invention relates to web page data extraction, in the field of electrical digital data processing and special data processing applications. It addresses problems of existing approaches such as low efficiency, low accuracy, and the difficulty of generating and maintaining wrappers.
Detailed Description of Embodiments
[0040] The present invention is further described below with reference to the accompanying drawings and an embodiment:
[0041] As shown in Figure 1, a method for automatically extracting deep web page data is carried out in the following steps:
[0042] S1. Obtain two deep web pages from the same site, denoted page one and page two; use the HTML Tidy conversion tool to convert the HTML documents of page one and page two into XHTML documents;
[0043] S2. Perform noise removal on page one and page two;
[0044] S3. Perform repeated pattern elimination on page one and page two;
[0045] S4. Generate a web page data extraction wrapper;
[0046] S5. Perform noise removal on the page from which data is to be extracted;
[0047] S6. The web page data extraction wrapper first marks the page denoised in step S5, and then extracts data from the marked page;
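Steps S1 and S2 can be illustrated with a minimal Python sketch. It is not the patented procedure: the text above names HTML Tidy for S1 but does not define what counts as noise, so the tag set in `NOISE_TAGS` is an assumption, and the names `to_xhtml` and `remove_noise` are hypothetical. The sketch also assumes the `tidy` command-line tool is installed and on the PATH.

```python
import subprocess
from html import escape
from html.parser import HTMLParser

# Assumption: these elements carry no record data. The source does not list them.
NOISE_TAGS = {"script", "style", "noscript", "iframe"}


def to_xhtml(html_text: str) -> str:
    """S1 (sketch): convert an HTML document to XHTML with the tidy CLI."""
    proc = subprocess.run(
        ["tidy", "-q", "-asxhtml"],
        input=html_text,
        capture_output=True,
        text=True,
    )
    # tidy exits with 1 when it only emitted warnings and 2 on errors.
    if proc.returncode > 1:
        raise RuntimeError(proc.stderr)
    return proc.stdout


class _NoiseStripper(HTMLParser):
    """Re-serialises a page while skipping noise elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a noise element

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            attr_text = "".join(f' {k}="{escape(v or "")}"' for k, v in attrs)
            self.out.append(f"<{tag}{attr_text}>")

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))

    # Comments and the doctype are dropped because their handlers
    # are not overridden (the base-class handlers do nothing).


def remove_noise(xhtml: str) -> str:
    """S2 / S5 (sketch): strip noise elements and comments from a page."""
    parser = _NoiseStripper()
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions, page one and page two would each be passed through `to_xhtml` and then `remove_noise` before repeated pattern elimination (S3), and the same `remove_noise` function would be reused in step S5 on the page from which data is to be extracted.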
[0048] The repeated pattern elimination process described in step S3 is carried ...