Automatic data extraction method for deep web pages
A web page data extraction technology, classified under electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as low efficiency, low accuracy, and difficulty in generating and maintaining wrappers.
Detailed Description of the Embodiments
[0040] The present invention is further described below with reference to the accompanying drawing and an embodiment:
[0041] As shown in Figure 1, a method for automatically extracting deep web page data is carried out according to the following steps:
[0042] S1. Obtain two deep web pages from the same site, denoted page one and page two; use the HTML Tidy conversion tool to convert the HTML documents of page one and page two into XHTML documents;
[0043] S2. Perform noise removal on page one and page two;
[0044] S3. Perform repeated pattern elimination on page one and page two;
[0045] S4. Generate a web page data extraction wrapper;
[0046] S5. Perform noise removal on the page from which data is to be extracted;
[0047] S6. The web page data extraction wrapper first marks the denoised page from step S5 and then extracts the data from the marked page;
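The text here is truncated before it explains how noise removal, repeated pattern elimination, and wrapper generation actually work, so the Python sketch below is only an illustration of the overall S1 to S6 flow under stated assumptions, not the patented method. It assumes the pages are already XHTML (the HTML Tidy conversion of S1 is not shown), treats script, style, noscript, and iframe elements as noise, omits S3 entirely, and swaps in a simple page-comparison heuristic for S4: any (tag path, text) pair that appears in both sample pages is taken to be template, and everything else is taken to be data. All names (TextCollector, denoise, generate_wrapper, extract, PAGE_ONE, PAGE_TWO, TARGET_PAGE) are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these elements count as "noise"; the source does not list them.
NOISE_TAGS = {"script", "style", "noscript", "iframe"}


class TextCollector(HTMLParser):
    """Collect (tag path, text) pairs, skipping text inside noise elements."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, outermost first
        self.noise_depth = 0   # greater than 0 while inside a noise element
        self.items = []        # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # Pop up to and including the matching open tag.
        while self.stack:
            popped = self.stack.pop()
            if popped in NOISE_TAGS:
                self.noise_depth -= 1
            if popped == tag:
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.noise_depth == 0:
            self.items.append(("/".join(self.stack), text))


def denoise(xhtml):
    """S2 / S5 (assumed form): return the page's text items without noise."""
    parser = TextCollector()
    parser.feed(xhtml)
    parser.close()
    return parser.items


def generate_wrapper(page_one, page_two):
    """S4 (swapped-in heuristic): items shared by both pages form the template."""
    return set(denoise(page_one)) & set(denoise(page_two))


def extract(wrapper, page):
    """S5 + S6: denoise the page, mark each item, then extract the data items."""
    marked = [(item, item not in wrapper) for item in denoise(page)]
    return [text for (_path, text), is_data in marked if is_data]


def _page(title, price):
    # Hypothetical sample pages sharing one template.
    return (
        "<html><head><title>Shop</title><script>var a = 1;</script></head>"
        "<body><h1>Book details</h1><table>"
        f"<tr><th>Title</th><td>{title}</td></tr>"
        f"<tr><th>Price</th><td>{price}</td></tr>"
        "</table></body></html>"
    )


PAGE_ONE = _page("Dune", "9.99")
PAGE_TWO = _page("Emma", "4.50")
TARGET_PAGE = _page("Ulysses", "12.00")

if __name__ == "__main__":
    wrapper = generate_wrapper(PAGE_ONE, PAGE_TWO)
    print(extract(wrapper, TARGET_PAGE))  # ['Ulysses', '12.00']
```

One limitation of this heuristic: a data value that happens to be identical on both sample pages is mistaken for template text, so the two sample pages should differ in every data field.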
[0048] The repeated pattern elimination process described in step S3 is carried ...