Web page denoising method and system based on cooperative work of template and classifier
A web page denoising technology based on the cooperative work of a template and a classifier, applied in fields such as instrumentation, computing, and electrical digital data processing. It addresses problems such as web page denoisers being unusable, degraded denoising effect, and low efficiency, and achieves wide adaptability, a good denoising effect, and fast processing.
Embodiment Construction
[0029] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.
[0030] As shown in Figure 1, the denoising method of the present invention comprises the following steps:
[0031] 1. Obtain the original HTML documents using web crawler technology, which comprises web page download and web page discovery. Web page download fetches the target web page and stores it in the database according to the domain name of the page's address; web page discovery finds new web page addresses that meet the requirements and adds them to the list of pages to be crawled.
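The download and discovery roles of step 1 can be sketched roughly as follows. This is only an illustrative sketch, not the patent's implementation: the names `crawl`, `LinkExtractor`, and the `fetch` callable are hypothetical, the "database" is stood in for by a nested dictionary keyed by domain name, and "meets the requirements" is assumed here to mean "belongs to the same domain as the seed page".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_url, fetch, max_pages=10):
    """Breadth-first crawl limited to the seed's domain (an assumed requirement).

    `fetch` is a hypothetical callable mapping a URL to its HTML text,
    or to None when the page cannot be downloaded.
    Returns {domain: {url: html}} as a stand-in for the database.
    """
    allowed_domain = urlparse(seed_url).netloc
    to_crawl = deque([seed_url])   # list of pages to be crawled
    seen = {seed_url}              # addresses already queued
    store = {}
    stored = 0

    while to_crawl and stored < max_pages:
        url = to_crawl.popleft()
        html = fetch(url)
        if html is None:
            continue

        # Web page download: store the page under its domain name.
        store.setdefault(urlparse(url).netloc, {})[url] = html
        stored += 1

        # Web page discovery: queue new addresses that meet the requirement.
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == allowed_domain and absolute not in seen:
                seen.add(absolute)
                to_crawl.append(absolute)

    return store
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP client; in practice it could wrap a real downloader, while a plain dictionary lookup is enough for experimentation.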
[0032] 2. Process the original HTML document, including preprocessing and correction. Preprocessing deletes tags that do not contain text content, such as comments, scripts, and styles; correction fixes correctable errors in the DOM tree,...
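The preprocessing half of step 2 can be illustrated with a small filter built on Python's standard `html.parser` module. This is a minimal sketch under stated assumptions: the names `NoiseStripper`, `strip_noise`, and `SKIPPED_TAGS` are hypothetical, and the set of text-free tags is limited to the examples the description names (comments, scripts, styles). The correction of DOM-tree errors is not sketched.

```python
from html.parser import HTMLParser

# Assumption: only the tags named in the description are treated as text-free.
SKIPPED_TAGS = {"script", "style"}


class NoiseStripper(HTMLParser):
    """Re-emits the HTML it is fed, minus comments, scripts, and styles."""

    def __init__(self):
        # Keep entity references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if tag not in SKIPPED_TAGS and self.skip_depth == 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        pass  # comments carry no text content, so they are dropped

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_noise(html):
    """Return `html` with comments, <script>, and <style> elements removed."""
    stripper = NoiseStripper()
    stripper.feed(html)
    stripper.close()
    return "".join(stripper.out)
```

For example, `strip_noise(raw_html)` could be applied to each stored page before the DOM tree is built, so that later steps only see markup that may carry text.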