Web page text extraction method and device
The invention relates to text and webpage technology and provides a webpage text extraction method and device. It addresses problems such as text node errors and the lack of impurity information filtering, and achieves high extraction accuracy.
Examples
Embodiment 3
[0098] Refer also to Figure 4, which is a flowchart of the third specific embodiment of the webpage text extraction method of the present invention. The main steps of this embodiment are as follows:
[0099] Step S21: determine the text node of webpages under the same domain name. This specifically includes: obtaining a plurality of sample webpages under the same domain name; and comparing the webpage structures of the plurality of sample webpages to determine the text node of webpages under that domain name;
[0100] Webpages under the same domain name generally have similar webpage structures. This embodiment can therefore determine the text node of such webpages from the webpage structure. As a specific example, the webpage structures of multiple sample webpages can be compared to determine the text node of webpages under the same domain name, for example, in the following manner: ar...
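The text above is cut off before the comparison procedure is given, so the following Python sketch is an illustrative assumption and not the method of the invention. It shows one plausible way to compare sample webpages from the same domain: element paths that appear in every sample are treated as the shared structure, paths whose text repeats across samples are treated as template content (navigation, footers), and the remaining path with the longest average text is taken as the candidate text node. The names `PathTextCollector`, `collect_text_by_path`, and `find_text_node`, as well as the "longest varying text" heuristic, are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: a small set of void and non-visible tags is enough for a sketch.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}
SKIP_TAGS = {"script", "style"}


class PathTextCollector(HTMLParser):
    """Collects visible text per element path, e.g. 'html/body/div#content/p'."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_by_path = defaultdict(str)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never contain text
        elem_id = dict(attrs).get("id")
        self.stack.append(f"{tag}#{elem_id}" if elem_id else tag)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (tolerates sloppy HTML).
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i].split("#")[0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        inside_skipped = any(s.split("#")[0] in SKIP_TAGS for s in self.stack)
        if text and self.stack and not inside_skipped:
            self.text_by_path["/".join(self.stack)] += text


def collect_text_by_path(html):
    """Returns a dict mapping element path to the text found directly in it."""
    parser = PathTextCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.text_by_path)


def find_text_node(sample_pages):
    """Returns the shared path whose text varies most substantially, or None."""
    pages = [collect_text_by_path(page) for page in sample_pages]
    # Paths present in every sample form the shared webpage structure.
    common = set(pages[0]).intersection(*pages[1:])
    best_path, best_len = None, 0.0
    for path in sorted(common):  # sorted for deterministic tie-breaking
        texts = [page[path] for page in pages]
        if len(set(texts)) < len(texts):
            continue  # text repeated across samples: likely template content
        avg_len = sum(len(t) for t in texts) / len(texts)
        if avg_len > best_len:
            best_path, best_len = path, avg_len
    return best_path
```

Given two or more sample pages from one domain, `find_text_node` returns a path string that could then be used to locate the text node on other pages under that domain. A real system would likely need a more robust HTML parser and a more careful similarity measure than exact text equality.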


