Method, device and system for extracting web content
A method for extracting webpage content, applied in the field of the Internet. It addresses the problems of error-prone extraction, inefficient webpage browsing, and long processing time, so as to avoid errors, improve extraction efficiency, and save time.
Examples
Embodiment 1
[0043] This embodiment of the present invention provides a method for extracting webpage content. The method can be applied to a terminal on which a browser is installed. The terminal includes, but is not limited to, a mobile phone, a computer, or a tablet computer; this embodiment does not limit the specific form of the terminal. Taking implementation from the perspective of the terminal as an example, and referring to Figure 1, the method flow provided by this embodiment includes:
[0044] 101: Obtain the webpage to be extracted and, according to the URL of the webpage to be extracted, determine whether an extraction rule for extracting the webpage content of that webpage is stored locally;
[0045] Determining, according to the URL of the webpage to be extracted, whether an extraction rule for extracting the webpage content of that webpage is stored locally includes:
[0046] Determine the root domain name contained in the URL of the w...
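Step 101 can be sketched as a lookup keyed by root domain. This is only a minimal illustration under stated assumptions: the names root_domain, find_local_rule and LOCAL_RULES, the rule format, and the rule that the root domain is the last two labels of the host name are all hypothetical and are not taken from the source. The two-label rule is naive and gives wrong results for suffixes such as "co.uk".

```python
from urllib.parse import urlsplit

# Hypothetical local rule store keyed by root domain (format is an assumption).
LOCAL_RULES = {
    "example.com": {"content_selector": "div.article"},
}


def root_domain(url):
    """Return a naive root domain: the last two labels of the host name."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    # Assumption: two labels are enough; multi-part suffixes are not handled.
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def find_local_rule(url, rules=LOCAL_RULES):
    """Return the locally stored rule for the URL's root domain, or None."""
    return rules.get(root_domain(url))
```

A caller would treat a return value of None as "no extraction rule is stored locally" for that webpage.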
Embodiment 2
[0065] This embodiment of the present invention provides a method for extracting webpage content. Building on Embodiment 1 above, this embodiment illustrates the method as executed on a terminal on which a browser is installed, taking the browser installed on the terminal as the executing entity. Referring to Figure 3, the method flow provided by this embodiment includes:
[0066] 301: Obtain the webpage to be extracted, and determine the root domain name included in the URL of the obtained webpage to be extracted;
[0067] Specifically, this embodiment does not limit how the webpage to be extracted is obtained. One option, among others, is for the browser to obtain the web address of the webpage to be extracted, send the server a request for that webpage, and receive the webpage returned by the server according to the request...
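One of the ways of obtaining the webpage mentioned in step 301, requesting it from the server by its address, might look like the sketch below. The function name fetch_page, the injectable opener parameter, the timeout, and the assumption that the page is UTF-8 encoded are all illustrative choices, not details from the source.

```python
import urllib.request


def fetch_page(url, opener=urllib.request.urlopen, timeout=10):
    """Request the page at `url` and return its body as text.

    `opener` defaults to the standard library's urlopen; it is a parameter
    only so that the request step can be replaced, for example in tests.
    """
    with opener(url, timeout=timeout) as response:
        # Assumption: UTF-8 content; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")
```

The returned text is what the later steps would treat as the webpage to be extracted.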
Embodiment 3
[0092] This embodiment of the present invention provides a method for extracting webpage content. Referring to Figure 4, the method flow provided by this embodiment includes:
[0093] 401: Obtain the webpage to be extracted and, according to the URL of the webpage to be extracted, determine whether an extraction rule for extracting the webpage content of that webpage is stored locally;
[0094] Specifically, the implementation principle of this step is the same as the implementation principle of step 301 in the above-mentioned embodiment 2. For details, refer to the content of step 301 in the above-mentioned embodiment 2, which will not be repeated here.
[0095] 402: If it is determined that an extraction rule for extracting the webpage content of the webpage to be extracted is stored locally, determine whether the locally stored extraction rule has expired; if so, perform step 403; if not, perform step 406;
[0096] For this step, considering the timeliness ...
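The branch in step 402 can be sketched as follows. How expiry is actually determined is not visible in the text above, so this sketch assumes, purely for illustration, that each rule records the time it was saved and expires after a fixed maximum age. The names next_step, saved_at and max_age_seconds are hypothetical.

```python
import time


def next_step(rule, now=None, max_age_seconds=7 * 24 * 3600):
    """Return the number of the step to perform after step 402.

    Assumption: `rule` is a dict with a "saved_at" timestamp in seconds,
    and a rule is expired once it is older than `max_age_seconds`.
    """
    if now is None:
        now = time.time()
    expired = (now - rule["saved_at"]) > max_age_seconds
    # Expired rule: go to step 403; still valid: go to step 406.
    return 403 if expired else 406
```

For example, a rule saved ten seconds ago with a five-second maximum age sends the flow to step 403, while one saved three seconds ago sends it to step 406.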


