Method, device and system for extracting web content

A webpage content extraction method in the field of the Internet. It addresses extraction that is error-prone and time-consuming and therefore makes webpage browsing inefficient, with the aim of avoiding errors, improving extraction efficiency, and saving time.

Active Publication Date: 2019-02-12

TENCENT TECH (SHENZHEN) CO LTD

Cites: 5; cited by: 0

AI Technical Summary
Problems solved by technology

The existing extraction process is therefore prone to errors and takes a long time, which makes webpage browsing inefficient for users.

Examples

Embodiment 1

[0043] The embodiment of the present invention provides a method for extracting webpage content. The method can be applied to a terminal installed with a browser; the terminal includes but is not limited to a mobile phone, a computer, or a tablet computer, and this embodiment does not limit its specific form. Taking implementation of the method on the terminal as an example, and referring to Figure 1, the method flow provided by this embodiment includes:

[0044] 101: Obtain the webpage to be extracted, and determine, according to its URL, whether an extraction rule for extracting its webpage content is stored locally;

[0045] Determining, according to the URL of the webpage to be extracted, whether an extraction rule for extracting its webpage content is stored locally includes:

[0046] Determine the root domain name contained in the URL of the w...
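The local lookup in step 101 and [0046] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: locally stored rules are assumed to live in a dictionary keyed by root domain, and the root domain is approximated as the last two labels of the host name, a simplification that mishandles multi-label public suffixes such as `.co.uk`. The names `root_domain` and `find_local_rule` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def root_domain(url: str) -> str:
    """Approximate the root domain of a URL as its last two host labels.

    Simplifying assumption: multi-label public suffixes are not handled
    (e.g. "news.example.co.uk" would yield "co.uk").
    """
    host = urlsplit(url).hostname or ""  # hostname is lowercased, port removed
    labels = [label for label in host.split(".") if label]
    return ".".join(labels[-2:])


def find_local_rule(url: str, local_rules: dict) -> Optional[dict]:
    """Return the locally stored extraction rule for the URL's root domain, or None."""
    return local_rules.get(root_domain(url))


if __name__ == "__main__":
    rules = {"example.com": {"title": "h1"}}  # hypothetical local rule store
    print(root_domain("https://news.example.com/a/b?x=1"))     # example.com
    print(find_local_rule("https://www.example.com/", rules))  # {'title': 'h1'}
    print(find_local_rule("https://other.org/", rules))        # None
```

Keying by root domain lets every subdomain of a site share one stored rule; a production version would consult the Public Suffix List instead of counting labels.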

Embodiment 2

[0065] The embodiment of the present invention provides a method for extracting webpage content. Building on Embodiment 1, this embodiment executes the method on a terminal installed with a browser, and the browser installed on the terminal is taken as the execution subject for illustration. Referring to Figure 3, the method flow provided by this embodiment includes:

[0066] 301: Obtain the webpage to be extracted, and determine the root domain name included in the URL of the obtained webpage to be extracted;

[0067] This embodiment does not limit how the webpage to be extracted is obtained. For example, the browser may obtain the web address of the webpage to be extracted, send a request for the webpage to the server, and receive the webpage that the server returns in response to the request...
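Obtaining the webpage as described in [0067] could look like the following sketch, which uses Python's standard `urllib.request` to send the request and decode the HTML the server returns. The function name, the `User-Agent` value, the timeout, and the UTF-8 fallback are illustrative assumptions; a real browser's networking stack is far more involved.

```python
import urllib.request
from typing import Callable


def fetch_webpage(url: str,
                  opener: Callable = urllib.request.urlopen,
                  timeout: float = 10.0) -> str:
    """Send a request for the webpage and return the HTML the server sends back.

    `opener` is injectable so the function can be exercised without a network.
    """
    # Hypothetical User-Agent string identifying the requesting browser.
    request = urllib.request.Request(url, headers={"User-Agent": "example-browser/0.1"})
    with opener(request, timeout=timeout) as response:
        # Use the charset declared in the response headers, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # A data: URL keeps the demo offline; an http(s) URL would contact the server.
    print(fetch_webpage("data:text/html;charset=utf-8,<p>hello</p>"))
```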

Embodiment 3

[0092] The embodiment of the present invention provides a method for extracting webpage content. Referring to Figure 4, the method flow provided by this embodiment includes:

[0093] 401: Obtain the webpage to be extracted, and determine, according to its URL, whether an extraction rule for extracting its webpage content is stored locally;

[0094] This step follows the same implementation principle as step 301 in Embodiment 2; for details, see step 301, which is not repeated here.

[0095] 402: If an extraction rule for extracting the webpage content of the webpage to be extracted is stored locally, determine whether the locally stored rule has expired. If it has, perform step 403; if not, perform step 406;
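The branch in step 402 can be sketched as below. The source does not say how expiry is decided (the explanation in [0096] is truncated), so this sketch assumes each stored rule records when it was saved and a validity period; `StoredRule`, `fetched_at`, and `ttl_seconds` are hypothetical. The function only returns the next step number, because what steps 403 and 406 do is not shown in this excerpt.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredRule:
    """A locally stored extraction rule (hypothetical layout)."""
    rule: dict
    fetched_at: float   # Unix timestamp when the rule was stored locally
    ttl_seconds: float  # assumed validity period of the rule


def is_expired(stored: StoredRule, now: Optional[float] = None) -> bool:
    """True once the rule's assumed validity period has elapsed."""
    current = time.time() if now is None else now
    return current - stored.fetched_at >= stored.ttl_seconds


def next_step(stored: StoredRule, now: Optional[float] = None) -> int:
    """Step 402: go to step 403 if the local rule has expired, otherwise to step 406."""
    return 403 if is_expired(stored, now) else 406
```

Passing `now` explicitly keeps the check deterministic in tests; in normal use it defaults to the current time.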

[0096] For this step, considering the timeliness ...

Abstract

A web content extraction device or system may obtain a to-be-extracted webpage and determine whether an extraction rule for extracting the web content of the to-be-extracted webpage is locally stored; if it is determined that the extraction rule is not locally stored, request, from a server, an extraction rule for extracting the web content of the to-be-extracted webpage; receive a unified extraction rule from the server; access a third-party resolver library for resolving the unified extraction rule after determining that resolution of the unified extraction rule is not supported; resolve the unified extraction rule via the third-party resolver library; and extract the web content of the to-be-extracted webpage according to the resolved unified extraction rule.
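The resolution path in the abstract (resolve natively when supported, otherwise fall back to a third-party resolver library) can be sketched as a lookup with a fallback. Everything concrete here is an assumption for illustration: the `"format"` key on the unified rule, the built-in `"regex"` rule format, and modelling the third-party library as a dictionary of resolver factories. It is not the patent's rule syntax or any real library's API.

```python
import re
from typing import Callable, Dict

# A resolver turns a unified extraction rule into a function that extracts
# named fields from HTML.
Extractor = Callable[[str], Dict[str, str]]
Resolver = Callable[[dict], Extractor]


def resolve_regex_rule(rule: dict) -> Extractor:
    """Built-in resolver (assumed format): each field maps to a regex with one group."""
    patterns = {field: re.compile(p, re.S) for field, p in rule["fields"].items()}

    def extract(html: str) -> Dict[str, str]:
        result = {}
        for field, pattern in patterns.items():
            match = pattern.search(html)
            if match:
                result[field] = match.group(1).strip()
        return result

    return extract


# Rule formats the browser can resolve natively in this sketch.
BUILT_IN_RESOLVERS: Dict[str, Resolver] = {"regex": resolve_regex_rule}


def extract_with_unified_rule(html: str, unified_rule: dict,
                              third_party_library: Dict[str, Resolver]) -> Dict[str, str]:
    """Resolve the unified rule, falling back to a third-party resolver if needed."""
    rule_format = unified_rule["format"]
    resolver = BUILT_IN_RESOLVERS.get(rule_format)
    if resolver is None:
        # Native resolution is not supported: access the third-party resolver library.
        resolver = third_party_library.get(rule_format)
        if resolver is None:
            raise LookupError(f"no resolver available for rule format {rule_format!r}")
    extractor = resolver(unified_rule)  # resolve the unified extraction rule
    return extractor(html)              # extract content with the resolved rule


if __name__ == "__main__":
    page = "<html><h1> Weekly news </h1><p>Body text</p></html>"
    native_rule = {"format": "regex", "fields": {"title": r"<h1>(.*?)</h1>"}}
    print(extract_with_unified_rule(page, native_rule, {}))  # {'title': 'Weekly news'}

    # Hypothetical third-party resolver for a format the browser cannot resolve.
    library = {"length": lambda rule: (lambda html: {"length": str(len(html))})}
    print(extract_with_unified_rule(page, {"format": "length"}, library))
```

Keeping resolvers behind one signature means the extraction step does not care whether the rule was resolved natively or through the third-party library.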

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a method, device and system for extracting web page content.

Background

[0002] With the rapid development of Internet technology, more and more network applications are based on the B/S (Browser/Server) architecture. Under the B/S architecture, no dedicated client needs to be installed on the terminal; different functions can be realized directly through the browser. Common network applications based on the B/S architecture include web games, online video, and online music. In this type of network application, the server needs to send the terminal both the webpage content and the extraction rules corresponding to the network application. After the browser installed on the terminal obtains the webpage content and extraction rules sent by the server, it usually needs to extract the webpage content according to the obtained extraction rules. ...

Application Information

Patent type and authority: Patent (China)

IPC(8): G06F16/951

CPC: G06F16/957

Inventor: 张锐杰 (Zhang Ruijie)

Owner: TENCENT TECH (SHENZHEN) CO LTD
