Web page parsing method, device, storage medium, processor and equipment

A web page parsing technology, applied in the computer field, that addresses problems affecting work efficiency and achieves improved work efficiency.

Active Publication Date: 2021-11-30
BEIJING GRIDSUM TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, the online program must be restarted each time to complete the configuration of a new parsing rule. When a large number of parsing rules need to be newly configured, restarting the online program again and again inevitably reduces work efficiency.

Examples

Example 1-1

[0053] In Example 1-1, the node names have the following meanings: the xpath field is the node attribute value type; Video, Title, and ViewCount are specific business field types; and the attribute result value matched by Videos is content in the form of an XPath array.
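The listing of Example 1-1 itself is not reproduced here, so the following is only a minimal sketch of what such a template could look like and how it might be applied. The node names Videos, Title, ViewCount and xpath come from the description; the `fields` key, the paths, the sample page and the function `apply_template` are assumptions made for illustration. Python's standard `xml.etree.ElementTree` is used as a stand-in XPath engine, which supports only a subset of XPath and requires well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: node names follow the description, paths are invented.
TEMPLATE = {
    "Videos": {
        "xpath": ".//div[@class='video']",   # matches an array of nodes
        "fields": {                            # "fields" is an assumed key
            "Title": {"xpath": "./h2"},
            "ViewCount": {"xpath": "./span[@class='views']"},
        },
    }
}

# Invented, well-formed sample page.
PAGE = """<html><body>
<div class="video"><h2>Intro</h2><span class="views">120</span></div>
<div class="video"><h2>Part 2</h2><span class="views">45</span></div>
</body></html>"""


def apply_template(template, page_source):
    """Apply every array rule in the template and collect its business fields."""
    root = ET.fromstring(page_source)
    result = {}
    for name, rule in template.items():
        items = []
        for node in root.findall(rule["xpath"]):
            item = {}
            for field, field_rule in rule["fields"].items():
                match = node.find(field_rule["xpath"])
                item[field] = match.text if match is not None else None
            items.append(item)
        result[name] = items
    return result


parsed = apply_template(TEMPLATE, PAGE)
```

In this sketch the Videos rule yields a list with one entry per matched node, and each entry carries the Title and ViewCount text found under that node.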

[0054] In addition, when configuring each template in the database, the storage format of each template must also be determined. To make templates easy to manage and quick to look up, the templates can be stored in a columnar storage format that supports nested structures, with three storage columns: domain name, business scenario, and template object. The template object specifically includes the URL regular-expression matching rule of the template and the template content. That is to say, the storage format of each template in the database ...
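The three-column layout can be pictured with a small in-memory model. This is a sketch under assumptions, not the storage engine the patent uses: the class names `TemplateObject` and `TemplateRow`, the choice to nest a list of template objects per row, the function `find_template` and the sample data are all hypothetical.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class TemplateObject:
    url_pattern: str   # URL regular-expression matching rule
    content: dict      # template content (the parsing rules)


@dataclass
class TemplateRow:
    domain: str        # column 1: domain name
    scenario: str      # column 2: business scenario
    templates: list    # column 3: nested template objects (assumed to be a list)


def find_template(rows, url, scenario):
    """Return the first template matching both the business scenario and the URL."""
    domain = urlparse(url).hostname
    for row in rows:
        if row.domain == domain and row.scenario == scenario:
            for obj in row.templates:
                if re.fullmatch(obj.url_pattern, url):
                    return obj
    return None


# Invented sample row.
ROWS = [
    TemplateRow(
        domain="video.example.com",
        scenario="video_list",
        templates=[
            TemplateObject(
                url_pattern=r"https://video\.example\.com/list/\d+",
                content={"Title": {"xpath": "./h2"}},
            )
        ],
    )
]

hit = find_template(ROWS, "https://video.example.com/list/7", "video_list")
```

Filtering on domain name and business scenario first narrows the candidates before any regular expression is evaluated, which is one plausible reason for keeping those two values in their own columns.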

Example 1-2

[0065] After the web page to be parsed has been parsed with the parsing rules in the found template, the parsing result can be fed back directly to the caller. The web page parsing method disclosed in this embodiment is a stateless service; that is, its behavior does not change depending on the caller.

[0066] As the above description shows, the web page parsing method provided by this embodiment can pre-configure various parsing rules. When parsing web pages of different websites and different layouts on the same platform, the matching parsing rules for each web page can therefore be retrieved directly from the pre-configured parsing rules, without restarting the online program to complete their configuration, which improves work efficiency.
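The benefit described in this paragraph can be illustrated with a toy service that looks templates up on every request. It is only a sketch: the class `ParsingService`, its methods and the sample data are hypothetical, an in-memory list stands in for the template database, and regular expressions stand in for the XPath-based parsing rules to keep the example short.

```python
import re


class ParsingService:
    """Hypothetical long-running service; templates live in a mutable store."""

    def __init__(self):
        self._templates = []  # entries: (scenario, compiled URL regex, rules)

    def add_template(self, scenario, url_pattern, rules):
        # Called while the service is running; no restart is involved.
        self._templates.append((scenario, re.compile(url_pattern), rules))

    def parse(self, url, scenario, page_source):
        # The store is consulted on every request, so new templates apply at once.
        for tpl_scenario, url_re, rules in self._templates:
            if tpl_scenario == scenario and url_re.fullmatch(url):
                result = {}
                for field, pattern in rules.items():
                    match = re.search(pattern, page_source)
                    result[field] = match.group(1) if match else None
                return result
        return None  # no matching template has been configured


service = ParsingService()
page = "<html><head><title>Hello</title></head></html>"
url = "https://news.example.com/a/1"

before = service.parse(url, "news", page)   # no template yet
service.add_template(
    "news",
    r"https://news\.example\.com/a/\d+",
    {"Title": r"<title>(.*?)</title>"},
)
after = service.parse(url, "news", page)    # template found on the next request
```

The same `service` object handles both requests; only the template store changed between them.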

[0067] In addition, it should be noted that, whether it is the web page ana...

Example 1-3

[0080] Obviously, the output results in Example 1-3 are the final desired parsing results.

[0081] Corresponding to the above method embodiments, the present invention also provides a web page parsing device.

[0082] As shown in Figure 4, a web page parsing device provided in an embodiment of the present invention includes:

[0083] The preprocessing unit 100 is configured to pre-configure each template, wherein the template content of the template includes parsing rules, and different templates have different parsing rules;

[0084] The obtaining unit 200 is configured to obtain a web page parsing request, wherein the request carries the URL of the web page to be parsed and the business scenario in which the web page is to be parsed;

[0085] A search unit 300, configured to search for a template that matches both the business scenario and the URL from pre-configured templates;

[0086] The first parsing unit 400 is configure...

Abstract

The invention discloses a web page parsing method, device, storage medium, processor and equipment. The web page parsing method includes: obtaining a web page parsing request, wherein the request carries the URL of the web page to be parsed and the business scenario in which the web page is to be parsed; finding, from pre-configured templates, a template that matches both the business scenario and the URL, wherein the template content of the template includes parsing rules and different templates have different parsing rules; and parsing the web page to be parsed with the parsing rules in the found template to obtain a parsing result. The invention can complete the configuration of parsing rules without restarting the online program, thus improving work efficiency.

Description

Technical field

[0001] The present invention relates to the field of computer technology, and more specifically to a method, device, storage medium, processor and equipment for parsing a web page.

Background

[0002] Web page parsing refers to analyzing the source code of a web page and extracting the information that is actually wanted. Web page parsing is an important part of search engine development.

[0003] Web pages of different websites and different layouts usually correspond to different parsing rules. To parse web pages of different websites and different layouts on the same platform, the method currently adopted is as follows: before each web page is parsed, the parsing rule corresponding to that web page must first be configured; the web page is then parsed with that rule, and parsing of the next web page starts only after the current one is finished. Among them, each time a new analysis rule ...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/955
CPC: G06F16/9566
Inventor: 袁园
Owner: BEIJING GRIDSUM TECH CO LTD