Web page parsing method, device, storage medium, processor and equipment
A web page parsing method and processor technology, applied in the computer field, that solves problems affecting work efficiency and thereby improves work efficiency.
Examples
Example 1-1
[0053] In Example 1-1, the node names have the following meanings: the xpath field is the node attribute value type; Video, Title, and ViewCount are specific business field types; and the attribute result value matched by Videos is content in the form of an XPath array.
[0054] In addition, when configuring the templates in the database, the storage format of each template must also be determined. To make the templates easy to manage and quick to search, each template can be stored in a columnar storage format that supports nested structures. The storage is divided into three columns: domain name, business scenario, and template object, where the template object specifically includes the URL regular matching rule of the template and the template content. That is to say, the storage format of each template in the database ...
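The three-column layout described in [0054] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names `TemplateRow` and `TemplateObject`, the function `find_template`, the `example.com` domains, and the shape of the template content are all hypothetical, and an in-memory list stands in for the columnar database.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class TemplateObject:
    """Nested template object (hypothetical name)."""
    url_pattern: str  # URL regular matching rule of the template
    content: dict     # template content, i.e. the parsing rules


@dataclass
class TemplateRow:
    """One stored template: the three columns named in [0054]."""
    domain: str               # column 1: domain name
    business_scene: str       # column 2: business scenario
    template: TemplateObject  # column 3: template object


def find_template(rows, url: str, business_scene: str) -> Optional[TemplateObject]:
    """Return the first template whose domain, scenario, and URL rule all match."""
    domain = urlparse(url).hostname
    for row in rows:
        # Narrow by the two plain columns first, then apply the URL regex.
        if row.domain != domain or row.business_scene != business_scene:
            continue
        if re.search(row.template.url_pattern, url):
            return row.template
    return None


# Hypothetical sample data; the domains and rules are invented for illustration.
ROWS = [
    TemplateRow(
        "video.example.com", "video_list",
        TemplateObject(r"^https://video\.example\.com/channel/\d+$",
                       {"Videos": {"xpath": "//div[@class='video']"}}),
    ),
    TemplateRow(
        "video.example.com", "video_detail",
        TemplateObject(r"^https://video\.example\.com/watch/\w+$",
                       {"Title": {"xpath": "//h1"}}),
    ),
]

match = find_template(ROWS, "https://video.example.com/watch/abc123", "video_detail")
print(match.content if match else None)
```

Filtering on domain name and business scenario before evaluating the regular expression mirrors the stated goal of searching templates quickly: the plain columns eliminate most candidates cheaply.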
Example 1-2
[0065] After the web page to be parsed has been parsed using the parsing rules in the found template, the parsing result can be fed back directly to the caller. The web page parsing method disclosed in this embodiment is a stateless service; that is, its behavior does not change with the caller.
[0066] As the above description of this embodiment shows, the web page parsing method provided by this embodiment can pre-configure various parsing rules. When parsing web pages of different websites and different layouts on the same platform, the matching parsing rules for each web page can therefore be retrieved directly from the pre-configured parsing rules, without restarting the online program to configure them, which improves work efficiency.
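The stateless, rule-driven parsing described above can be illustrated with a short sketch. Several things here are assumptions rather than the patent's method: the function name `parse_page`, the rule layout (a business field name mapped to an `xpath` entry), and the sample page are hypothetical. The sketch also uses the limited XPath subset of Python's standard `xml.etree.ElementTree` on a well-formed snippet, whereas real-world HTML would normally need a dedicated HTML parser.

```python
import xml.etree.ElementTree as ET


def parse_page(page_source: str, rules: dict) -> dict:
    """Apply each rule's XPath to the page.

    The function keeps no state between calls: the result depends only on
    the page and the rules passed in, not on who calls it.
    """
    root = ET.fromstring(page_source)
    result = {}
    for field, rule in rules.items():
        nodes = root.findall(rule["xpath"])
        result[field] = [(node.text or "").strip() for node in nodes]
    return result


# Hypothetical, well-formed sample page.
PAGE = """<html><body>
  <div class="video"><span class="title">Intro</span><span class="views">120</span></div>
  <div class="video"><span class="title">Part 2</span><span class="views">45</span></div>
</body></html>"""

# Hypothetical parsing rules, as they might be read from a pre-configured template.
RULES = {
    "Title": {"xpath": ".//div[@class='video']/span[@class='title']"},
    "ViewCount": {"xpath": ".//div[@class='video']/span[@class='views']"},
}

result = parse_page(PAGE, RULES)
print(result)
```

Because the rules are plain data handed to the parser on each request, a page with a different layout only needs a different rule set to be looked up; the parsing code itself is not redeployed or restarted.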
[0067] In addition, it should be noted that, whether it is the web page ana...
Example 1-3
[0080] Obviously, the output results in Example 1-3 are the final desired parsing results.
[0081] Corresponding to the above method embodiments, the present invention also provides a web page parsing device.
[0082] As shown in Figure 4, a web page parsing device provided in an embodiment of the present invention includes:
[0083] The preprocessing unit 100 is configured to pre-configure each template, wherein the template content of the template includes parsing rules, and different templates have different parsing rules;
[0084] The obtaining unit 200 is configured to obtain a web page parsing request, wherein the web page parsing request carries the URL of the web page to be parsed and the business scenario in which the web page is to be parsed;
[0085] A search unit 300, configured to search for a template that matches both the business scenario and the URL from pre-configured templates;
[0086] The first parsing unit 400 is configure...