
A quick rule customization method based on python QT and intelligent algorithm

A rule customization technology based on intelligent algorithms, applied in computing, network data navigation, special-purpose data processing, etc. It addresses the time-consuming, labor-intensive, and inefficient nature of manual rule customization and thereby improves efficiency.

Active Publication Date: 2019-01-25
USTC SINOVATE SOFTWARE
5 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0009] The purpose of the present invention is to provide a fast rule customization method based on pythonQT and intelligent algorithms. The method extracts the navigation list items in a page through selenium and filters out tags that do not meet the criteria; it then extracts the body text of the details page through an intelligent algorithm, which makes it possible to customize web page rules for many different websites. This solves the problem that customizing web page rules through manual analysis does not carry over to a wide variety of complicated websites and is time-consuming, labor-intensive, and inefficient.


Examples


Embodiment Construction

[0043] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0044] As shown in Figure 1, the present invention is a fast rule customization method based on pythonQT and intelligent algorithms, comprising the following steps:

[0045] S00: Enter the URL of the page to be crawled, and the client loads the page through the URL;

[0046] S01: Extract the navigation list items in the page based on selenium;

[0047] S02: Extract the text part of the details page through an intelligent algorithm;

[0048] S03: Obtain page el...
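Step S02 does not say which "intelligent algorithm" extracts the body text. As an illustration only, the sketch below swaps in a simple link-density heuristic built on the standard-library `html.parser`: it gathers the text of each block element and returns the longest block that is not mostly link text. The names `BlockTextScorer` and `extract_main_text`, the set of block tags, and the `max_link_ratio` threshold are all assumptions for this example and are not taken from the patent.

```python
from html.parser import HTMLParser


class BlockTextScorer(HTMLParser):
    """Collect text per block element and count how much of it is link text.

    Hypothetical helper: a stand-in heuristic, not the patent's algorithm.
    """

    BLOCK_TAGS = {"div", "article", "section", "td"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.stack = []    # open blocks: [list of text parts, link char count]
        self.blocks = []   # closed blocks: (joined text, link char count)
        self.in_link = 0
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.in_link += 1
        elif tag in self.BLOCK_TAGS:
            self.stack.append([[], 0])

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "a" and self.in_link:
            self.in_link -= 1
        elif tag in self.BLOCK_TAGS and self.stack:
            parts, link_chars = self.stack.pop()
            self.blocks.append((" ".join(parts), link_chars))

    def handle_data(self, data):
        text = data.strip()
        if not text or self.skip_depth or not self.stack:
            return
        block = self.stack[-1]  # attribute text to the innermost open block
        block[0].append(text)
        if self.in_link:
            block[1] += len(text)


def extract_main_text(html, max_link_ratio=0.5):
    """Return the longest block whose link-text share is at most max_link_ratio."""
    scorer = BlockTextScorer()
    scorer.feed(html)
    scorer.close()
    best = ""
    for text, link_chars in scorer.blocks:
        if not text:
            continue
        if link_chars / len(text) > max_link_ratio:
            continue  # mostly links: likely a navigation block
        if len(text) > len(best):
            best = text
    return best
```

Calling `extract_main_text(page_source)` on the HTML of a details page returns a single string. Navigation bars tend to be rejected because nearly all of their text sits inside links, while article bodies are long and link-sparse. A production extractor would need to handle nested content blocks and malformed markup more carefully.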


Abstract

The invention discloses a quick rule customization method based on python QT and an intelligent algorithm, which relates to the technical field of web page rule customization. The method includes: inputting the URL of a page to be crawled, with the client loading the page through the URL; extracting navigation list items based on selenium; extracting the body of the detail page by an intelligent algorithm; obtaining page element rules from the page by js technology and returning them to the client; and uploading the rules to the server, where the background crawler crawls according to the rules. The invention extracts navigation list items in the page through selenium and filters out <a> tags whose ordinate is greater than the height of the browser, as well as <a> tags that share the same abscissa but whose count is less than a reference value. The body text of the detail page is then extracted by the intelligent algorithm. This avoids the problem that web page rules customized by manual analysis do not carry over to many kinds of complicated web pages, makes the method suitable for customizing rules across different websites, and improves the efficiency of customizing web page rules.
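The two filtering conditions in the abstract can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the `Anchor` tuple, the function name `filter_navigation_anchors`, and the parameter names are hypothetical, and the order of the two filters (visibility first, then group size) is an assumption. In a Selenium session, the coordinates would typically come from each element's `location` and the height from the driver's window size.

```python
from collections import Counter
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """Hypothetical record for one <a> tag and its on-page position."""
    text: str
    x: int  # abscissa of the tag on the page
    y: int  # ordinate of the tag on the page


def filter_navigation_anchors(
    anchors: List[Anchor], browser_height: int, min_group_size: int
) -> List[Anchor]:
    """Keep anchors that look like navigation list items.

    1. Drop anchors whose ordinate exceeds the browser height.
    2. Group the remaining anchors by abscissa and drop groups that have
       fewer members than min_group_size (the "reference value").
    """
    visible = [a for a in anchors if a.y <= browser_height]
    counts = Counter(a.x for a in visible)
    return [a for a in visible if counts[a.x] >= min_group_size]
```

For example, three menu links that share one abscissa survive a `min_group_size` of 3, while an isolated logo link at a different abscissa and a footer link below the viewport are both discarded.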

Description

Technical field

[0001] The invention belongs to the technical field of webpage rule customization, and in particular relates to a fast rule customization method based on pythonQT and intelligent algorithms.

Background technique

[0002] With the rapid development of big data technology, data, as its fundamental research object, is playing an increasingly important role. How to obtain data efficiently and quickly has become one of the important topics of current research. Crawlers are the basic technical means of Internet data acquisition and can acquire data efficiently, so optimizing and improving current crawler technology is imperative. At present, the basic idea for crawlers to obtain web page data is:

[0003] (1) Given the address of the target webpage, the crawler initiates a request for the page, that is, sends a Request, which can contain additional headers and other information.

[0004] (2) Obtain the content of the response after requesting the server. If the serve...


Application Information

IPC(8): G06F8/20, G06F16/9535, G06F16/954
CPC: G06F8/24
Inventor: 邢航, 李森, 汪明
Owner: USTC SINOVATE SOFTWARE