Implementation method for directional crawler based on assigned e-commerce website

A technology relating to e-commerce websites and an implementation method, applied in the field of directional crawler implementation, which can solve problems such as the user not perceiving the switching between threads.

Status: Inactive · Publication Date: 2014-09-17
HUAIYIN INSTITUTE OF TECHNOLOGY
5 Cites · 21 Cited by

AI Technical Summary

Problems solved by technology

If the system has only one CPU, true simultaneity is impossible. However, because the CPU switches between threads very quickly, the user cannot perceive the difference, so the threads appear to execute at the same time.
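The single-CPU argument above is about time-slicing, and CPython offers a close analogue: its global interpreter lock lets only one thread execute Python bytecode at a time, yet a thread blocked on I/O releases the lock so another can run. The following is a minimal illustration, not taken from the patent, in which `time.sleep` is a hypothetical stand-in for waiting on a network response. Four such threads finish in roughly the time of one wait rather than four, which is the effect a crawler's thread pool relies on.

```python
import threading
import time


def fake_download(results, i):
    # time.sleep stands in for waiting on a network response; like real
    # blocking I/O it releases the GIL so another thread can run meanwhile.
    time.sleep(0.2)
    results[i] = f"page-{i}"


def run_threads(n=4):
    """Start n threads that each 'download' for 0.2 s; return (elapsed, results)."""
    results = [None] * n
    threads = [threading.Thread(target=fake_download, args=(results, i))
               for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


elapsed, pages = run_threads()
print(pages)               # ['page-0', 'page-1', 'page-2', 'page-3']
print(f"{elapsed:.2f} s")  # close to 0.2 s rather than 4 x 0.2 = 0.8 s
```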




Embodiment Construction

[0046] The technical scheme of the present invention is described in detail below with reference to the accompanying drawings:

[0047] As shown in Figure 1, the embodiment of the present invention proceeds according to the following steps:

[0048] Step 1. Initialize the data set $X_1 = \{A_1, A_2, \ldots, A_n\}$ of entry URLs for the product lists on the specified e-commerce website;

[0049] Step 2. Add the initialized data set $X_1$ to the task queue;

[0050] Step 3. Open the thread pool $P$;

[0051] Step 4. Check whether the task queue contains a task. If it does, reset the timer and go to Step 5; otherwise, let the timer run and go to Step 18;

[0052] Step 5. Check whether the thread pool has an idle child thread available. If not, return to Step 4. If so, thread pool $P$ takes a task from the queue, hands it to child thread $P_n$, and starts that child thread; go to Step 6;

[0053] Step 6. Use the judgment template to $A_i$. The URL in the domain is j...
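Steps 1 to 5 describe a dispatch loop built from a task queue, a thread pool, and an idle timer. What follows is a minimal Python sketch under stated assumptions, not the patented implementation. `fetch` is a hypothetical callable that downloads and parses one page and returns newly found URLs. The timer is modeled with `time.monotonic`. Step 18 is not shown in this excerpt, so stopping after `idle_timeout` seconds with nothing queued or running is an assumption. Unlike Step 4 as written, the sketch also keeps the timer reset while tasks are still running, so one slow page cannot end the crawl early.

```python
import queue
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def crawl(seed_urls, fetch, max_workers=4, idle_timeout=0.5):
    """Sketch of Steps 1-5: seed a task queue, then drain it with a thread pool.

    fetch(url) is a hypothetical callable returning newly discovered URLs.
    Stopping after idle_timeout is an assumed stand-in for Step 18.
    """
    tasks = queue.Queue()
    seen = set()
    lock = threading.Lock()
    crawled = []

    def enqueue(urls):
        # Caller must hold `lock`; URLs already queued once are skipped.
        for u in urls:
            if u not in seen:
                seen.add(u)
                tasks.put(u)

    def worker(url):
        try:
            new_urls = fetch(url)
        except Exception:
            new_urls = []  # tolerate a failed page instead of killing the crawl
        with lock:
            crawled.append(url)
            enqueue(new_urls)

    with lock:
        enqueue(seed_urls)  # Steps 1-2: data set X_1 -> task queue

    pending = set()
    idle_since = None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:  # Step 3
        while True:
            pending = {f for f in pending if not f.done()}
            if len(pending) >= max_workers:
                # Step 5, no idle child thread: wait for one, then back to Step 4.
                wait(pending, return_when=FIRST_COMPLETED)
                continue
            try:
                url = tasks.get(timeout=0.05)  # Step 4: is there a task?
            except queue.Empty:
                if pending:
                    idle_since = None  # running tasks may still enqueue URLs
                    continue
                if idle_since is None:
                    idle_since = time.monotonic()  # start the timer
                if time.monotonic() - idle_since >= idle_timeout:
                    break  # assumed Step 18: stop after the idle timeout
                continue
            idle_since = None                      # reset the timer
            pending.add(pool.submit(worker, url))  # Step 5: hand to a thread
    return crawled


site = {"a": ["b", "c"], "b": ["c", "d"], "c": [], "d": ["a"]}
print(sorted(crawl(["a"], lambda u: site.get(u, []), idle_timeout=0.2)))
# ['a', 'b', 'c', 'd']
```

Deduplicating with a shared `seen` set keeps cyclic links (such as `d` pointing back to `a`) from being crawled twice.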


Abstract

The invention discloses an implementation method for a directional crawler based on an assigned e-commerce website. It belongs to the field of WEB data collection and aims to improve the crawler's parsing efficiency and crawling accuracy, reduce crawler failures caused by changes in website content, and improve the readability and robustness of the code. Building on a general-purpose crawler, the method manages the order of tasks with a queue and parses website content in multiple threads under a thread-pool management mechanism, which improves crawling efficiency. Python is used as the implementation language, and information on the assigned web pages is captured by combining CSS (Cascading Style Sheets) selectors with regular expressions, which greatly improves the parsing efficiency, readability, and error tolerance of the crawler. The result is a focused crawler dedicated to parsing the store commodity information of the assigned e-commerce website. The method improves the crawler's efficiency and crawling accuracy as well as its adaptability and robustness, and provides a stable and convenient data source for e-commerce price analysis.
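The combination of a CSS selector with a regular expression can be sketched as follows. The selector narrows the page to the elements that should hold a value, and the regular expression then pulls the number out of whatever text surrounds it. To stay dependency-free, this sketch uses a hypothetical `ClassSelector` helper built on the standard-library `HTMLParser` that supports only the single `tag.class` selector form. A real crawler would more likely use a full selector engine such as BeautifulSoup's `select()`. The `span.price` selector and the sample markup are illustrative assumptions, not taken from any real site.

```python
import re
from html.parser import HTMLParser


class ClassSelector(HTMLParser):
    """Collect the text of elements matching one `tag.class` CSS selector.

    Hypothetical helper: only the `tag.class` form is supported, not full CSS.
    """

    def __init__(self, selector):
        super().__init__()
        self.want_tag, self.want_class = selector.split(".", 1)
        self.open_count = 0  # > 0 while inside a matching element
        self.matches = []    # one text string per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.open_count:
            if tag == self.want_tag:
                self.open_count += 1  # nested element with the same tag name
        elif tag == self.want_tag and self.want_class in classes:
            self.open_count = 1
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.open_count and tag == self.want_tag:
            self.open_count -= 1

    def handle_data(self, data):
        if self.open_count:
            self.matches[-1] += data


PRICE_RE = re.compile(r"\d+(?:\.\d{1,2})?")  # e.g. 1999 or 1999.00


def extract_prices(html, selector="span.price"):
    parser = ClassSelector(selector)
    parser.feed(html)
    parser.close()
    prices = []
    for text in parser.matches:
        m = PRICE_RE.search(text.replace(",", ""))
        if m:  # skip unparseable text instead of raising (error tolerance)
            prices.append(float(m.group()))
    return prices


sample = """<ul>
<li><span class="name">Phone A</span><span class="price">¥1,999.00</span></li>
<li><span class="name">Phone B</span><span class="price sale">¥ 2499</span></li>
</ul>"""
print(extract_prices(sample))  # [1999.0, 2499.0]
```

Keeping the selector and the pattern separate means a layout change usually breaks only the selector string, while a change in price formatting touches only the regular expression.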

Description

Technical Field

[0001] The invention belongs to the field of WEB data collection, and in particular relates to a method for implementing a directional crawler based on a specified e-commerce website, which can be applied to collecting information on specific electronic products from the specified e-commerce website.

Background Art

[0002] A web crawler based on a designated e-commerce website differs from a crawler in the broad sense: it is a focused crawler that parses and collects commodity information for the designated e-commerce website. With the rapid growth of network information, web crawler technology faces many problems, such as the expanding index scale, faster update speeds, and increasingly personalized needs. To solve these problems, web crawlers for specific topics and personalized search topics have emerged. In a broad sense, crawlers mostly refer to general-purpose crawlers. General-purpose crawlers do not pay attention to the order of page col...
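One concrete difference between a focused crawler and a general-purpose one is that a focused crawler discards links leading off the designated site. This minimal sketch, not the patent's own template, resolves relative links with the standard-library `urljoin` and keeps only http(s) links whose host is in an allow-list. The host `shop.example.com` and the sample links are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def focused_links(base_url, hrefs, allowed_hosts):
    """Resolve each href against base_url and keep only http(s) links whose
    host is in allowed_hosts, dropping duplicates while preserving order."""
    kept, seen = [], set()
    for href in hrefs:
        url = urljoin(base_url, href)
        parts = urlparse(url)
        if parts.scheme in ("http", "https") and parts.hostname in allowed_hosts:
            if url not in seen:
                seen.add(url)
                kept.append(url)
    return kept


links = focused_links(
    "https://shop.example.com/list?page=1",
    ["/item/42", "item/43", "https://ads.example.net/x",
     "mailto:sales@example.com", "/item/42"],
    allowed_hosts={"shop.example.com"},
)
print(links)
# ['https://shop.example.com/item/42', 'https://shop.example.com/item/43']
```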


Application Information

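Patent Type & Authority: Applications (China)
IPC(8): G06F9/48
Inventors: 朱全银 (Zhu Quanyin), 周泓 (Zhou Hong), 李翔 (Li Xiang), 潘禄 (Pan Lu), 刘文儒 (Liu Wenru), 戎圣吉 (Rong Shengji), 张宇洋 (Zhang Yuyang), 曹苏群 (Cao Suqun), 王留洋 (Wang Liuyang), 周蕾 (Zhou Lei)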
Owner HUAIYIN INSTITUTE OF TECHNOLOGY