The invention discloses a distributed spider system. The system comprises three parts: a distributed service based on ZooKeeper, a set of system components, and a database. The system components comprise a system monitoring component (Monitor), a coordination component (Coordinator), a log collection component (Logger), and a basic spider component (Spider). The database comprises a Redis in-memory database, which is a key-value store; a distributed URL task
queue and a distributed BloomFilter are stored in the Redis in-memory database. The invention further discloses a periodic incremental crawling method based on the system. In the method, the Coordinator component periodically imports tasks into the distributed URL task queue and wakes any dormant Spider components; each Spider component then either goes dormant or performs periodic incremental crawling, according to the execution state of the current distributed URL task queue. With the system and the method, stand-alone spiders are effectively combined into distributed spiders that offer high availability, high stability, and high throughput in a cluster environment, and periodic incremental crawling is realized.
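
The periodic incremental crawling cycle described above can be illustrated with a minimal single-process sketch. It is not the disclosed implementation: a Python deque and a small in-process Bloom filter stand in for the distributed URL task queue and the distributed BloomFilter held in Redis, and ZooKeeper, Monitor and Logger are omitted. The names import_tasks, step, run_period and fetch_links are hypothetical, and the use of the Bloom filter to skip already-seen links is an assumption about how "incremental" is achieved.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Small in-process Bloom filter (stand-in for the one stored in Redis)."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


class Spider:
    """Basic spider: consumes the shared queue, goes dormant when it is empty."""

    def __init__(self, queue, seen, fetch_links):
        self.queue = queue              # shared URL task queue
        self.seen = seen                # shared Bloom filter of known URLs
        self.fetch_links = fetch_links  # hypothetical callback: url -> list of links
        self.dormant = True
        self.captured = []              # URLs this spider discovered as new

    def step(self):
        if not self.queue:
            self.dormant = True         # nothing left to do in this period
            return
        url = self.queue.popleft()
        for link in self.fetch_links(url):
            if link not in self.seen:   # incremental: only unseen links are kept
                self.seen.add(link)
                self.captured.append(link)
                self.queue.append(link)


class Coordinator:
    """Periodically re-imports seed tasks and wakes dormant spiders."""

    def __init__(self, queue, seeds):
        self.queue = queue
        self.seeds = list(seeds)

    def import_tasks(self, spiders):
        self.queue.extend(self.seeds)
        for spider in spiders:
            spider.dormant = False


def run_period(coordinator, spiders):
    """Simulate one period: import tasks, then run until every spider is dormant."""
    coordinator.import_tasks(spiders)
    while any(not s.dormant for s in spiders):
        for s in spiders:
            if not s.dormant:
                s.step()
```

In a real deployment the period would be driven by a timer in the Coordinator, and the queue and filter would live in Redis so that Spider processes on different machines share them; here run_period is simply called once per simulated period.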