Method for web crawler URL (uniform resource locator) deduplication based on DSBF (dynamically splittable Bloom Filter)
A web crawler URL deduplication technique based on a dynamically splittable Bloom Filter, applied in the Internet field. It addresses URL deduplication and the difficulty of adapting to changing scale, and achieves efficient collection and processing with good time and space efficiency.
Embodiment Construction
[0024] The present invention is further illustrated below in conjunction with specific embodiments. It should be understood that these embodiments are only used to illustrate the present invention and are not intended to limit its scope. After reading the present invention, modifications of its various equivalent forms by those skilled in the art all fall within the scope defined by the appended claims of the present application.
[0025] A web crawler URL deduplication method based on a dynamically splittable Bloom Filter, comprising:
[0026] (1) First, a dynamically splittable Bloom Filter must be constructed, and the binary array of each leaf Bloom Filter is stored in the Redis database. Redis is an in-memory database with excellent read and write performance, but its performance drops sharply when the stored content approaches or exceeds the available memory. Therefore, according to the scale and characteristics of the w...
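The idea of keeping each leaf Bloom Filter's binary array in a key-value bit store can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented method: the splitting rule is not visible in the text above, so the sketch swaps in a simpler policy that opens a new leaf once the active leaf has absorbed a fixed number of URLs. The class names, the `dsbf:leaf:N` key pattern, the parameter values, and the SHA-256 double-hashing scheme are all hypothetical. `InMemoryBitStore` is a stand-in with `setbit`/`getbit` methods; in a Redis deployment the same role could be played by the Redis SETBIT and GETBIT commands.

```python
import hashlib


class InMemoryBitStore:
    """Hypothetical stand-in for a Redis bit string store (SETBIT/GETBIT)."""

    def __init__(self):
        self._bits = {}  # key -> bytearray

    def setbit(self, key, offset, value):
        buf = self._bits.setdefault(key, bytearray())
        byte, bit = divmod(offset, 8)
        if byte >= len(buf):
            buf.extend(b"\x00" * (byte + 1 - len(buf)))  # grow on demand
        if value:
            buf[byte] |= 1 << bit
        else:
            buf[byte] &= ~(1 << bit) & 0xFF

    def getbit(self, key, offset):
        buf = self._bits.get(key, b"")
        byte, bit = divmod(offset, 8)
        return (buf[byte] >> bit) & 1 if byte < len(buf) else 0


class LeafBloomFilter:
    """One leaf filter whose bit array lives under a single store key."""

    def __init__(self, store, key, num_bits, num_hashes):
        self.store, self.key = store, key
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.count = 0  # number of URLs added to this leaf

    def _offsets(self, url):
        # Assumed hashing scheme: derive k offsets from one SHA-256 digest.
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, url):
        for off in self._offsets(url):
            self.store.setbit(self.key, off, 1)
        self.count += 1

    def __contains__(self, url):
        return all(self.store.getbit(self.key, off) for off in self._offsets(url))


class GrowingBloomFilter:
    """Assumed policy: open a new leaf when the active leaf reaches capacity."""

    def __init__(self, store, leaf_bits=1 << 16, num_hashes=5, leaf_capacity=5000):
        self.store = store
        self.leaf_bits, self.num_hashes = leaf_bits, num_hashes
        self.leaf_capacity = leaf_capacity
        self.leaves = []
        self._add_leaf()

    def _add_leaf(self):
        key = f"dsbf:leaf:{len(self.leaves)}"  # hypothetical key naming
        leaf = LeafBloomFilter(self.store, key, self.leaf_bits, self.num_hashes)
        self.leaves.append(leaf)
        return leaf

    def seen_before(self, url):
        """Return True if url was probably seen; otherwise record it, return False."""
        if any(url in leaf for leaf in self.leaves):
            return True
        active = self.leaves[-1]
        if active.count >= self.leaf_capacity:
            active = self._add_leaf()
        active.add(url)
        return False
```

A crawler would call `seen_before(url)` before fetching and skip the URL when it returns True. As with any Bloom Filter, a True answer may occasionally be a false positive, while a False answer is always correct. Capping the size of each leaf keeps every stored bit array bounded, which matters given the memory constraint on Redis noted above.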