Method and device for data filtering

A data filtering technology based on Bloom filters, applied in the field of data processing. It addresses the rising probability of misjudgment (false positives) and prevents the misjudgment rate from increasing.

Active Publication Date: 2020-09-11
BEIJING GRIDSUM TECH CO LTD
Cites: 5; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] In view of this, the present invention proposes a method and device for data filtering, whose main purpose is to solve the problem that the probability of misjudgment increases when a single Bloom filter used to filter data stores too much data.

Embodiment Construction

[0021] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for a more thorough understanding of the present disclosure and to fully convey its scope to those skilled in the art.

[0022] When a single Bloom filter is used, its storage reaches a bottleneck as the amount of filtered data grows: once the Bloom filter stores too much data, the probability of misjudgment increases. Although a single Bloom filter can be enlarged by allocating more storage space, the capacity of an existing filter cannot be expanded directly after it has been in use for a period ...
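The bottleneck described in [0022] follows from the standard approximation for a Bloom filter's misjudgment (false-positive) rate, p ≈ (1 − e^(−kn/m))^k, where m is the number of bits, k the number of hash functions and n the number of stored items. The minimal sketch below evaluates this formula as n grows while m stays fixed; the 1,000,000-bit size and 7 hash functions are illustrative assumptions, not values from the patent.

```python
import math


def false_positive_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation of a Bloom filter's false-positive rate."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Illustrative (assumed) parameters: a 1,000,000-bit filter with 7 hash functions.
M_BITS = 1_000_000
K_HASHES = 7

if __name__ == "__main__":
    # The filter size is fixed; only the number of stored items grows.
    for n in (50_000, 100_000, 200_000, 400_000):
        p = false_positive_rate(M_BITS, K_HASHES, n)
        print(f"n={n:>7}: p={p:.4%}")
```

With these assumed parameters the estimated rate climbs from roughly 0.02% at 50,000 items to roughly 64% at 400,000 items, which is why a filter sized for one data volume degrades once far more data has been stored in it.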

Abstract

The invention discloses a data filtering method and device in the field of data processing, which solve the problem that a single Bloom filter becomes prone to misjudgment when it stores too much data. The method comprises the following steps: determining the value domain of a hash function; calculating the number of Bloom filters to be distributed based on that value domain; mapping the Bloom filters to be distributed uniformly onto the value domain of the hash function; and assigning each piece of data to be filtered to a Bloom filter according to the position of that data within the value domain of the hash function. The data filtering method and device are mainly used for duplicate detection over massive volumes of data.
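The four steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the routing hash (the first 4 bytes of SHA-256, giving the value domain [0, 2^32)), the per-filter bit and hash counts, and all class and function names are hypothetical. The number of filters is passed in as a parameter because this excerpt does not give the rule for deriving it from the value domain. Each filter owns one contiguous, equal-width slice of the domain, so a given item is always checked against the same filter.

```python
import hashlib

DOMAIN_BITS = 32  # assumed: the routing hash yields values in [0, 2**32)
DOMAIN_SIZE = 1 << DOMAIN_BITS


def routing_hash(item: str) -> int:
    """Hypothetical routing hash: first 4 bytes of SHA-256, a value in [0, DOMAIN_SIZE)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class BloomFilter:
    """Minimal Bloom filter using double hashing (h1 + i*h2) over a bytearray."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # A hash independent of the routing hash, split into two 64-bit halves.
        d = hashlib.blake2b(item.encode("utf-8"), digest_size=16).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:], "big") | 1  # force h2 odd so it is never zero
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


class PartitionedFilter:
    """Hypothetical sketch: N Bloom filters, each owning an equal slice of the hash value domain."""

    def __init__(self, num_filters: int, m_bits: int, k_hashes: int) -> None:
        self.filters = [BloomFilter(m_bits, k_hashes) for _ in range(num_filters)]

    def filter_index(self, item: str) -> int:
        # Uniform mapping: filter i owns [i*D/N, (i+1)*D/N) of the domain D.
        return routing_hash(item) * len(self.filters) // DOMAIN_SIZE

    def seen_before(self, item: str) -> bool:
        """Return True if item is (probably) a duplicate; otherwise record it and return False."""
        bf = self.filters[self.filter_index(item)]
        if item in bf:
            return True
        bf.add(item)
        return False


if __name__ == "__main__":
    pf = PartitionedFilter(num_filters=4, m_bits=1 << 16, k_hashes=5)
    print(pf.seen_before("https://example.com/page"))  # False: first occurrence
    print(pf.seen_before("https://example.com/page"))  # True: duplicate
```

Because each filter receives only the items whose hash falls in its slice, each one stores roughly 1/N of the data, which keeps its individual misjudgment rate lower than that of a single filter of the same size holding everything.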

Description

Technical field

[0001] The invention relates to the field of data processing, and in particular to a data filtering method and device.

Background

[0002] A web crawler is a program that automatically downloads website data. It obtains the information developers need by downloading all links of a specified website. On some websites, links to the same web page may appear in multiple places. If the web crawler crawls the same link repeatedly, it not only wastes processing resources but also pollutes the results by storing duplicate data. Web crawlers therefore need to record the links they have already crawled, and each time they store a link they must check whether it already appears in the stored data. With this approach, however, the storage space needed to avoid storing duplicate data grows to astronomical size and is often impossible to provide. The Bloom filter emerged to address this problem. The Bloom filter can map data in...
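The background describes the conventional remedy: instead of storing every crawled link, the crawler records each link in a Bloom filter and tests new links against it. The minimal sketch below illustrates that idea under explicit assumptions: the filter size, the salted SHA-256 hashing, the names `SimpleBloomFilter` and `crawl`, and the in-memory link graph standing in for real downloads are all hypothetical, and the Bloom filter description in [0002] is truncated in this excerpt.

```python
import hashlib
from collections import deque


class SimpleBloomFilter:
    """Minimal Bloom filter: k salted SHA-256 hashes set k bits in an m-bit array."""

    def __init__(self, m_bits: int = 8192, k_hashes: int = 4) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, url: str):
        # One bit position per salt; the salt makes the k hashes differ.
        for salt in range(self.k):
            d = hashlib.sha256(f"{salt}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(d[:8], "big") % self.m

    def add(self, url: str) -> None:
        for p in self._positions(url):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, url: str) -> bool:
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(url))


def crawl(start: str, links: dict[str, list[str]]) -> list[str]:
    """Breadth-first crawl over a hypothetical in-memory link graph, skipping seen links."""
    seen = SimpleBloomFilter()
    order: list[str] = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if seen.might_contain(url):
            continue  # already crawled (or, rarely, a false positive)
        seen.add(url)
        order.append(url)  # a real crawler would download the page here
        queue.extend(links.get(url, []))
    return order


if __name__ == "__main__":
    graph = {"/": ["/a", "/b"], "/a": ["/", "/b"], "/b": ["/a"]}
    print(crawl("/", graph))
```

The filter uses a fixed 1 KiB bit array however many links it records; the price is that a false positive makes the crawler skip a page it has never visited, which is the misjudgment the invention aims to keep rare.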

Application Information

Patent type and authority: Patent (China)
IPC (8): G06F16/9535; G06F16/9536
CPC: G06F16/9535
Inventor: 李可欣
Owner: BEIJING GRIDSUM TECH CO LTD