Mass-scale mail address matching method

A mass-scale email address matching method, applied in the field of Wu-Manber multi-pattern matching. It addresses three problems: memory consumption that still needs to be optimized, serious hash collisions, and time-consuming full-text matching. The result is improved time and space performance.

Inactive Publication Date: 2018-11-06
HARBIN ENG UNIV
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] 1) Memory consumption needs to be optimized with the characteristics of email addresses in mind;
[0005] 2) In massive-scale matching scenarios, hash collisions are serious;
[0006] 3) In massive-scale matching scenarios, full-text matching is very time-consuming.
Traditionally, pattern strings that produce hash conflicts are stored in a linked list. During exact full-text matching, the linked list must be traversed node by node to check whether a match succeeds. Although the PREFIX table reduces the number of exact matches and improves matching efficiency, the search remains sequential, so for a large pattern string set the performance gain from the PREFIX table is not significant.
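The traditional chained bucket described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the block length `B = 2`, the helper names `build_buckets` and `match_at`, and the sample addresses are all assumptions. It counts how many chain nodes are visited and how many full comparisons survive the PREFIX check.

```python
from collections import defaultdict

B = 2  # character block length (assumed value for this sketch)

def build_buckets(patterns):
    """Group patterns by the last B characters of their first m characters."""
    m = min(len(p) for p in patterns)      # shortest pattern length
    buckets = defaultdict(list)            # suffix block -> chain of (prefix, pattern)
    for p in patterns:
        buckets[p[m - B:m]].append((p[:B], p))
    return m, buckets

def match_at(text, end, m, buckets):
    """Check the window whose last character is text[end].

    Returns (matches, chain nodes visited, full comparisons performed).
    """
    start = end - m + 1
    block = text[end - B + 1:end + 1]      # hashed suffix block of the window
    prefix = text[start:start + B]         # PREFIX filter value of the window
    visited = full = 0
    found = []
    for pat_prefix, pat in buckets.get(block, []):
        visited += 1                       # sequential walk over the whole chain
        if pat_prefix != prefix:
            continue                       # PREFIX table rules this node out
        full += 1
        if text.startswith(pat, start):    # exact full-text comparison
            found.append(pat)
    return found, visited, full

patterns = ["ab@x.com", "cd@x.com", "ef@x.com", "ab@y.org"]
m, buckets = build_buckets(patterns)
found, visited, full = match_at("mail ef@x.com now", 12, m, buckets)
print(found, visited, full)  # ['ef@x.com'] 3 1
```

The PREFIX check cuts the full comparisons to one, but every node in the chain is still visited, which is the sequential cost the text refers to.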

Examples

Embodiment Construction

[0028] The following examples describe the present invention in more detail.

[0029] Test data preparation:

[0030] 2,622 email addresses were crawled from the Internet, yielding 81 domain names. Email service providers require the user name of an email address to be 3-18 characters long, and the length of the address including the domain name falls within the range of 10-36 characters. Using those 81 domain names, email addresses 10 to 36 characters long were randomly generated. The randomly generated pattern sets come in four sizes: 1 million, 2 million, 5 million, and 10 million addresses. Because email addresses in practical applications are not of fixed length, the 2,622 crawled addresses were counted by length, and the pattern string sets were generated according to those length ratios. Through these test sets, the performance differences between the present invention and th...
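The data preparation step could be sketched roughly as below. This is only an illustration under stated assumptions: the three sample addresses and domains stand in for the 2,622 crawled addresses and 81 domain names, the user-name alphabet (lowercase letters and digits) is a guess, and the function names are hypothetical.

```python
import random
import string
from collections import Counter

ALPHABET = string.ascii_lowercase + string.digits  # assumed user-name alphabet

def length_distribution(samples):
    """Count how often each total address length occurs in the crawled samples."""
    counts = Counter(len(s) for s in samples)
    lengths = sorted(counts)
    return lengths, [counts[n] for n in lengths]

def random_address(total_len, domains, rng):
    """Build one address of exactly total_len characters, or None if impossible."""
    # Keep only domains that leave a 3-18 character user name.
    usable = [d for d in domains if 3 <= total_len - len(d) - 1 <= 18]
    if not usable:
        return None
    domain = rng.choice(usable)
    user_len = total_len - len(domain) - 1   # minus 1 for the "@"
    user = "".join(rng.choice(ALPHABET) for _ in range(user_len))
    return f"{user}@{domain}"

def generate_patterns(samples, domains, count, seed=0):
    """Generate `count` distinct addresses following the sample length ratios."""
    rng = random.Random(seed)
    lengths, weights = length_distribution(samples)
    out = set()
    while len(out) < count:   # assumes at least one sampled length is achievable
        n = rng.choices(lengths, weights=weights, k=1)[0]
        addr = random_address(n, domains, rng)
        if addr is not None:
            out.add(addr)
    return sorted(out)

# Placeholder data, not from the patent.
samples = ["alice@example.com", "bob@example.org", "carol.smith@mail.example.net"]
domains = ["example.com", "example.org", "mail.example.net"]
patterns = generate_patterns(samples, domains, 50)
```

Scaling `count` to one million or more reproduces the set sizes named in the text, at the cost of holding the whole set in memory.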

Abstract

The invention discloses a mass-scale mail address matching method. A hash model that performs well in both hashing time and hash conflict handling at massive scale is selected as the hash function of the matching method. Because the domain names of mail addresses repeat with high probability, a Bloom filter is used when storing and matching the mail addresses, which reduces the memory consumption of the method and improves its matching efficiency. A red-black tree is used to store the pattern strings that produce hash conflicts, which improves the full-text matching performance of the algorithm. The method thus optimizes the storage structure and matching flow of the WM matching method: an efficient hash model (BKDRHash) reduces hash conflicts; the Bloom filter stores and matches the domain names of mail addresses, avoiding repeated storage of domain names and reducing memory consumption; and the red-black tree handles the elements that produce hash conflicts, reducing the time spent on full-text matching.
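The three components named in the abstract can be sketched together as follows. This is a hedged illustration, not the patented design. BKDRHash is shown in its commonly used form (multiplier 131, 31-bit mask). The Bloom filter sizes and seeds, the class names, and the bucket count are assumptions. Python has no built-in red-black tree, so a sorted list searched with `bisect` stands in for it; lookups are logarithmic as in a balanced tree, but insertion is linear. The sketch also keeps full addresses in the buckets for exactness, so it shows the Bloom filter's pre-filter role and not the memory saving.

```python
import bisect

def bkdr_hash(s, seed=131):
    """BKDRHash in its common form: h = h * seed + ord(c), kept to 31 bits."""
    h = 0
    for ch in s:
        h = (h * seed + ord(ch)) & 0x7FFFFFFF
    return h

class BloomFilter:
    """Small Bloom filter; sizes and seeds are illustrative assumptions."""
    def __init__(self, num_bits=1 << 16, seeds=(31, 131, 1313)):
        self.num_bits = num_bits
        self.seeds = seeds
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        return [bkdr_hash(item, s) % self.num_bits for s in self.seeds]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))

class AddressIndex:
    """Hypothetical index: Bloom filter on domains, ordered buckets for conflicts."""
    def __init__(self, num_buckets=1024):
        self.domains = BloomFilter()
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, user):
        return self.buckets[bkdr_hash(user) % len(self.buckets)]

    def add(self, address):
        user, _, domain = address.partition("@")
        self.domains.add(domain)
        bucket = self._bucket(user)
        i = bisect.bisect_left(bucket, address)   # keep the bucket sorted
        if i == len(bucket) or bucket[i] != address:
            bucket.insert(i, address)

    def __contains__(self, address):
        user, _, domain = address.partition("@")
        if domain not in self.domains:            # cheap rejection of unknown domains
            return False
        bucket = self._bucket(user)
        i = bisect.bisect_left(bucket, address)   # binary search, not a chain walk
        return i < len(bucket) and bucket[i] == address

index = AddressIndex()
index.add("alice@example.com")
index.add("bob@example.org")
print("alice@example.com" in index)  # True
print("alice@example.org" in index)  # False
```

A Bloom filter can report a domain as present when it is not, so the exact bucket lookup remains the final check.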

Description

Technical field

[0001] The invention relates to a Wu-Manber multi-pattern matching method.

Background technique

[0002] The Wu-Manber algorithm, referred to as the WM algorithm, is based on the idea of hash filtering. It incorporates the "bad character" idea of the BM algorithm and is a typical suffix-scanning algorithm, but in practical applications it works on character blocks rather than single characters. As a result, after a failed match the matching pointer jumps a larger distance, which improves the matching performance of the algorithm. During matching, the WM algorithm uses a hash table to select a subset of the pattern string set for full-text matching against the current text, reducing unnecessary matching operations. The optimal time complexity of the WM algorithm can reach O(Bn/m) (B is the length of the character block, m is the shortest pattern length). The execution time of the WM algorithm will not increase proportionally with the increase of the patt...
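The block-based jumping and hash-table filtering described in this background can be sketched with a plain textbook Wu-Manber search. It is a minimal illustration and not the optimized method of the invention: the block length `B = 2`, the function names, and the sample data are assumptions, and a Python dict keyed by the block itself replaces an explicit hash function.

```python
from collections import defaultdict

def build_wm(patterns, B=2):
    """Build the SHIFT and HASH tables over the first m characters of each pattern."""
    m = min(len(p) for p in patterns)      # shortest pattern length
    assert m >= B, "every pattern must be at least B characters long"
    default_shift = m - B + 1              # jump when the block occurs in no pattern
    shift = {}
    hash_table = defaultdict(list)
    for p in patterns:
        for j in range(B, m + 1):          # block ends at 1-based position j
            block = p[j - B:j]
            shift[block] = min(shift.get(block, default_shift), m - j)
        hash_table[p[m - B:m]].append(p)   # suffix block selects candidate patterns
    return m, default_shift, shift, hash_table

def wm_search(text, patterns, B=2):
    """Return (start index, pattern) for every occurrence of any pattern in text."""
    m, default_shift, shift, hash_table = build_wm(patterns, B)
    hits = []
    i = m - 1                              # index of the last character of the window
    while i < len(text):
        block = text[i - B + 1:i + 1]
        s = shift.get(block, default_shift)
        if s > 0:
            i += s                         # safe jump, no pattern can end here
            continue
        start = i - m + 1
        for p in hash_table.get(block, ()):    # only the selected subset is compared
            if text.startswith(p, start):
                hits.append((start, p))
        i += 1
    return hits

patterns = ["ann@a.io", "bob@b.com", "eve@a.io"]
text = "to: bob@b.com, cc: eve@a.io; ann@a.io"
print(wm_search(text, patterns))
```

With blocks that appear in no pattern, the window advances by m - B + 1 characters at a time, which is where the larger jump distance comes from.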

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06Q10/10, G06F17/30
CPC: G06Q10/107
Inventors: 玄世昌, 苘大鹏, 王巍, 杨武, 赵恒
Owner: HARBIN ENG UNIV