Balanced clustering compression method based on data similarity
A clustering compression technique based on data similarity, applied in fields such as electrical digital data processing, special data processing applications, and instruments. It addresses problems such as insufficient execution efficiency and uneven, data-dependent system load.
Embodiment Construction
[0055] As shown in Figure 1, the implementation steps of the present invention are as follows:
[0056] 1. File feature vector extraction:
[0057] A feature vector is extracted from the file data and used to calculate file similarity. The specific implementation steps are as follows:
[0058] 1) Choose k independent permutation functions $\{h_1, h_2, \ldots, h_k\}$. The functions are mutually independent; here independent linear functions are used, namely $h_i(x) = (a_i x + b_i) \bmod p$, where $a_i$ and $b_i$ are randomly generated integers;
[0059] 2) Scan the input file f byte by byte from front to back, use the efficient Rabin fingerprint function to calculate the fingerprint of the data in the current sliding window, and record the fingerprint as fp. Apply the k independent permutation functions above to fp to obtain k permuted fingerprints $h_1(fp), h_2(fp), \ldots, h_k(fp)$, and record the feature vector F(f) of file f as {$F_1(f)$, $F_2(f)$, ......
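Steps 1) and 2) can be illustrated with a minimal Python sketch. It is not the patented implementation, and several details are assumptions: the modulus `P` (the text does not give p), the window size, the hash base, and the helper names `make_permutations`, `permute`, and `window_fingerprints` are all hypothetical. A simple polynomial rolling hash stands in for the Rabin fingerprint function, which in its original form works with polynomials over GF(2). Because the definition of F(f) is cut off in the text, the sketch stops at the k permuted fingerprints of each window.

```python
import random

# Assumed prime modulus p (a Mersenne prime); the text does not specify p.
P = (1 << 61) - 1


def make_permutations(k, seed=0):
    """Draw k random (a_i, b_i) pairs for h_i(x) = (a_i * x + b_i) mod P."""
    rng = random.Random(seed)
    # a_i is drawn from [1, P-1] so that each h_i is a bijection on [0, P-1].
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(k)]


def permute(fp, perms):
    """Apply every permutation function to one fingerprint fp."""
    return [(a * fp + b) % P for a, b in perms]


def window_fingerprints(data, window=48, base=257):
    """Yield a rolling hash for each window-sized slice of data.

    Stand-in for the Rabin fingerprint: a polynomial hash modulo P that is
    updated in O(1) per byte as the window slides forward.
    """
    if len(data) < window:
        return
    top = pow(base, window - 1, P)  # weight of the byte leaving the window
    fp = 0
    for byte in data[:window]:
        fp = (fp * base + byte) % P
    yield fp
    for i in range(window, len(data)):
        # Drop the oldest byte, shift by one position, append the new byte.
        fp = ((fp - data[i - window] * top) * base + data[i]) % P
        yield fp
```

For a file read into `data` as `bytes`, `permute(fp, make_permutations(k))` evaluated for each `fp` from `window_fingerprints(data)` gives the k permuted fingerprints of that window.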