Improved SimHash code similarity detection method
A code similarity detection method, applicable to special data processing, instruments, software maintenance/management, and similar fields, that addresses problems such as poor detection accuracy and improves accuracy.
Embodiment Construction
[0049] The present invention will be further described below in conjunction with the accompanying drawings.
[0050] Referring to Figure 2, an improved SimHash code similarity detection method comprises the following steps:
[0051] 1) Word segmentation
[0052] Given an input text (a sentence, an article, or source code), perform word segmentation and feature extraction to obtain the effective feature vectors, then assign a weight to each feature vector.
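As an illustration of the segmentation step, the following is a minimal Python sketch, not the patented procedure. The function name `extract_features`, the regular-expression tokenizer, and the use of term frequency as the weight are all assumptions; the text above does not specify how tokens are split or how weights are chosen.

```python
import re
from collections import Counter


def extract_features(text):
    """Split text into tokens and weight each distinct token by its count.

    Hypothetical helper: both the tokenizer and the weighting scheme are
    assumptions made for illustration only.
    """
    # Assumption: identifiers, integer literals, and any other single
    # non-space character each count as one token.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", text)
    # Assumption: weight = term frequency (number of occurrences).
    return dict(Counter(tokens))


features = extract_features("x = x + 1")
print(features)
```

Here each distinct token plays the role of a feature and its occurrence count plays the role of the weight; any other weighting scheme could be substituted without changing the later steps.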
[0053] 2) Hashing
[0054] A hash function computes the hash value of each feature vector; the hash value is an n-bit signature composed of the binary digits 0 and 1.
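The hashing step can be sketched as follows. The source does not name a hash function, so the choice of MD5 truncated to its leading n bits is an assumption, as are the helper names `feature_hash` and `to_bit_string` and the default of 64 bits.

```python
import hashlib


def feature_hash(feature, n_bits=64):
    """Map a feature string to an n-bit integer signature.

    Assumption: MD5 is used only as a convenient, deterministic hash;
    the source does not specify which hash function is intended.
    """
    if not 1 <= n_bits <= 128:
        raise ValueError("n_bits must be between 1 and 128 for MD5")
    digest = hashlib.md5(feature.encode("utf-8")).digest()  # 128 bits
    value = int.from_bytes(digest, "big")
    # Keep only the leading n_bits of the 128-bit digest.
    return value >> (128 - n_bits)


def to_bit_string(value, n_bits=64):
    """Render the signature as a zero-padded string of 0s and 1s."""
    return format(value, f"0{n_bits}b")


signature = feature_hash("x")
print(to_bit_string(signature))
```

The same feature always yields the same signature, which is what allows signatures from different documents to be compared later.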
[0055] 3) Weighting
[0056] On the basis of the hash value, weight every feature vector, that is, W = hash * weight: at each bit position of the hash value, a 1 contributes the positive weight and a 0 contributes the negative weight. This yields the weighted result for each feature vector.
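The weighting rule can be shown with a small worked example. This is a sketch under assumptions: the function name `weighted_vector`, the most-significant-bit-first ordering, and the sample values (a 6-bit hash `100101` with weight 4) are chosen for illustration and do not come from the source.

```python
def weighted_vector(hash_value, weight, n_bits):
    """Expand an n-bit hash into a list of signed weights.

    A 1 bit contributes +weight and a 0 bit contributes -weight.
    Assumption: bits are read from most significant to least significant.
    """
    result = []
    for i in range(n_bits):
        bit = (hash_value >> (n_bits - 1 - i)) & 1
        result.append(weight if bit else -weight)
    return result


# Hypothetical feature: hash 100101 (6 bits), weight 4.
print(weighted_vector(0b100101, 4, 6))  # [4, -4, -4, 4, -4, 4]
```

Each feature therefore produces one signed vector of length n, with the magnitude of every entry equal to the feature's weight.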
[0057] 4) Merging
[0058] Accumulate the hash-wei...