124 results about "Locality-sensitive hashing" patented technology

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. (The number of buckets is much smaller than the universe of possible input items.) Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions between similar items are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.
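
As a rough illustration of the bucket idea, the sketch below uses random hyperplanes, one common LSH family for cosine similarity. It is only an assumed example: the function names `make_hyperplanes` and `signature` and the sample vectors are hypothetical and do not come from any patent listed here.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on the non-negative side."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

planes = make_hyperplanes(num_bits=8, dim=3)
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]      # same direction as a
c = [-1.0, -2.0, -3.0]   # opposite direction to a

sig_a, sig_b, sig_c = (signature(v, planes) for v in (a, b, c))
print(sig_a == sig_b)    # vectors pointing the same way share a bucket
print(sig_a == sig_c)    # opposite vectors fall on opposite sides
```

The signature serves as the bucket key. Vectors separated by a small angle agree on most bits, so they are likely, though not guaranteed, to land in the same bucket.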

Content aggregation method based on distributed web crawlers

The invention provides a content aggregation method based on distributed web crawlers. First, different crawler platforms are deployed on different devices and send requests to the network information sources to be crawled; each crawler platform formulates crawling rules according to the target information required by a user and crawls the information in which the target user is interested. The crawled network information is then processed: similarity detection is carried out using a data transmission and conversion method in a real-time database, combined with a locality-sensitive hashing (LSH) method, so as to reduce redundancy in the information. Finally, the system classifies and sorts the information by category, popularity and keywords and displays it on user equipment. The method applies LSH and similarity comparison to data acquired from a real network to obtain a comparison result. Compared with the traditional prior-art approach of checking the whole data set for duplicates, the content aggregation method computes faster and compares similarity more accurately.
Owner:江苏未来网络集团有限公司

LSH (Locality Sensitive Hashing)-based clustering and indexing method and LSH-based clustering and indexing system

Active | CN103631928A | Evenly distributed data | Reduce matching performance instability | Relational databases | Special data processing applications | Data set | Relative stability
The invention relates to an LSH (locality-sensitive hashing)-based clustering and indexing method and system. The method comprises the following steps: step 1, carrying out clustering analysis on a data set, dividing the data set into a plurality of categories, and determining the clustering center of each category; step 2, establishing a hash table in each category using an LSH method; step 3, calculating the Euclidean distance between each clustering center and a query point, and selecting the several categories with the smallest Euclidean distances as candidate categories; step 4, calculating the hash value of the query point in each candidate category and, using the hash tables established in step 2, selecting the data points in the candidate categories whose hash values equal that of the query point as candidate points; step 5, calculating the Euclidean distances between the candidate points and the query point, and taking the candidate point with the smallest Euclidean distance as the nearest neighbor of the query point. The method and system greatly increase query efficiency and keep query performance relatively stable at the cost of only a small loss in accuracy.
Owner:INST OF INFORMATION ENG CHINESE ACAD OF SCI
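
The five steps in the abstract above can be sketched roughly as follows. This is an illustrative simplification, not the patented implementation: the cluster centers are assumed to be given rather than computed, a single projection-based hash function is shared by all categories, and every name (`build_index`, `query`, `make_hash`) is hypothetical.

```python
import math
import random
from collections import defaultdict

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def make_hash(dim, k, w, rng):
    # k projections of the form floor((a.v + b) / w); this family is an assumption.
    params = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
              for _ in range(k)]
    def h(v):
        return tuple(
            math.floor((sum(ai * x for ai, x in zip(a, v)) + b) / w)
            for a, b in params
        )
    return h

def build_index(points, centers, k=2, w=4.0, seed=0):
    rng = random.Random(seed)
    h = make_hash(len(centers[0]), k, w, rng)
    tables = [defaultdict(list) for _ in centers]
    for p in points:
        # Step 1 (simplified): assign each point to its nearest given center.
        c = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        # Step 2: one hash table per category.
        tables[c][h(p)].append(p)
    return h, tables

def query(q, centers, h, tables, num_candidates=1):
    # Step 3: pick the categories whose centers are closest to the query.
    order = sorted(range(len(centers)), key=lambda i: euclidean(q, centers[i]))
    # Step 4: collect points that share the query's hash value in those categories.
    candidates = [p for c in order[:num_candidates] for p in tables[c].get(h(q), [])]
    # Step 5: exact Euclidean distance over the candidates only.
    return min(candidates, key=lambda p: euclidean(p, q)) if candidates else None

centers = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.2), (0.1, 0.9), (9.5, 10.2), (10.4, 9.7)]
h, tables = build_index(points, centers)
nearest = query((9.5, 10.2), centers, h, tables)
```

Because only the buckets of a few candidate categories are scanned, a query can miss the true nearest neighbor or return `None`. That is the accuracy trade-off the abstract mentions.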

Locality-sensitive-hashing-based high-dimensional indexing method for large-scale multimedia data

The invention relates to a locality-sensitive-hashing-based high-dimensional indexing method for large-scale multimedia data. At the offline indexing stage, the method includes the following steps: extracting high-dimensional features of the multimedia data; establishing an in-memory index comprising a feature storage area and a hash table storage area, storing the high-dimensional multimedia features in the feature storage area, calculating the locality-sensitive hashing vectors of the high-dimensional features, and storing the feature numbers and the corresponding locality-sensitive hashing vectors in the hash table storage area; establishing a first-stage disk index comprising a feature storage area, an index storage area and a plurality of hash table storage areas; establishing a second-stage disk index comprising a hash bucket storage area; and repeating the above steps until all multimedia input is indexed. At the online query stage, features of the query multimedia data are extracted, queries are run against the established indexes, and similar results are returned. The method improves the scheduling of memory and disk and increases the indexing speed and retrieval speed of multimedia data.
Owner:PEKING UNIV

Efficient distributed locality sensitive Hashing method

The invention provides a distributed locality-sensitive hashing method. The method comprises the following steps: original data is loaded from a distributed file system, the original data vector set is read, and a first resilient distributed dataset is generated; L composite hash functions are constructed according to the number L of hash tables and the number k of hash functions specified by the user; L hash values are calculated for each piece of data in the dataset, so that each piece of data is mapped into one hash bucket of each hash table; for each piece of data, each pair of hash table identifier and composite hash function value is merged into a string, the string is mapped to a numeric key, the numeric key and the data identifier form a key-value pair, and the key-value pairs are saved as a second resilient distributed dataset; and the second dataset is repartitioned by numeric key so that data with the same numeric key is saved in the same partition, which completes construction of the hash tables. The method reduces the amount of data shuffled during hash table construction, improves index construction efficiency, and reduces message transmission overhead during queries.
Owner:NAT UNIV OF DEFENSE TECH
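
The key construction described in the abstract above can be sketched on a single machine as follows. This sketch is an assumption-laden stand-in: plain Python lists replace the distributed datasets, CRC32 is an arbitrary choice for mapping the merged string to a numeric key, and the names `make_composite_hashes`, `numeric_keys` and `build_partitions` are hypothetical.

```python
import math
import random
import zlib
from collections import defaultdict

W = 4.0  # bucket width of each projection (illustrative value)

def make_composite_hashes(dim, L, k, seed=0):
    """L composite functions, each made of k random projections."""
    rng = random.Random(seed)
    return [
        [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, W))
         for _ in range(k)]
        for _ in range(L)
    ]

def composite_value(v, g):
    return tuple(
        math.floor((sum(ai * x for ai, x in zip(a, v)) + b) / W) for a, b in g
    )

def numeric_keys(v, hashes):
    """Merge (table id, composite value) into a string, then map it to an integer."""
    keys = []
    for table_id, g in enumerate(hashes):
        merged = f"{table_id}:{composite_value(v, g)}"
        keys.append(zlib.crc32(merged.encode("utf-8")))
    return keys

def build_partitions(data, hashes, num_partitions=4):
    """Group data identifiers so that equal numeric keys share a partition."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for data_id, v in data.items():
        for key in numeric_keys(v, hashes):
            partitions[key % num_partitions][key].append(data_id)
    return partitions

data = {"a": (1.0, 2.0), "b": (1.0, 2.0), "c": (50.0, -30.0)}
hashes = make_composite_hashes(dim=2, L=3, k=2)
partitions = build_partitions(data, hashes)
```

Folding the table identifier into the key means one partitioning pass serves all L tables, which is one plausible reading of how the method limits shuffling.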

Multi-feature locality sensitive hashing (LSH) indexing combination-based remote sensing image retrieval method

The invention discloses a multi-feature locality-sensitive hashing (LSH) indexing combination-based remote sensing image retrieval method and belongs to the technical field of remote sensing image retrieval. The method introduces LSH indexing, one of the best indexing technologies for high-dimensional feature spaces, into the field of remote sensing image retrieval, so that the curse of dimensionality and long retrieval times can be effectively addressed at large scale and remote sensing images can be retrieved rapidly. The invention also provides a new index validity measure for LSH indexing, the feature-discriminativeness-based indexing validation index (FDIVI), with which the features best able to distinguish targets from backgrounds are evaluated and selected across all feature spaces, effectively improving the accuracy of the retrieval result. Compared with the prior art, the method retrieves large amounts of remote sensing image data more rapidly and accurately.
Owner:HOHAI UNIV

Method and device for obtaining similar object set and providing similar object set

The invention discloses a method and device for obtaining a similar object set and providing a similar object set. The method comprises: obtaining an input file comprising M objects, N attributes, and the attribute value corresponding to each attribute; inputting each attribute to a pre-created first-level minimum hash (minhash) function and obtaining the first-level minhash return value of each attribute; obtaining the second-level minhash return value of each attribute from the attribute, the weight of the attribute in the current object, and a pre-created second-level minhash function; calculating the combined minhash value of each attribute in each object; taking the minimum of the combined minhash values of the attributes of an object as the minhash value of that object; repeating these operations K times for each object to obtain K minhash values per object; and inputting the K minhash values of each object into a locality-sensitive hashing (LSH) computing framework. The method and device improve operating efficiency and improve the validity and accuracy of the similar object information.
Owner:ALIBABA GRP HLDG LTD
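
For orientation, a plain (unweighted, single-level) minhash can be sketched as below. It deliberately omits the two-level weighted scheme in the abstract above, whose details are not given here. The seeded SHA-1 hash family and the names `minhash_signature`, `lsh_bands` and `estimate_jaccard` are assumptions made for illustration.

```python
import hashlib

def h(seed, item):
    """A seeded 64-bit hash built from SHA-1 (an illustrative hash family)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

def minhash_signature(attributes, K):
    """K minhash values: for each seed, the smallest hash over the attribute set."""
    return [min(h(seed, a) for a in attributes) for seed in range(K)]

def lsh_bands(signature, rows_per_band):
    """Split a signature into bands; objects sharing any band become candidates."""
    return [
        (i // rows_per_band, tuple(signature[i:i + rows_per_band]))
        for i in range(0, len(signature), rows_per_band)
    ]

def estimate_jaccard(sig1, sig2):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

obj1 = {"red", "cotton", "shirt", "large"}
obj2 = {"red", "cotton", "shirt", "small"}
sig1 = minhash_signature(obj1, K=16)
sig2 = minhash_signature(obj2, K=16)
shared_bands = set(lsh_bands(sig1, 4)) & set(lsh_bands(sig2, 4))
```

The K values per object play the role of the K minhash values fed to the LSH framework. Banding is one common way such a framework turns signatures into buckets.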

Method and device for retrieving similarity of picture messages

The invention discloses a method and a device for retrieving similar pictures and belongs to the field of graphics and images. The method comprises the steps of obtaining the features of a to-be-retrieved picture; hashing those features with an LSH (locality-sensitive hashing) algorithm to generate hash values of the to-be-retrieved picture; for each hash value of the to-be-retrieved picture, finding in the corresponding hash table the database picture hash values that match and are similar to it; finding the database pictures according to those database picture hash values; and, according to the Euclidean distances between the features of the database pictures and the features of the to-be-retrieved picture, selecting a preset number of database pictures with the smallest Euclidean distances. The method and the device solve the problem that a user currently cannot pick out the most similar pictures from the many results returned by the LSH algorithm, and, by combining the LSH algorithm with a linear retrieval algorithm, make better use of LSH's advantages in reducing time and space complexity in picture similarity retrieval and in supporting high-dimensional data retrieval.
Owner:BEIJING FEINNO COMM TECH
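
The combination of LSH lookup and linear re-ranking described above might look roughly like the following sketch. The hyperplane hash, the exact-match bucket lookup, and the class and method names (`PictureIndex`, `add`, `search`) are illustrative assumptions rather than details taken from the patent.

```python
import heapq
import math
import random
from collections import defaultdict

def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

class PictureIndex:
    def __init__(self, dim, num_tables=4, num_bits=6, seed=0):
        rng = random.Random(seed)
        # One independent set of random hyperplanes per hash table.
        self.planes = [
            [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.features = {}

    def _key(self, t, feat):
        return tuple(
            sum(p * x for p, x in zip(plane, feat)) >= 0 for plane in self.planes[t]
        )

    def add(self, pic_id, feat):
        self.features[pic_id] = feat
        for t, table in enumerate(self.tables):
            table[self._key(t, feat)].append(pic_id)

    def search(self, feat, top_n):
        # LSH stage: union of the matching buckets across all tables.
        candidates = set()
        for t, table in enumerate(self.tables):
            candidates.update(table.get(self._key(t, feat), []))
        # Linear stage: rank candidates by exact Euclidean distance.
        return heapq.nsmallest(
            top_n, candidates, key=lambda pid: euclidean(self.features[pid], feat)
        )

index = PictureIndex(dim=3)
index.add("a", (1.0, 0.0, 0.0))
index.add("b", (0.9, 0.1, 0.0))
index.add("c", (-1.0, 0.0, 0.0))
results = index.search((1.0, 0.0, 0.0), top_n=2)
```

The linear pass touches only the LSH candidates, so its cost depends on bucket sizes rather than on the size of the whole database.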

K nearest neighbor approximation query method based on multi-layer locality sensitive hashing

The invention belongs to the field of data analysis and relates to a k-nearest-neighbor approximate query method based on multi-layer locality-sensitive hashing. The method comprises the following steps: first, the number of data points mapped to each hash bucket is evaluated, and overloaded and underloaded hash buckets are identified from those counts; each overloaded hash bucket is then further hashed and divided into a plurality of sub-buckets, while underloaded hash buckets are merged; sub-buckets that are still overloaded after re-division are re-hashed recursively, so that after multiple rounds of re-hashing the sizes of the hash buckets are balanced as far as possible. The LSH index structure thereby becomes a multi-layer tree-like structure. By reconstructing the initially constructed LSH hash table, the method improves kNN search efficiency for query points in dense regions and kNN search accuracy for query points in sparse regions. The hash buckets in the multi-layer locality-sensitive hash structure are relatively uniform in size, which is a clear advantage when running kNN search over big data that is skewed across hash buckets.
Owner:NORTHEASTERN UNIV LIAONING
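
The recursive splitting of overloaded buckets can be sketched as below. This is a simplified illustration under several assumptions: each level uses one random projection whose bucket width halves with depth, the merging of underloaded buckets is omitted, and the names `build`, `lookup` and `leaves` are hypothetical.

```python
import math
import random
from collections import defaultdict

def project(a, b, w, p):
    return math.floor((sum(x * y for x, y in zip(a, p)) + b) / w)

def build(points, dim, w, max_size, max_depth, rng):
    """Hash points into buckets; re-hash any overloaded bucket one layer down."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    buckets = defaultdict(list)
    for p in points:
        buckets[project(a, b, w, p)].append(p)
    children = {}
    for key, members in buckets.items():
        if len(members) > max_size and max_depth > 0:
            # Overloaded bucket: split again with a narrower bucket width.
            children[key] = build(members, dim, w / 2, max_size, max_depth - 1, rng)
        else:
            children[key] = members
    return {"a": a, "b": b, "w": w, "children": children}

def lookup(node, q):
    """Descend the tree of hash layers to the leaf bucket for q."""
    while isinstance(node, dict):
        key = project(node["a"], node["b"], node["w"], q)
        node = node["children"].get(key, [])
    return node

def leaves(node):
    if isinstance(node, list):
        return [node]
    return [leaf for child in node["children"].values() for leaf in leaves(child)]

rng = random.Random(0)
dense = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(40)]
sparse = [(rng.uniform(50, 100), rng.uniform(50, 100)) for _ in range(5)]
tree = build(dense + sparse, dim=2, w=8.0, max_size=8, max_depth=6, rng=rng)
```

The depth limit keeps recursion finite even when many points are identical. Buckets at the deepest layer may therefore still exceed `max_size`.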

Storage method for redundancy deletion block device based on location-sensitive hash

The invention discloses a storage method for a redundancy deletion block device based on location-sensitive hash, and belongs to the data storage field. The method comprises the following steps: putting the data blocks of a redundant-write detection operation and their corresponding digital fingerprints into the current operation queue; D: judging whether the number of data blocks in the queue exceeds a threshold value; if so, taking a threshold number of data blocks as a data section and executing step F; otherwise, executing step E; E: judging whether the data block at the front of the queue has timed out; if so, taking the queued data blocks as the data section and executing step F; otherwise, executing step D; F: judging whether a set of metadata of similar data sections exists; if so, executing step G; otherwise, establishing an empty set and executing step G; and G: judging in order whether the digital fingerprint of each data block exists in the set of metadata of the similar data sections; if so, modifying the memory address of the data block; otherwise, generating metadata for the data block. The method reduces the time spent accessing metadata during redundant-write detection.
Owner:TSINGHUA UNIV
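
Steps D to G of the abstract above can be approximated by the following sketch. It covers only what the abstract states: how similar data sections are located is not described there, so the metadata set is simply passed in. SHA-1 fingerprints, integer addresses, and the names `PendingWrites` and `deduplicate` are assumptions for illustration.

```python
import hashlib
from collections import deque

def fingerprint(block):
    return hashlib.sha1(block).hexdigest()

class PendingWrites:
    def __init__(self, threshold, timeout):
        self.threshold = threshold
        self.timeout = timeout
        self.queue = deque()  # entries: (arrival_time, fingerprint, block)

    def put(self, block, now):
        self.queue.append((now, fingerprint(block), block))

    def next_section(self, now):
        # Step D: more blocks than the threshold -> cut a section of that size.
        if len(self.queue) > self.threshold:
            return [self.queue.popleft() for _ in range(self.threshold)]
        # Step E: the oldest block has timed out -> flush what is queued.
        if self.queue and now - self.queue[0][0] >= self.timeout:
            section = list(self.queue)
            self.queue.clear()
            return section
        return None

def deduplicate(section, similar_metadata, next_address):
    """Step G: similar_metadata maps fingerprint -> address of the stored block."""
    placements = []
    for _, fp, _block in section:
        if fp in similar_metadata:
            # Duplicate: point at the existing address instead of storing again.
            placements.append((fp, similar_metadata[fp], False))
        else:
            similar_metadata[fp] = next_address
            placements.append((fp, next_address, True))
            next_address += 1
    return placements

writes = PendingWrites(threshold=2, timeout=5.0)
for i, blk in enumerate([b"aaaa", b"bbbb", b"aaaa"]):
    writes.put(blk, now=float(i))
metadata = {}  # step F: start from an empty set when no similar section exists
first = deduplicate(writes.next_section(now=2.0), metadata, next_address=0)
idle = writes.next_section(now=3.0)
second = deduplicate(writes.next_section(now=7.0), metadata, next_address=2)
```

Checking fingerprints against the metadata of similar sections only, rather than a global index, is what keeps each lookup small in this reading of the abstract.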