32 results about "MinHash" patented technology

In computer science and data mining, MinHash (or the min-wise independent permutations locality sensitive hashing scheme) is a technique for quickly estimating how similar two sets are. The scheme was invented by Andrei Broder (1997), and initially used in the AltaVista search engine to detect duplicate web pages and eliminate them from search results. It has also been applied in large-scale clustering problems, such as clustering documents by the similarity of their sets of words.
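
The estimate described above can be illustrated with a minimal Python sketch. It assumes that seeded BLAKE2b hashes stand in for the random permutations, that the input sets are non-empty, and that 128 hash functions are enough for the demonstration; the function names are hypothetical and not taken from any library or patent listed here.

```python
import hashlib


def _hash(seed: int, item: str) -> int:
    """Deterministic 64-bit hash of `item`; each seed acts like one permutation."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=128):
    """Return one minimum hash value per seeded hash function (non-empty input assumed)."""
    items = list(items)
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; this approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = {"the", "quick", "brown", "fox"}
    doc_b = {"the", "quick", "red", "fox"}
    sig_a = minhash_signature(doc_a)
    sig_b = minhash_signature(doc_b)
    print("estimated:", estimate_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(doc_a, doc_b))
```

The signature length trades accuracy for cost: the standard error of the estimate shrinks roughly with the square root of the number of hash functions.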

Method and device for obtaining similar object set and providing similar object set

The invention discloses a method and device for obtaining a similar object set and for providing a similar object set. The method comprises: obtaining an input file comprising M objects, N attributes, and the attribute values corresponding to each attribute; inputting each attribute to a first-level pre-created minimum hash (minhash) function and obtaining the first-level minhash return value of each attribute; obtaining the second-level minhash return value of each attribute according to the attribute, the weight corresponding to the attribute in the current object, and a second-level pre-created minhash function; calculating the combined minhash value of each attribute in each object; determining the minimum of the combined minhash values of the attributes of the same object as the minhash value of that object; repeating the operation K times for each object to obtain K minhash values for each object; and inputting the K minhash values of each object to a locality-sensitive hashing (LSH) computing framework. The method and device can improve operating efficiency and improve the validity and accuracy of the similar object information.
Owner:ALIBABA GRP HLDG LTD
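
The abstract above does not say how the two minhash levels are combined, so the following Python sketch does not reproduce the patented method. As a loosely related illustration, it shows one standard way to fold attribute weights into MinHash: each attribute is expanded into as many replicas as its weight, and the minimum hash over all replicas is taken K times. It assumes positive integer weights and non-empty objects; `weighted_minhash` and the sample attributes are hypothetical.

```python
import hashlib


def _hash(seed: int, token: str) -> int:
    """Deterministic 64-bit hash of `token` under a seed."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def weighted_minhash(obj, k=128):
    """Return K minhash values for `obj`, a dict of attribute -> positive integer weight.

    Assumption: an attribute with weight w is expanded into w replicas
    ("attr#0", ..., "attr#w-1"), so heavier attributes are more likely
    to supply the minimum.
    """
    replicas = [f"{attr}#{i}" for attr, weight in obj.items() for i in range(weight)]
    return [min(_hash(seed, r) for r in replicas) for seed in range(k)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of agreeing positions; approximates the weighted Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def weighted_jaccard(a, b):
    """Exact weighted Jaccard: sum of minimum weights over sum of maximum weights."""
    keys = set(a) | set(b)
    num = sum(min(a.get(x, 0), b.get(x, 0)) for x in keys)
    den = sum(max(a.get(x, 0), b.get(x, 0)) for x in keys)
    return num / den


if __name__ == "__main__":
    obj_a = {"color:red": 3, "size:large": 1}
    obj_b = {"color:red": 1, "size:large": 1}
    sig_a, sig_b = weighted_minhash(obj_a), weighted_minhash(obj_b)
    print("estimated:", estimate_similarity(sig_a, sig_b))
    print("exact:    ", weighted_jaccard(obj_a, obj_b))
```

The resulting K values per object are the kind of input an LSH stage would consume to shortlist candidate pairs.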

Method and system for identifying homologous binary files

The invention provides a method and a system for identifying homologous binary files in a database. The database comprises multiple binary base files. The method comprises the steps of: obtaining signatures of the to-be-identified files and signatures of the base files according to a min-hash algorithm; dividing each signature into buckets according to a bucket dividing method; obtaining, according to a reverse indexing method and the bucketed signatures of all the base files, dictionaries in one-to-one correspondence with the buckets, wherein each dictionary comprises at least one key-value pair; and traversing the corresponding dictionaries according to the character strings in the buckets of the to-be-identified files, and obtaining the homologous binary files of the to-be-identified files according to the values corresponding to the matching keys. Because the signatures are obtained with the min-hash algorithm and the bucket dividing is performed with a locality-sensitive hash algorithm, the amount of calculation can be remarkably reduced; and because an index table is established for all the signatures by the reverse indexing method, the speed of identifying homologous binary files is increased.
Owner:INST OF INFORMATION ENG CAS
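
The bucket-and-dictionary lookup described in this abstract can be sketched roughly as follows in Python. This is an illustration under assumptions, not the patented implementation: signatures are taken to be precomputed lists of integers, each bucket is a contiguous slice of the signature joined into a string key, and any values left over when the signature length is not divisible by the bucket count are ignored. All function and file names are hypothetical.

```python
from collections import defaultdict


def split_into_buckets(signature, num_buckets):
    """Cut a signature into `num_buckets` contiguous slices, each joined into a string key."""
    rows = len(signature) // num_buckets
    return [
        ",".join(str(v) for v in signature[i * rows:(i + 1) * rows])
        for i in range(num_buckets)
    ]


def build_index(base_signatures, num_buckets):
    """Build one dictionary per bucket position: bucket string -> list of base file names."""
    index = [defaultdict(list) for _ in range(num_buckets)]
    for name, signature in base_signatures.items():
        for position, key in enumerate(split_into_buckets(signature, num_buckets)):
            index[position][key].append(name)
    return index


def find_homologous(index, query_signature):
    """Return base files that share at least one bucket string with the query signature."""
    hits = set()
    keys = split_into_buckets(query_signature, len(index))
    for position, key in enumerate(keys):
        hits.update(index[position].get(key, []))
    return hits


if __name__ == "__main__":
    base = {"a.bin": [1, 2, 3, 4], "b.bin": [9, 9, 9, 9]}
    index = build_index(base, num_buckets=2)
    print(find_homologous(index, [1, 2, 7, 8]))
```

A query costs one dictionary lookup per bucket rather than a comparison against every base file, which is where the speed-up claimed in the abstract would come from.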

Intelligent recommendation method and device, computer equipment and readable storage medium

The invention relates to the technical field of big data, and discloses an intelligent recommendation method and device, computer equipment and a readable storage medium. The method comprises the steps of: obtaining user information and characterizing it to obtain a user vector; calling a product quantization process to segment the user vector into a plurality of sub-vectors, identifying the category to which each sub-vector belongs, and summarizing the categories to obtain a user category set; calling a minimum hash process to compare the similarity of the user category set with each reference category set in a preset index library, and setting each reference category set whose similarity exceeds a preset similarity threshold as a target category set; and taking the associated information corresponding to the target category set as recommendation information. According to the method, the fineness and accuracy of user vector category identification are improved, the operating efficiency of the server is improved, the matching speed between the user information and the reference information in the index database is increased, and the amount of data calculation and data storage is reduced.
Owner:CHINA PING AN LIFE INSURANCE CO LTD
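
As a rough Python illustration of the pipeline in this abstract, the sketch below quantizes a user vector into per-sub-vector category labels and then uses a MinHash estimate to pick reference category sets above a threshold. The codebooks, the "position:centroid" category labels, the index layout and the threshold value are all assumptions made for the example and are not taken from the patent.

```python
import hashlib


def quantize(vector, codebooks):
    """Split `vector` into len(codebooks) equal sub-vectors and label each with its
    nearest centroid, giving a set of "position:centroid" category strings."""
    width = len(vector) // len(codebooks)
    categories = set()
    for position, codebook in enumerate(codebooks):
        sub = vector[position * width:(position + 1) * width]
        nearest = min(
            range(len(codebook)),
            key=lambda j: sum((x - y) ** 2 for x, y in zip(sub, codebook[j])),
        )
        categories.add(f"{position}:{nearest}")
    return categories


def minhash(items, k=128):
    """K seeded minimum hash values for a non-empty set of strings."""
    signature = []
    for seed in range(k):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(x.encode("utf-8"), digest_size=8, salt=salt).digest(), "big")
            for x in items
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing positions, approximating Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def recommend(user_vector, codebooks, index, threshold=0.8):
    """Return names of reference category sets whose estimated similarity to the
    user's category set exceeds `threshold`. `index` maps name -> category set."""
    user_sig = minhash(quantize(user_vector, codebooks))
    return [
        name for name, reference in index.items()
        if estimated_similarity(user_sig, minhash(reference)) > threshold
    ]


if __name__ == "__main__":
    codebooks = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]]
    index = {"plan_a": {"0:0", "1:1"}, "plan_b": {"0:1", "1:0"}}
    print(recommend([0.1, 0.0, 0.9, 1.0], codebooks, index))
```

In a real system the reference signatures would presumably be computed once and stored in the index rather than rehashed on every query.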

Combination optimizing method based on Lucene index section

The invention relates to a combination optimizing method based on Lucene index sections, and belongs to the technical field of computer indexing. The method comprises the following steps: combining current node load information and index section information, and building a combination analyzing module to judge whether a combination condition is met; obtaining, from the dictionary file contained in each index section, a characteristic matrix of the index sections in the index, and processing it with the minHash algorithm and a minimum hash signature algorithm to calculate the signature matrix of the index sections; calculating a similarity coefficient between index sections from the signature matrix and the Jaccard similarity principle, and dividing the index sections into different similar sets according to the similarity coefficient; and using a similarity evaluation model to score each similar set, sorting by set score, and selecting the one or more highest-scoring sets to be combined by a combination thread. The optimizing method can reduce the effect of the combination operation on the performance of the index and search functions and effectively improve search speed.
Owner:CHONGQING UNIV OF POSTS & TELECOMM
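
The characteristic-matrix and signature-matrix steps named in this abstract follow a well-known textbook construction, sketched below in Python. The sketch assumes each index section is represented by the set of terms in its dictionary file, uses linear hash functions of the row index modulo a prime, and groups sections greedily against a similarity threshold; the scoring model and merge scheduling from the abstract are not covered, and all names are hypothetical.

```python
import random


def characteristic_matrix(sections):
    """Rows are terms, columns are sections; a cell is 1 if the section contains the term."""
    names = list(sections)
    terms = sorted(set().union(*sections.values()))
    matrix = [[1 if term in sections[name] else 0 for name in names] for term in terms]
    return names, matrix


def signature_matrix(matrix, num_hashes=64, seed=0):
    """For each hash function and column, keep the smallest hashed index of a row holding a 1."""
    prime = 2_147_483_647
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, prime), rng.randrange(prime)) for _ in range(num_hashes)]
    num_cols = len(matrix[0])
    sig = [[prime] * num_cols for _ in range(num_hashes)]
    for r, row in enumerate(matrix):
        row_hashes = [(a * r + b) % prime for a, b in coeffs]
        for c in range(num_cols):
            if row[c]:
                for i, h in enumerate(row_hashes):
                    if h < sig[i][c]:
                        sig[i][c] = h
    return sig


def similarity(sig, c1, c2):
    """Estimated Jaccard similarity of two columns: fraction of matching signature rows."""
    return sum(row[c1] == row[c2] for row in sig) / len(sig)


def group_similar(names, sig, threshold=0.5):
    """Greedily place each section in the first group whose first member is similar enough."""
    groups = []
    for c in range(len(names)):
        for group in groups:
            if similarity(sig, group[0], c) >= threshold:
                group.append(c)
                break
        else:
            groups.append([c])
    return [[names[c] for c in group] for group in groups]


if __name__ == "__main__":
    sections = {
        "seg0": {"apple", "banana", "cherry"},
        "seg1": {"apple", "banana", "cherry"},
        "seg2": {"xylophone", "yak", "zebra"},
    }
    names, matrix = characteristic_matrix(sections)
    print(group_similar(names, signature_matrix(matrix)))
```

Comparing short signature columns instead of full term dictionaries keeps the pairwise similarity step cheap even when sections contain many terms.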

A Merge Optimization Method Based on Lucene Index Segment

The invention relates to a method for merging and optimizing based on Lucene index segments, and belongs to the technical field of computer indexing. It includes the following steps: combining the load information of the current node and the segment information of the index, constructing a merge analysis module to judge whether the merge condition is satisfied. According to the dictionary files contained in each index segment, the feature matrix of the index segment in the index is obtained, and then combined with the minHash algorithm and the minimum hash signature algorithm to calculate the signature matrix of the index segment. Combined with the signature matrix of the index segment and the Jaccard similarity principle, the similarity coefficient between each index segment is calculated, and the index segment is divided into different similar sets according to the similarity coefficient. Use the similarity evaluation model to score each similar set, and sort according to the set score, and select one or more sets with the highest score to be merged by the merge thread. The optimization method of the invention can reduce the impact of the merge operation on the performance of the index function and the retrieval function and can effectively improve the speed of retrieval.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Method and device for obtaining similar object collection and providing similar object information

The invention discloses a method and device for obtaining a similar object set and for providing a similar object set. The method comprises: obtaining an input file comprising M objects, N attributes, and the attribute values corresponding to each attribute; inputting each attribute to a first-level pre-created minimum hash (minhash) function and obtaining the first-level minhash return value of each attribute; obtaining the second-level minhash return value of each attribute according to the attribute, the weight corresponding to the attribute in the current object, and a second-level pre-created minhash function; calculating the combined minhash value of each attribute in each object; determining the minimum of the combined minhash values of the attributes of the same object as the minhash value of that object; repeating the operation K times for each object to obtain K minhash values for each object; and inputting the K minhash values of each object to a locality-sensitive hashing (LSH) computing framework. The method and device can improve operating efficiency and improve the validity and accuracy of the similar object information.
Owner:ALIBABA GRP HLDG LTD

Intelligent agent behavior responsibility investigation method based on social network privacy negotiation system

The invention discloses an intelligent agent behavior responsibility investigation method based on a social network privacy negotiation system. Agent behavior responsibility investigation is realized through a qualitative process and a quantitative process. The qualitative process adopts a forward simulation negotiation process and a reverse reproduction negotiation process, so that whether the privacy negotiation agent has behaved improperly is accurately judged and, when improper behavior exists, the specific position where it occurred is accurately located. Three quantitative responsibility investigation methods are further provided: a simple quantification method, a weighted Mahalanobis distance method and an improved MinHash method. These yield a responsibility quantification value for the privacy negotiation agent, thereby quantifying the severity of the improper behavior. The invention addresses the problems of untrusted, unsafe and malicious intelligent agent behavior in current social network privacy negotiation systems.
Owner:JINAN UNIVERSITY

A method and system for identifying homologous binary files

The invention provides a method and a system for identifying homologous binary files in a database. The database comprises multiple binary base files. The method comprises the steps of: obtaining signatures of the to-be-identified files and signatures of the base files according to a min-hash algorithm; dividing each signature into buckets according to a bucket dividing method; obtaining, according to a reverse indexing method and the bucketed signatures of all the base files, dictionaries in one-to-one correspondence with the buckets, wherein each dictionary comprises at least one key-value pair; and traversing the corresponding dictionaries according to the character strings in the buckets of the to-be-identified files, and obtaining the homologous binary files of the to-be-identified files according to the values corresponding to the matching keys. Because the signatures are obtained with the min-hash algorithm and the bucket dividing is performed with a locality-sensitive hash algorithm, the amount of calculation can be remarkably reduced; and because an index table is established for all the signatures by the reverse indexing method, the speed of identifying homologous binary files is increased.
Owner:INST OF INFORMATION ENG CHINESE ACAD OF SCI