Big data searching method based on sparse hash

A big data search technology based on a hash function, applied in the fields of computer science and technology and information technology. It addresses the high complexity and low accuracy of big data retrieval, and reduces the storage space required.

Active Publication Date: 2014-02-26
GUANGXI NORMAL UNIV
Cites: 1. Cited by: 5.

AI Technical Summary

Problems solved by technology

The method addresses the high complexity and low accuracy of big data retrieval.

Examples

Embodiment Construction

[0028] Take 70,000 animal pictures collected at random from the Internet and assume that each picture requires 1 MB of storage (these are not high-fidelity pictures); the entire data set then needs 70 GB of storage. The present invention replaces each picture with a 4-bit binary code, so the coded data set needs only about 35 KB in total (70,000 × 4 bits = 280,000 bits). This is roughly two million times smaller than the original storage.
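The storage comparison in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the quantities stated in the text (70,000 pictures, 1 MB per picture, a 4-bit code per picture); the decimal units (1 MB = 10^6 bytes) and all function names are assumptions made for illustration.

```python
# Storage estimate from the quantities stated in paragraph [0028].
# ASSUMPTION: decimal units, i.e. 1 MB = 10**6 bytes.
NUM_PICTURES = 70_000
BYTES_PER_PICTURE = 10**6   # "1M" of storage per picture
BITS_PER_CODE = 4           # one 4-bit binary code per picture


def raw_bytes(n=NUM_PICTURES, size=BYTES_PER_PICTURE):
    """Bytes needed to store the original pictures."""
    return n * size


def coded_bytes(n=NUM_PICTURES, bits=BITS_PER_CODE):
    """Bytes needed to store the binary codes, rounded up to whole bytes."""
    return (n * bits + 7) // 8


def compression_ratio():
    """How many times smaller the coded data set is than the raw one."""
    return raw_bytes() / coded_bytes()


if __name__ == "__main__":
    print(f"raw:   {raw_bytes() / 10**9:.1f} GB")
    print(f"coded: {coded_bytes() / 10**3:.1f} KB")
    print(f"ratio: {compression_ratio():,.0f}x")
```

Changing `BITS_PER_CODE` shows how the saving scales with code length: doubling the code length halves the ratio.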

[0029] (1) A typical computer with 4 GB of memory can handle 100,000 instances with the algorithm of the present invention. For this data set, therefore, no sampling is needed: all 70,000 instances are used directly for training to obtain the hash function, which also yields a 4-bit binary representation for each instance.
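The decision of whether to sample can be sketched as follows. The 100,000-instance capacity is the figure given in the text; uniform random sampling, the function name `choose_training_set`, and the `seed` parameter are assumptions, since the text does not say which sampling method is used.

```python
import random


def choose_training_set(instances, max_in_memory=100_000, seed=0):
    """Return every instance if the data set fits in memory,
    otherwise a uniform random sample of max_in_memory instances.

    ASSUMPTION: uniform sampling without replacement; the source
    does not specify the sampling method.
    """
    items = list(instances)
    if len(items) <= max_in_memory:
        return items                      # no sampling needed
    rng = random.Random(seed)             # seeded for reproducibility
    return rng.sample(items, max_in_memory)


if __name__ == "__main__":
    # 70,000 instances fit within the 100,000-instance capacity.
    print(len(choose_training_set(range(70_000))))
    # A larger data set is sampled down to the capacity.
    print(len(choose_training_set(range(250_000))))
```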

[0030] (2) For each test instance, the present invention first obtains its low-dimensional real-valued representation: 0.4, 0, 0.1, 0.7 (see Figure 1).

[0031] This representation: 1) is reduced from the 784 dimensions of the ori...

Abstract

The invention relates to a similarity search method for big data, in particular a big data search method based on sparse hashing. The method is mainly intended for developing applications built on the storage and search of big data. First, a sampling method determines the size of the training set according to the computer's memory. The training set is then used to learn a hash function for encoding big data, together with the binary codes of the training set, and the big data is binary-coded with the learned hash function. Online search can then be carried out: for a test case, its binary code is first obtained from the learned hash function, and a real-time search is then run over the binary codes of the big data. With this method, the time complexity of big data search is linear, the lack of an explicit function in manifold learning is overcome, the storage required for the big data is reduced by a factor of thousands, and the method is easy to implement, since only simple mathematical models are involved when writing the code.
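The online search step can be illustrated with a minimal sketch. It does not learn a hash function; the `binarize` helper stands in for one and simply sets a bit wherever the real-valued entry is non-zero, which is an assumption of this sketch and not a rule stated in the text. The use of Hamming distance and the names `binarize`, `hamming`, and `search` are likewise assumptions. The scan visits each stored code once.

```python
import heapq
from typing import List, Sequence


def binarize(values: Sequence[float]) -> int:
    """Pack a real-valued vector into an integer bit code.

    ASSUMPTION: a bit is 1 where the entry is non-zero and 0 otherwise.
    This is a placeholder for the learned hash function.
    """
    code = 0
    for v in values:
        code = (code << 1) | (1 if v != 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def search(query_code: int, database_codes: List[int], k: int = 3) -> List[int]:
    """Scan every stored code once and return the indices of the k codes
    closest to the query in Hamming distance (ties keep database order)."""
    return heapq.nsmallest(
        k,
        range(len(database_codes)),
        key=lambda i: hamming(query_code, database_codes[i]),
    )


if __name__ == "__main__":
    database = [0b0000, 0b1011, 0b1010, 0b0100]   # hypothetical stored codes
    query = binarize([0.4, 0, 0.1, 0.7])          # example vector from the text
    print(search(query, database, k=2))
```

Storing each code as an integer keeps the comparison to one XOR and one bit count per stored item, which is what makes a full scan over short codes practical.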

Description

Technical field

[0001] The present invention relates to the fields of computer science and technology and information technology, and in particular to big data: specifically, a method for retrieving big data such as pictures, text, and music using sparse hashing.

Background technique

[0002] Big data refers to data sets that cannot be retrieved and managed with conventional tools under current conditions. Large data volume, varied data types, low value density, and fast processing speed are four prominent characteristics of big data. At present, research on big data knowledge discovery focuses mainly on four aspects: partitioning, clustering, retrieval, and incremental (batch, online, or parallel) learning.

[0003] At present, there are relatively few studies on big data retrieval. When searching, users usually want to get what they need quickly from all the available material. This raises the question of how to choose speed and accu...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/2255, G06F16/24558
Inventors: 朱晓峰, 张师超, 刘星毅
Owner: GUANGXI NORMAL UNIV