Accelerated large-scale similarity calculation

A correlation technique in computer technology, applied in calculation, complex mathematical operations, instruments, and similar fields, that addresses computation-intensive and time-consuming problems

Pending Publication Date: 2020-04-03
GOOGLE LLC
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Retrieving the closest matches to an input sample from a database of stored records becomes increasingly computationally intensive and time-consuming as the number of stored records increases

Examples

Embodiment Construction

[0024] This document describes techniques for implementing a k-minimum hash or k-minimum value ("KMV") data processing algorithm to sort data that is preloaded at a graphics processing unit (GPU) in order to compute relationships between entities. Specifically, the described techniques can be used to accelerate data correlation calculations (e.g., for determining similarity between entities) by storing pre-sorted data on the GPU, so that the computing units of the GPU can quickly determine the relationships between entities. The GPU determines these relationships by executing a specific type of correlation algorithm. Because the GPU no longer needs to pre-sort the data before executing the correlation algorithm, relationships can be computed or determined at the GPU faster than in current systems.
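The role of pre-sorting can be illustrated with a minimal CPU-side sketch in Python. It is not the patented GPU implementation: the function names (`hash64`, `kmv_sketch`, `estimate_jaccard`), the BLAKE2b hash, and the bottom-k Jaccard estimator are all assumptions chosen for illustration. Each entity is reduced to its k smallest hash values, stored in ascending order, so that two entities can be compared in a single merge pass with no sorting at comparison time.

```python
import hashlib


def hash64(item: str) -> int:
    """Map an item to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def kmv_sketch(items, k):
    """Return the k smallest distinct hash values, sorted ascending.

    This is the one-time 'pre-sort' step; the sorted sketch is what
    would be stored ahead of the similarity computation.
    """
    return sorted({hash64(x) for x in items})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Bottom-k Jaccard estimate from two pre-sorted sketches.

    Walks both sketches in one merge pass, looking at the k smallest
    values of their union and counting how many appear in both.
    """
    i = j = seen = shared = 0
    while seen < k and i < len(sketch_a) and j < len(sketch_b):
        if sketch_a[i] == sketch_b[j]:
            shared += 1
            i += 1
            j += 1
        elif sketch_a[i] < sketch_b[j]:
            i += 1
        else:
            j += 1
        seen += 1
    # One sketch may be exhausted early (its set had fewer than k items);
    # the remaining values of the other sketch still belong to the union.
    leftover = (len(sketch_a) - i) + (len(sketch_b) - j)
    seen += min(k - seen, leftover)
    return shared / seen if seen else 0.0
```

When k is at least as large as the union of the two sets, the estimate equals the exact Jaccard similarity; for larger sets it is an approximation whose accuracy improves with k.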

[0025] For example, entity correlation systems store large amounts of data including information describing different entities. The system may include a central processing u...

Abstract

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining data stored at a storage device using a first processor of an entity correlation system. The data includes information about multiple entities. The first processor generates data arrays using the obtained data. Each data array includes parameter values for multiple entities and is configured for processing at a respective computing cell of a second processor. The system provides the data arrays to the second processor. The second processor is configured to execute a correlation algorithm to concurrently process the data arrays at the respective computing cells. The second processor computes a correlation score based on calculations performed at the cells using the algorithm and the parameter values. The system determines relationships among entities of the data arrays based on the correlation score. The relationships indicate overlapping attributes or similarities that exist among subsets of entities.
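The flow in the abstract can be sketched roughly as follows. This is a CPU-only illustration under stated assumptions, not the claimed system: a thread pool stands in for the computing cells of the second processor, intersection-over-union stands in for the unspecified correlation algorithm, and the names `build_arrays`, `score_pair`, `related_entities`, and the `threshold` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def build_arrays(records):
    """'First processor' step: one sorted array of parameter values per entity."""
    return {entity: sorted(set(values)) for entity, values in records.items()}


def score_pair(pair):
    """Work for one 'computing cell': intersection-over-union of two arrays."""
    (name_a, arr_a), (name_b, arr_b) = pair
    a, b = set(arr_a), set(arr_b)
    union = len(a | b)
    score = len(a & b) / union if union else 0.0
    return name_a, name_b, score


def related_entities(records, threshold=0.5, workers=4):
    """Score every entity pair concurrently and keep pairs at or above threshold."""
    arrays = build_arrays(records)
    pairs = list(combinations(arrays.items(), 2))
    # The thread pool is only a stand-in for concurrent GPU computing cells.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score_pair, pairs))
    return [(a, b, s) for a, b, s in scores if s >= threshold]
```

For example, calling `related_entities({"e1": [1, 2, 3, 4], "e2": [2, 3, 4, 5], "e3": [9, 10]})` keeps only the `e1`/`e2` pair, since those two entities share three of their five distinct values.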

Description

Background

[0001] This specification relates to computational processes for large-scale similarity calculation.

[0002] In many cases, it may be desirable to determine whether, or to what extent, an input sample matches one or more stored records. As one example, it may be desirable to determine whether a DNA sample matches any of the records stored in a database of DNA records. A database may contain many DNA records (e.g., hundreds of thousands or even millions of records). In general, it may be desirable to retrieve a certain number (n) of stored records from the database in response to an input sample. The retrieved records may be the n records in the database that are determined to be the n closest matches to the input sample. The number n of retrieved records is smaller than the total number of records in the database, usually much smaller. The n retrieved records can be ordered with the most probable match first. Conventionally, such a retrieval process may in...
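The retrieval task described in [0002] can be sketched as a brute-force scan that scores every stored record against the input sample and keeps the n best, ordered with the closest match first. The similarity measure here (Jaccard overlap of 3-character substrings) and the names `kmers`, `kmer_similarity`, and `top_n_matches` are assumptions for illustration only; the source does not specify how matches are scored.

```python
import heapq


def kmers(sequence, size=3):
    """Set of all substrings of the given length (an illustrative feature choice)."""
    return {sequence[i:i + size] for i in range(len(sequence) - size + 1)}


def kmer_similarity(sample, record):
    """Jaccard overlap of the two sequences' substring sets."""
    a, b = kmers(sample), kmers(record)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def top_n_matches(sample, records, n, similarity=kmer_similarity):
    """Return the n closest records as (record_id, score), best match first."""
    scored = ((similarity(sample, rec), rec_id) for rec_id, rec in records.items())
    return [(rec_id, score) for score, rec_id in heapq.nlargest(n, scored)]
```

Because every record is scored on each query, the cost of this scan grows linearly with the number of stored records, which is the scaling problem the specification sets out to address.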

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/10
CPC: G06F17/10, G06F17/15, G06F17/18, G06F16/906, G06F12/0802
Inventor: 马琳, N.威甘德
Owner: GOOGLE LLC