System, apparatus, and method for user tunable and selectable searching of a database using a weighted quantized feature vector

Feature vector and database technology, applied in the field of data processing, addresses three problems: exact atom-by-atom and bond-by-bond matches are difficult to find, exact-match queries can return questionable search results, and exact-match searching of large databases is time consuming.

Inactive Publication Date: 2004-01-08
ROW2 TECH

AI Technical Summary

Benefits of technology

[0026] The invention disclosed herein is a data processing product and method that permits computerized similarity searching of an electronic database using a quantization vector. The quantization vector, a linear array of descriptive properties of the entries in the database, is maintained by the system. Different datatype representations of the quantization vector may be implemented. The system examines the structure of a query item in terms of its known descriptive properties. During examination, the quantization vector is established. This vector represents the query item's "fingerprint." The system then searches the entire database for identity or similarity to the query item by comparing the vectors. The system further permits the user to set numeric priorities for the descriptive properties in a user friendly environment, said priorities to be used in the search for entries that are similar to the query item. An object of the invention is to provide a simplified searching system for naive and infrequent users. In one of the embodiments presented herein, a computerized user tunable system is disclosed that selectively searches a database of chemical compounds. In another embodiment presented herein, a computerized user tunable system is disclosed that selectively searches a database of biological activity screening test results.
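The comparison of fingerprints under user-set priorities can be illustrated with a minimal sketch. The source does not name a similarity measure, so the weighted Tanimoto-style coefficient below is an assumption chosen for illustration, and the function name `weighted_similarity` and the sample vectors are hypothetical.

```python
from typing import Sequence


def weighted_similarity(
    query: Sequence[float],
    entry: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Weighted Tanimoto-style similarity for non-negative vectors.

    Assumed measure (not taken from the source): the weighted overlap of
    the two vectors divided by their weighted union. Identical vectors
    score 1.0.
    """
    if not (len(query) == len(entry) == len(weights)):
        raise ValueError("vectors and weights must have the same length")
    shared = sum(w * min(q, e) for q, e, w in zip(query, entry, weights))
    total = sum(w * max(q, e) for q, e, w in zip(query, entry, weights))
    # Two all-zero vectors (or all-zero weights) are treated as identical.
    return 1.0 if total == 0 else shared / total


# Hypothetical binary fingerprints: one query item and two database entries.
query = [1, 0, 1, 1]
entry_a = [1, 0, 1, 0]
entry_b = [0, 0, 1, 1]

uniform = [1, 1, 1, 1]   # every descriptive property counts equally
tuned = [5, 1, 1, 1]     # the user gives the first property a high priority

print(weighted_similarity(query, entry_a, uniform),
      weighted_similarity(query, entry_b, uniform))
print(weighted_similarity(query, entry_a, tuned),
      weighted_similarity(query, entry_b, tuned))
```

With uniform priorities the two entries tie. Raising the priority of the first property favors `entry_a`, which shares that property with the query.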

Problems solved by technology

An atom-by-atom and bond-by-bond search becomes more difficult as the size of the molecule increases.
Even were the organic molecules to be pre-classified according to specific features, queries to find exact matches of these features might still yield questionable and non-useful results.
Furthermore, in large databases, exact match searching can be extremely time consuming.


Examples

Embodiment Construction

[0038] It is feasible to perform similarity searches in an electronic database of items, wherein said items possess a set of one or more descriptive properties (related to the items) that can be expressed in numeric form. Similarity searching in such a generalized database according to current technology may be performed in a computer using the method shown in FIG. 1.

[0039] 1. A user submits a query to the system. The query may be submitted using different formats, but a query item must be able to be classified according to its descriptive properties. The descriptive properties may have inherent numeric values (e.g., test results, characteristic values, prices, ASCII values, checksums, etc.). Alternatively, they may have binary values (`one` indicating the presence of a feature and `zero` indicating the absence of the feature).

[0040] 2. The query item is parsed according to its descriptive properties. The descriptive properties are analyzed by comparing various elements of the query...
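The two kinds of descriptive property named in step 1, inherent numeric values and binary presence flags, could be turned into a vector roughly as follows. This is only a sketch under assumptions: the equal-width binning, the schema layout, and the property names (`mol_weight`, `has_ring`, `has_halogen`) are hypothetical and do not come from the patent text.

```python
from typing import List, Mapping, Sequence, Tuple, Union

# A numeric property is described as (name, low, high, levels);
# a binary property is described by its name alone. (Hypothetical layout.)
NumericSpec = Tuple[str, float, float, int]
Spec = Union[NumericSpec, str]


def quantize(value: float, low: float, high: float, levels: int) -> int:
    """Map a numeric value to one of `levels` equal-width bins (0-based)."""
    if levels < 2 or high <= low:
        raise ValueError("need levels >= 2 and high > low")
    clamped = min(max(value, low), high)
    # The top edge of the range falls into the last bin.
    return min(int((clamped - low) / (high - low) * levels), levels - 1)


def fingerprint(item: Mapping[str, float], schema: Sequence[Spec]) -> List[int]:
    """Build a quantized vector for an item, one position per property."""
    vector: List[int] = []
    for spec in schema:
        if isinstance(spec, str):
            # Binary property: 1 if the feature is present, else 0.
            vector.append(1 if item.get(spec) else 0)
        else:
            name, low, high, levels = spec
            vector.append(quantize(item[name], low, high, levels))
    return vector


schema: List[Spec] = [("mol_weight", 0.0, 1000.0, 10), "has_ring", "has_halogen"]
item = {"mol_weight": 180.2, "has_ring": 1}
print(fingerprint(item, schema))
```

Here the molecular weight lands in the second of ten bins, the ring flag is set, and the absent halogen flag defaults to zero.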

Abstract

The invention disclosed herein concerns a data processing means for user tunable and selectable searching of a database wherein the data contained therein have associated descriptive properties capable of being expressed in numeric form. A quantized vector representative of the descriptive properties is created for each item in the database. This quantized vector becomes the fingerprint for each data item. The user submits a query item to be matched against the database for similarity. A fingerprint is calculated for the query item. The user may then assign weights to the individual descriptive properties based upon perceived importance. A newly weighted fingerprint for the query item is then compared with the weighted fingerprints for all the data in the database. A list of results sorted in order of decreasing similarity is presented to the user. The user may then change the previously assigned weights and then re-run the similarity search. This may be done as often as necessary to achieve the desired results. The invention describes similarity searching in a generic database. However, this invention is particularly desirable in databases containing chemical compound structure data or biological response screening result data. The process described herein may be run stand alone or as a preliminary screening search in a large database. If used for screening, it can greatly reduce the amount of data required for exactly matching a query item to the data in the database.
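The search loop described in the abstract (score every entry, sort by decreasing similarity, let the user change the weights and re-run, optionally keep only a shortlist for exact matching) might look like the sketch below. The scoring rule, a weighted fraction of agreeing positions, is an assumption made for illustration, as are the names `match_score` and `rank` and the toy database.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def match_score(
    query: Sequence[int], entry: Sequence[int], weights: Sequence[float]
) -> float:
    """Weighted fraction of positions where the two fingerprints agree."""
    total = sum(weights)
    if total == 0:
        return 0.0
    agree = sum(w for q, e, w in zip(query, entry, weights) if q == e)
    return agree / total


def rank(
    query: Sequence[int],
    database: Dict[str, Sequence[int]],
    weights: Sequence[float],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (key, score) pairs sorted by decreasing similarity."""
    scored = [(key, match_score(query, fp, weights)) for key, fp in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


# Hypothetical precomputed fingerprints for three database entries.
database = {"A": [1, 1, 0, 2], "B": [1, 0, 0, 3], "C": [0, 1, 1, 2]}
query = [1, 1, 0, 3]

first_pass = rank(query, database, [1, 1, 1, 1])   # all properties equal
second_pass = rank(query, database, [1, 1, 1, 4])  # user stresses the last property

# Screening use: keep only the best candidates for a later exact-match step.
shortlist = [key for key, _ in rank(query, database, [1, 1, 1, 4], top_k=2)]
print(first_pass, second_pass, shortlist)
```

Re-running with a heavier weight on the last property moves entry `B` ahead of entry `A`, and the shortlist passes two of the three entries on to the exact-match step.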

Description

[0001] This is a U.S. nonprovisional utility patent application that is also described in and claims the benefit of both U.S. provisional patent application Nos. 60/383,952 filed on May 29, 2002, entitled MACHINE, METHOD AND ARTICLE OF MANUFACTURE FOR A SELECTIVELY SEARCHING A DATABASE OF CHEMICAL COMPOUNDS, and 60/384,305 filed on May 30, 2002, entitled MACHINE, METHOD AND ARTICLE OF MANUFACTURE FOR SEARCHING A DATABASE OF BIOLOGICAL ACTIVITY SCREENING RESULTS, said provisional applications being incorporated by reference in their entirety herein.

REFERENCE TO AN APPENDIX

[0002] Accompanying this patent application is a CD-R, bearing the electronic title "Gange & Framroze," the contents of which comprise a program listing in ASCII text file format entitled LISTING.TXT, being of size 86 KB and having been created on May 29, 2003. The contents of said CD-R is incorporated by reference herein. The CD-R is hand labeled as follows:

[0003] Non-Provisional Patent Application Dr. David M. Gan...

Application Information

IPC(8): G06F7/00, G06F17/30, G06F19/00
CPC: G06F17/3053, G06F17/30964, G06F19/705, Y10S707/99931, Y10S707/955, G06F16/903, G06F16/24578, G16C20/40
Inventors: GANGE, DAVID M.; FRAMROZE, BOMI PATEL
Owner: ROW2 TECH