
Molecular property modeling using ranking

A molecular property modeling technology based on ranking algorithms, applied in the field of machine learning. It addresses inconsistent reported property values, the difficulty of training molecular properties models with useful predictive power, and the ineffective use of training data, whether that data is given as numerical values or as presence/absence labels.

Active Publication Date: 2010-04-20
VALO HEALTH INC
Cites: 13. Cited by: 17.

AI Technical Summary

Benefits of technology

This approach allows for the creation of accurate molecular properties models that effectively order molecules based on a property of interest, improving predictive power and handling data bias, thereby enhancing the reliability of molecular property predictions.

Problems solved by technology

Training data in either form (measured numerical values or presence/absence labels) has often proved ineffective for training molecular properties models with a useful degree of predictive power.
This may occur due to problems with the quality of the training data.
Differences in experimental conditions often lead to inconsistent values for the property of interest being reported for the same molecule.
Additionally, even measurements obtained under “identical” experimental conditions may have enough experimental uncertainty or noise that it becomes unreasonable to assign a precise numerical value to the property of interest.
For example, this is commonly encountered in molecular modeling calculations where the ranking of molecules based on the calculation of absolute binding energies can be less accurate than the ranking of compounds based on relative calculated binding energies.
Training examples that use a label asserting the presence or absence of the property of interest have also proven to be of limited value in training a molecular properties model.
Oftentimes, such data has a large bias in that the data is predominantly of one label.
A model that always predicts the majority label can appear accurate on such data, but it is not particularly useful, as it makes the same prediction for every molecule.
Generally, models built from data will not predict the property of interest with perfect accuracy for all molecules, and there will be some errors.
The two types of error, false positives and false negatives, have different costs (e.g., in a diamond mine it is far more expensive to falsely predict that a diamond is dirt than it is to predict that dirt is a diamond).
In biological and pharmaceutical applications, however, it can be very difficult to assign relative values to false positives and false negatives and so it becomes very difficult to trade them off.
Existing molecular property modeling techniques, however, are not capable of using such ordering information, nor are they capable of dealing with bias in the data or of constructing reasonable models without knowing the optimal trade-off between false positives and false negatives.
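
The bias problem described above can be made concrete with a small sketch. It assumes a hypothetical data set of 100 molecules, 5 of which are labeled as having the property, and compares the accuracy of a constant model with how well that model orders positive molecules above negative ones. The function names and the pairwise ordering score are illustrative choices, not taken from the patent text.

```python
from itertools import product


def accuracy(labels, predictions):
    """Fraction of molecules whose predicted label matches the true label."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def pairwise_ordering_score(labels, scores):
    """Fraction of (positive, negative) pairs in which the positive molecule
    receives the higher score. Ties count as half a correct pair."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one molecule of each label")
    correct = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            correct += 1.0
        elif p == n:
            correct += 0.5
    return correct / (len(positives) * len(negatives))


# Hypothetical biased data: 5 positives, 95 negatives.
labels = [1] * 5 + [0] * 95

# A constant model predicts "negative" (score 0.0) for every molecule.
constant_predictions = [0] * len(labels)
constant_scores = [0.0] * len(labels)

constant_accuracy = accuracy(labels, constant_predictions)
constant_ordering = pairwise_ordering_score(labels, constant_scores)
```

Under these assumptions the constant model looks strong by accuracy yet orders molecules no better than chance. This is one way to see why a criterion based on ordering is less easily fooled by label bias, and why it does not require fixing a trade-off between false positives and false negatives in advance.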

Embodiment Construction

[0027] Embodiments of the invention provide novel techniques for modeling molecular properties. Specifically, embodiments of the invention provide novel techniques for training molecular properties models that order sets of molecules relative to a property of interest. Embodiments of the invention generally train a molecular properties model in one of four ways:

[0028] (i) Embodiments of the invention provide novel techniques for generating ranked training data used to train a molecular properties model. Particular embodiments of the invention may be used to generate ranked data from data that is not provided in a ranked form.

[0029] (ii) Embodiments of the invention provide techniques that train a molecular properties model using training examples based on ranked data. Embodiments of the invention generate training examples based on ranked data that may be used by a learning algorithm that is not configured to process ranked data.

[0030] (iii) Embodiments of the invention provide novel te...
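
One common way to hand ranked data to a learning algorithm that only accepts labeled examples is a pairwise reduction, sketched below. This is an illustration under stated assumptions, not necessarily the technique the patent describes: each molecule is assumed to be represented by a hypothetical numeric descriptor vector, and every ordered pair of molecules becomes two binary examples built from the difference of their descriptors.

```python
def pairwise_examples(ranked_molecules, descriptors):
    """Turn a ranking into labeled examples for a binary classifier.

    ranked_molecules: molecule ids ordered from highest to lowest value
        of the property of interest (assumed input format).
    descriptors: dict mapping molecule id -> list of numeric features
        (hypothetical descriptors, for illustration only).

    Returns a list of (feature_difference, label) tuples. Label 1 means
    "the first molecule of the pair ranks higher", label 0 the reverse.
    """
    examples = []
    for i, higher in enumerate(ranked_molecules):
        for lower in ranked_molecules[i + 1:]:
            x_hi, x_lo = descriptors[higher], descriptors[lower]
            forward = [a - b for a, b in zip(x_hi, x_lo)]
            backward = [b - a for a, b in zip(x_hi, x_lo)]
            examples.append((forward, 1))   # higher minus lower
            examples.append((backward, 0))  # lower minus higher
    return examples


# Hypothetical ranking of three molecules and made-up descriptors.
ranking = ["A", "B", "C"]
descriptors = {"A": [3.0, 1.0], "B": [2.0, 1.0], "C": [0.5, 0.0]}

training_examples = pairwise_examples(ranking, descriptors)
```

Any ordinary binary classifier could then be trained on `training_examples`. Emitting both directions of each pair keeps the two labels balanced, which is a design choice of this sketch.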


Abstract

Methods and articles of manufacture for modeling molecular properties using data regarding the partial orderings of compound properties, or by considering measurements of compound properties in terms of partial orderings are disclosed. One embodiment provides for constructing such partial orderings from data that is not already in an ordered form by processing training data to produce a partial ordering of the compounds with respect to a property of interest. Another embodiment of the invention may process the modified training data to construct a model that predicts the property of interest for arbitrary compounds.
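
As a rough illustration of constructing a partial ordering from data that is not already ordered, the sketch below treats each compound's repeated measurements as a range and asserts that one compound outranks another only when the two ranges are clearly separated. The separation rule, the `tolerance` parameter, and the sample values are assumptions made for this example, not details from the patent.

```python
def partial_ordering(measurements, tolerance=0.0):
    """Build a partial ordering from possibly inconsistent measurements.

    measurements: dict mapping compound id -> list of measured values
        for the property of interest (assumed input format).
    tolerance: extra margin required before two compounds are ordered
        (hypothetical parameter standing in for experimental noise).

    Returns a set of (higher, lower) pairs. Compounds whose measurement
    ranges overlap, or are closer than the tolerance, stay unordered.
    """
    ordered_pairs = set()
    for a, values_a in measurements.items():
        for b, values_b in measurements.items():
            if a != b and min(values_a) - max(values_b) > tolerance:
                ordered_pairs.add((a, b))
    return ordered_pairs


# Made-up values: mol3 was reported inconsistently by different sources.
measurements = {
    "mol1": [7.9, 8.3],
    "mol2": [6.1, 6.4, 6.0],
    "mol3": [6.3, 8.0],
}

ordering = partial_ordering(measurements, tolerance=0.5)
```

With these values only the pair (mol1, mol2) is ordered. The inconsistent reports for mol3 leave it incomparable to both other compounds instead of forcing a single numerical value onto it.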

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims priority to U.S. provisional patent application Ser. No. 60/584,819, filed Jun. 29, 2004, and to U.S. provisional patent application Ser. No. 60/584,820, filed Jun. 29, 2004, both of which are incorporated by reference herein in their entirety.

[0002] This application is also related to the following: (1) U.S. Pat. No. 6,571,226, issued May 27, 2003, (2) U.S. patent application Ser. No. 11/074,587, filed on Mar. 8, 2005, (3) U.S. patent application Ser. No. 10/449,948, filed on May 30, 2003, now abandoned; (4) U.S. patent application Ser. No. 10/452,481, filed on May 30, 2003, and (5) U.S. patent application Ser. No. 11/172,216, filed on even date herewith entitled “Estimating the Accuracy of Molecular Properties Models and Predictions”, now U.S. Pat. No. 7,194,359. Each of the aforementioned patent and patent applications are incorporated by reference herein in their entirety.

BACKGROUND OF THE INVENTION

[0003] 1. Fiel...


Application Information

Patent Type & Authority: Patents (United States)
IPC (8): G01N33/48, G01N31/00, G01N33/50, G01N33/68, G06F19/00
CPC: G06F19/707, G06F19/704, G16C20/30, G16C20/70
Inventor: DUFFY, NIGEL P.
Owner: VALO HEALTH INC