A universal data-mining platform capable of analyzing mass spectrometry (MS) serum proteomic profiles and/or gene array data to produce biologically meaningful classifications, i.e., to group biologically related specimens into clades. This platform utilizes the principles of phylogenetics, such as parsimony, to reveal susceptibility to
cancer development (or other physiological or pathophysiological conditions), to diagnose and type cancer, to identify stages of cancer, and to support post-treatment evaluation. To place specimens into their corresponding
clade(s), the invention utilizes two algorithms: a new data-mining parsing algorithm and a publicly available phylogenetic algorithm (MIX). By outgroup comparison (i.e., using a set of normal specimens as the standard reference), the
parsing algorithm identifies under- and/or overexpressed gene values or, in the case of sera, (i) novel or (ii) vanished MS peaks and peaks signifying (iii) up- or (iv) down-regulated proteins, and scores the variations as either derived (do not exist in the outgroup set) or ancestral (exist in the outgroup set). The derived state is given a score of “1” and the ancestral state a score of “0”; these are called the polarized values. Furthermore, the shared derived characters that it identifies are
potential biomarkers for cancers and other conditions and their subclasses.
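
The polarization step described above can be illustrated with a minimal Python sketch. It is not the parsing algorithm of the invention. It assumes, for illustration only, that a value counts as derived when it falls outside the minimum-to-maximum range observed for the same character in the outgroup (normal) set, and that a shared derived character is one scored “1” in every specimen of the group examined. The function names `polarize` and `shared_derived` and the sample values are hypothetical.

```python
from typing import Dict, List, Sequence


def polarize(specimen: Sequence[float],
             outgroup: Sequence[Sequence[float]]) -> List[int]:
    """Score each character as 1 (derived) or 0 (ancestral).

    Assumption (not from the source): a value is derived when it lies
    outside the [min, max] range of that character in the outgroup.
    """
    scores = []
    for j, value in enumerate(specimen):
        reference = [row[j] for row in outgroup]
        low, high = min(reference), max(reference)
        scores.append(0 if low <= value <= high else 1)
    return scores


def shared_derived(matrix: Dict[str, List[int]]) -> List[int]:
    """Return indices of characters scored 1 in every specimen."""
    rows = list(matrix.values())
    n_chars = len(rows[0])
    return [j for j in range(n_chars) if all(row[j] == 1 for row in rows)]


# Hypothetical data: three characters (gene values or MS peak intensities).
# The third character is absent (0.0) in all normal specimens, so any
# non-zero intensity in a test specimen models a "novel" peak.
normal = [[1.0, 5.0, 0.0],
          [1.2, 5.5, 0.0],
          [0.9, 4.8, 0.0]]
tumors = {"T1": [1.1, 9.0, 2.3],
          "T2": [0.5, 8.7, 1.9]}

polarized = {name: polarize(values, normal) for name, values in tumors.items()}
print(polarized)                  # polarized 0/1 matrix per specimen
print(shared_derived(polarized))  # candidate shared derived characters
```

The resulting matrix of polarized values is the kind of binary character data that a parsimony program such as MIX takes as input; writing it in that program's file format is left out of this sketch.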