A method for training a classifier for selecting features in sparse data sets with high feature dimensionality includes providing a set of data items x and labels y; minimizing a functional of the data items x and associated labels y,

$$L(w,b,a,c,\gamma_1,\gamma_2) := \frac{1}{N}\sum_{i=1}^{N} a_i + \lambda_1\|c\|_1 + \frac{\lambda_2}{2}\|w\|_2^2 + \gamma_1^T\bigl(e - Y(Xw+be) - a\bigr) + \gamma_2^T(w-c) + \frac{\mu_1}{2}\bigl\|e - Y(Xw+be) - a\bigr\|_2^2 + \frac{\mu_2}{2}\|w-c\|_2^2,$$

to solve for hyperplane w and offset b of a classifier by successively iteratively approximating w and b, auxiliary variables a and c, and multiplier vectors γ1 and γ2, wherein λ1, λ2, μ1, and μ2 are predetermined constants, e is a unit vector, and X and Y are respective matrix representations of the data items x and labels y; and providing non-zero elements of the hyperplane vector w and corresponding components of X and Y as arguments to an interior point method solver to solve for hyperplane vector w and offset b, wherein w and b define a classifier that can associate each data item x with the correct label y.
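As an illustration only, the functional L described above can be evaluated term by term with NumPy. This is a minimal sketch, not the claimed method: the function name `augmented_lagrangian` and its argument layout are hypothetical, X is assumed to be an N-by-d array with one data item per row, Y is assumed to be the diagonal matrix of the labels y (so that Y times a vector is an elementwise product), and e is assumed to be the length-N vector of ones.

```python
import numpy as np

def augmented_lagrangian(w, b, a, c, gamma1, gamma2, X, y, lam1, lam2, mu1, mu2):
    """Evaluate L(w, b, a, c, gamma1, gamma2) term by term (hypothetical helper)."""
    N = X.shape[0]
    e = np.ones(N)                     # assumption: e is the all-ones vector
    # assumption: Y = diag(y), so Y @ v is the elementwise product y * v
    r1 = e - y * (X @ w + b * e) - a   # residual coupling a to the margins
    r2 = w - c                         # residual coupling c to w
    return (a.sum() / N                # (1/N) * sum_i a_i
            + lam1 * np.abs(c).sum()   # lambda_1 * ||c||_1
            + 0.5 * lam2 * (w @ w)     # (lambda_2 / 2) * ||w||_2^2
            + gamma1 @ r1              # gamma_1^T r1
            + gamma2 @ r2              # gamma_2^T r2
            + 0.5 * mu1 * (r1 @ r1)    # (mu_1 / 2) * ||r1||_2^2
            + 0.5 * mu2 * (r2 @ r2))   # (mu_2 / 2) * ||r2||_2^2

# Tiny hand-checkable case with N = 1 and d = 1.
X = np.array([[2.0]])
y = np.array([1.0])
w = np.array([1.0])
value = augmented_lagrangian(
    w, 0.0, np.array([0.0]), np.array([0.0]),
    np.array([1.0]), np.array([1.0]),
    X, y, lam1=1.0, lam2=2.0, mu1=2.0, mu2=2.0,
)

# When both residuals vanish, the multiplier and penalty terms drop out.
a_feas = 1.0 - y * (X @ w)
feasible_value = augmented_lagrangian(
    w, 0.0, a_feas, w.copy(),
    np.array([5.0]), np.array([-3.0]),
    X, y, lam1=1.0, lam2=2.0, mu1=2.0, mu2=2.0,
)
```

In the second call the residuals e - Y(Xw + be) - a and w - c are both zero, so the value does not depend on γ1, γ2, μ1, or μ2 and reduces to the first three terms of L.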