We consider fast deterministic algorithms to identify the best linearly independent terms in multivariate mixtures and use them to compute, up to a user-selected accuracy, an equivalent representation with fewer terms. One algorithm employs a pivoted Cholesky decomposition of the Gram matrix constructed from the terms of the mixture to select what we call skeleton terms and the other uses orthogonalization for the same purpose. Importantly, the multivariate mixtures do not have to be a separated representation of a function. Both algorithms require $O(r^2 N + p(d) r N) $ operations, where $N$ is the initial number of terms in the multivariate mixture, $r$ is the number of selected linearly independent terms, and $p(d)$ is the cost of computing the inner product between two terms of a mixture in $d$ variables. For general Gaussian mixtures $p(d) sim d^3$ since we need to diagonalize a $dtimes d$ matrix, whereas for separated representations $p(d) sim d$. Due to conditioning issues, the resulting accuracy is limited to about one half of the available significant digits for both algorithms. We also describe an alternative algorithm that is capable of achieving higher accuracy but is only applicable in low dimensions or to multivariate mixtures in separated form. We describe a number of initial applications of these algorithms to solve partial differential and integral equations and to address several problems in data science. For data science applications in high dimensions,we consider the kernel density estimation (KDE) approach for constructing a probability density function (PDF) of a cloud of points, a far-field kernel summation method and the construction of equivalent sources for non-oscillatory kernels (used in both, computational physics and data science) and, finally, show how to use the new algorithm to produce seeds for subdividing a cloud of points into groups.