We consider best approximation problems in a nonlinear subset $\mathcal{M}$ of a Banach space of functions $(\mathcal{V},\|\bullet\|)$. The norm is assumed to be a generalization of the $L^2$-norm for which only a weighted Monte Carlo estimate $\|\bullet\|_n$ can be computed. The objective is to obtain an approximation $v\in\mathcal{M}$ of an unknown function $u \in \mathcal{V}$ by minimizing the empirical norm $\|u-v\|_n$. We consider this problem for general nonlinear subsets and establish error bounds for the empirical best approximation error. Our results are based on a restricted isometry property (RIP) which holds in probability and is independent of the nonlinear least squares setting. Several model classes are examined where analytical statements can be made about the RIP and the results are compared to existing sample complexity bounds from the literature. We find that for well-studied model classes our general bound is weaker but exhibits many of the same properties as these specialized bounds. Notably, we demonstrate the advantage of an optimal sampling density (as known for linear spaces) for sets of functions with sparse representations.
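
To make the setting concrete, the following is a minimal sketch of what a weighted Monte Carlo estimate and an RIP could look like in the plain $L^2$ case. The measure $\rho$, the weight function $w$, the sample points $x_1,\dots,x_n$, the constant $\delta\in(0,1)$ and the set $A\subseteq\mathcal{V}$ are illustrative assumptions and are not fixed by the text above. Assume $\|v\|^2 = \int |v(x)|^2 \,\mathrm{d}\rho(x)$ and that the $x_i$ are drawn independently from the density $w^{-1}$ with respect to $\rho$, with $w>0$ and $\int w^{-1}\,\mathrm{d}\rho = 1$. Then one possible empirical norm is

$$
\|v\|_n^2 := \frac{1}{n}\sum_{i=1}^{n} w(x_i)\,|v(x_i)|^2,
\qquad
\mathbb{E}\big[\|v\|_n^2\big] = \int w(x)\,|v(x)|^2\,w(x)^{-1}\,\mathrm{d}\rho(x) = \|v\|^2 ,
$$

so the estimate is unbiased for every fixed $v$. An RIP on $A$ in this sketch would be the requirement that

$$
(1-\delta)\,\|v\|^2 \;\le\; \|v\|_n^2 \;\le\; (1+\delta)\,\|v\|^2
\qquad \text{for all } v\in A ,
$$

which, since the $x_i$ are random, can only be expected to hold with a certain probability.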