We present a model to explain the mass segregation and shallow mass functions observed in the central parts of dense, young starburst stellar clusters. The model assumes that the initial pre-stellar core mass function resulting from the turbulent fragmentation of the proto-cluster cloud is significantly altered by the coalescence of cores before they collapse to form stars. With appropriate yet realistic parameters, this model, based on the competition between core coalescence and collapse, reproduces the mass spectra of the well-studied Arches cluster. Namely, the slopes at the intermediate- and high-mass ends are reproduced, as well as the peculiar bump observed at 6 M_sol. This coalescence-collapse process occurs on a short timescale, of the order of one fourth of the free-fall time of the proto-cluster cloud (i.e., a few times 10^{4} years), suggesting that mass segregation in the Arches and similar clusters is primordial. The best-fitting model implies that the total mass of the Arches cluster is 1.45 × 10^{5} M_sol, which is slightly higher than the often quoted, but completeness-affected, observational value of a few times 10^{4} M_sol. The derived star formation efficiency is ~30 percent, which implies that the Arches cluster is likely to be gravitationally bound.