The historical and geographical spread from older to more modern languages has long been studied by examining textual changes and in terms of changes in phonetic transcriptions. However, it is more difficult to analyze language change from an acoustic point of view, although this is usually the dominant mode of transmission. We propose a novel analysis approach for acoustic phonetic data, where the aim will be to statistically model the acoustic properties of spoken words. We explore phonetic variation and change using a time-frequency representation, namely the log-spectrograms of speech recordings. We identify time and frequency covariance functions as a feature of the language; in contrast, mean spectrograms depend mostly on the particular word that has been uttered. We build models for the mean and covariances (taking into account the restrictions placed on the statistical analysis of such objects) and use these to define a phonetic transformation that models how an individual speaker would sound in a different language, allowing the exploration of phonetic differences between languages. Finally, we map back these transformations to the domain of sound recordings, allowing us to listen to the output of the statistical analysis. The proposed approach is demonstrated using recordings of the words corresponding to the numbers from one to ten as pronounced by speakers from five different Romance languages.