Towards a perceptual distance metric for auditory stimuli


Abstract in English

Although perceptual (dis)similarity between sensory stimuli seems akin to distance, the Euclidean distance between vector representations of auditory stimuli is a poor estimator of subjective dissimilarity. In hearing, nonlinear response patterns, interactions between stimulus components, temporal effects, and top-down modulation transform the information contained in incoming frequency-domain stimuli in a way that seems to preserve some notion of distance, but not that of familiar Euclidean space. This work proposes that the transformations applied to auditory stimuli during hearing can be modeled as a function mapping stimulus points to their representations in a perceptual space, which induces a Riemannian distance metric. A dataset was collected in a subjective listening experiment, and its results were used to explore approaches (biologically inspired, data-driven, and combinations thereof) to approximating the perceptual map. Each of the proposed measures achieved correlations with subjective ratings (r ≈ 0.8) comparable to or stronger than those of state-of-the-art audio quality measures.
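
As a rough illustration of how a map into a perceptual space can induce a distance, the sketch below pulls the Euclidean metric back through a toy map. Everything in it is an assumption made for illustration: the function names are hypothetical, and the element-wise log compression is only a stand-in for the perceptual map, not one of the maps explored in this work. For a differentiable map f with Jacobian J(x), the induced metric tensor is G(x) = J(x)^T J(x), and the length of a path is the integral of sqrt(dx^T G(x) dx) along it.

```python
import numpy as np


def toy_perceptual_map(x):
    """Hypothetical stand-in for a perceptual map: element-wise log compression."""
    return np.log1p(np.asarray(x, dtype=float))


def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (rows: outputs, columns: inputs)."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        cols.append((f(x + step) - f(x - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)


def pullback_metric(f, x):
    """Metric tensor G(x) = J(x)^T J(x) induced on stimulus space by f."""
    J = jacobian(f, x)
    return J.T @ J


def path_length(f, a, b, n_steps=1000):
    """Length of the straight segment from a to b under the induced metric.

    Uses the midpoint rule. The straight segment is not necessarily the
    shortest path, so in general this is an upper bound on the induced
    distance between a and b.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    delta = (b - a) / n_steps
    total = 0.0
    for k in range(n_steps):
        mid = a + (k + 0.5) * delta
        G = pullback_metric(f, mid)
        total += np.sqrt(delta @ G @ delta)
    return total


# Two pairs of stimuli with the same Euclidean separation (2.0 in the first
# component) but different induced lengths under the toy map.
low_pair = path_length(toy_perceptual_map, [1.0, 1.0], [3.0, 1.0])
high_pair = path_length(toy_perceptual_map, [10.0, 1.0], [12.0, 1.0])
```

Under this toy map, the pair at the higher level comes out closer than the pair at the lower level even though their Euclidean separations are identical, which is the kind of mismatch between Euclidean and perceptual distance described above.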
