In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research.