In this paper, we consider contention resolution algorithms that are augmented with predictions about the network. We begin by studying the natural setup in which the algorithm is provided a distribution defined over the possible network sizes that predicts the likelihood of each size occurring. The goal is to leverage the predictive power of this distribution to improve on worst-case time complexity bounds. Using a novel connection between contention resolution and information theory, we prove lower bounds on the expected time complexity with respect to the Shannon entropy of the corresponding network size random variable, for both the collision detection and no collision detection assumptions. We then analyze upper bounds for these settings, assuming now that the distribution provided as input might differ from the actual distribution generating network sizes. We express their performance with respect to both entropy and the statistical divergence between the two distributions -- allowing us to quantify the cost of poor predictions. Finally, we turn our attention to the related perfect advice setting, parameterized with a length $b \geq 0$, in which all active processes in a given execution are provided the best possible $b$ bits of information about their network. We provide tight bounds on the speed-up possible with respect to $b$ for deterministic and randomized algorithms, with and without collision detection. These bounds provide a fundamental limit on the maximum power that can be provided by any predictive model with a bounded output size.
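
To make the two quantities named above concrete, the following minimal sketch computes the Shannon entropy of a network size distribution and a divergence between an actual and a predicted distribution. It is an illustration only: the abstract does not say which statistical divergence is used, so Kullback-Leibler divergence is assumed here, and the function names and the example distributions `actual` and `predicted` are hypothetical. The sketch does not reproduce any bound from the paper.

```python
import math


def shannon_entropy(dist):
    """Shannon entropy (in bits) of a distribution given as {network_size: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def kl_divergence(actual, predicted):
    """KL divergence D(actual || predicted) in bits.

    Assumption: KL is used as the divergence; the abstract does not name one.
    Returns infinity if `predicted` assigns zero mass to a size that can occur.
    """
    total = 0.0
    for size, p in actual.items():
        if p == 0:
            continue
        q = predicted.get(size, 0.0)
        if q == 0:
            return math.inf
        total += p * math.log2(p / q)
    return total


# Hypothetical distributions over network sizes (not taken from the paper).
actual = {2: 0.5, 4: 0.25, 8: 0.25}
predicted = {2: 0.25, 4: 0.25, 8: 0.5}

entropy_bits = shannon_entropy(actual)               # 1.5 bits
divergence_bits = kl_divergence(actual, predicted)   # 0.25 bits
```

In this toy instance, the entropy measures how uncertain the network size is even under a perfect prediction, while the divergence measures how far the supplied prediction is from the distribution that actually generates sizes.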