Existing information-theoretic frameworks based on maximum entropy network ensembles cannot explain the emergence of heterogeneity in complex networks. Here, we fill this gap by developing a classical framework for networks based on finding an optimal trade-off between the information content of a compressed representation of the ensemble and the information content of the actual network ensemble. In this way, we not only introduce a novel classical network ensemble satisfying a set of soft constraints but are also able to calculate the optimal distribution of the constraints. We show that, for the classical network ensemble in which the only constraints are the expected degrees, a power-law degree distribution is optimal. We also study spatially embedded networks and find that the interactions between nodes naturally lead to a non-uniform spread of nodes in space, with pairs of nodes at a given distance not necessarily obeying a power-law distribution. The pertinent features of real-world air transportation networks are well described by the proposed framework.