Many studies have shown that a wealth of cosmological information resides on small, non-linear scales. Unfortunately, two challenges must be overcome to exploit that information. First, we do not know the optimal estimator for retrieving the maximum amount of information. Second, baryonic effects impact that regime significantly and in a poorly understood manner. Ideally, we would like an estimator that extracts the maximum cosmological information while marginalizing over baryonic effects. In this work we show that neural networks can achieve this. We use data for which the maximum amount of cosmological information is known: power spectra and 2D Gaussian density fields. We contaminate the data with simplified baryonic effects and train neural networks to predict the values of the cosmological parameters. For these data, we show that neural networks can 1) extract the maximum available cosmological information, 2) marginalize over baryonic effects, and 3) extract cosmological information that is buried in the regime dominated by baryonic physics. We also show that neural networks learn the priors of the data they are trained on. We conclude that a promising strategy to maximize the scientific return of cosmological experiments is to train neural networks on state-of-the-art numerical simulations with different strengths and implementations of baryonic effects.
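For Gaussian density fields the maximum information is known because the Fisher matrix, and hence the Cramér-Rao bound on the error of any unbiased estimator, can be computed exactly from the power spectrum and the number of independent modes per k-bin. The sketch below illustrates that idea and the cost of marginalizing over a nuisance parameter. It is not the pipeline used in this work: the power-law spectrum P(k) = A k^n (1 + b k^2), the toy "baryonic" factor (1 + b k^2) with fiducial b = 0, the k-bins and the mode counts are all hypothetical choices made for illustration.

```python
import math

# Toy Fisher forecast for a Gaussian field measured in k-bins.
# HYPOTHETICAL setup (not the paper's pipeline): P(k) = A * k**n * (1 + b * k**2),
# where (1 + b k^2) is a made-up stand-in for a baryonic nuisance, fiducial b = 0.


def fisher_matrix(dlnp_columns, n_modes):
    """Return F_ij = sum_k N_k * dlnP/dtheta_i(k) * dlnP/dtheta_j(k).

    Exact when bin k contains N_k independent modes whose |delta_k|^2 is
    exponentially distributed with mean P(k), as for a Gaussian field."""
    return [[sum(nk * di[idx] * dj[idx] for idx, nk in enumerate(n_modes))
             for dj in dlnp_columns]
            for di in dlnp_columns]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for tiny matrices)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * c for a, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Hypothetical binning: 20 k-bins, mode count growing linearly with k (2D).
ks = [0.05 * i for i in range(1, 21)]
n_modes = [4 * i for i in range(1, 21)]
A = 1.0

# dlnP/dtheta evaluated at the fiducial point, one list per parameter.
cols_cosmo = [[1.0 / A] * len(ks),               # theta = A
              [math.log(k) for k in ks]]         # theta = n
cols_full = cols_cosmo + [[k ** 2 for k in ks]]  # theta = b (nuisance)

F_cosmo = fisher_matrix(cols_cosmo, n_modes)
F_full = fisher_matrix(cols_full, n_modes)

# Cramer-Rao bounds on A (n always free): b held fixed vs. b marginalized.
sigma_A_fixed = math.sqrt(invert(F_cosmo)[0][0])
sigma_A_marg = math.sqrt(invert(F_full)[0][0])
print(f"sigma(A), b fixed:        {sigma_A_fixed:.4f}")
print(f"sigma(A), b marginalized: {sigma_A_marg:.4f}")
```

Because the toy nuisance term shares its k dependence with A and n, marginalizing over b can only enlarge the bound on A. In a setting like this, an estimator that marginalizes optimally over the nuisance would be expected to approach the marginalized bound rather than the fixed-b one.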