Ultra-dense deployments in 5G, the next generation of cellular networks, are one way to provide ultra-high throughput by bringing users closer to the base stations. However, 5G deployments must not incur a large increase in energy consumption, both to remain cost-effective and, most importantly, to reduce the carbon footprint of cellular networks. We propose a reinforcement learning cell switching algorithm that minimizes the energy consumption of ultra-dense deployments without compromising the quality of service (QoS) experienced by the users. The proposed algorithm learns which small cells (SCs) to turn off at any given time based on the traffic load of the SCs and the macro cell. To validate the idea, we used the open call detail record (CDR) data set from the city of Milan, Italy, and tested our algorithm against typical operational benchmark solutions. The results show when and how the proposed algorithm provides energy savings, and how it does so without reducing the QoS of users. Most importantly, we show that our solution performs very similarly to exhaustive search, with the advantage of being scalable and less complex.
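
To make the idea of learning which SCs to switch off concrete, the following is a minimal sketch and not the algorithm evaluated in this work. It assumes plain tabular Q-learning, a state reduced to a time-slot index, a load-independent power model, and a toy four-slot traffic profile. All names (`PROFILE`, `evaluate`, `train`, and so on) and all numeric values are hypothetical and chosen only for illustration. The agent picks an on/off configuration per slot, the traffic of sleeping SCs is offloaded to the macro cell, and overloading the macro cell is penalized as a QoS violation.

```python
import itertools
import random

# Every number below is an illustrative assumption, not a value from the paper.
P_MACRO, P_SC_ON, P_SC_SLEEP = 100.0, 10.0, 1.0  # hypothetical power draw
MACRO_CAPACITY = 1.0    # normalized load the macro cell can carry
QOS_PENALTY = 1000.0    # reward penalty when the macro cell is overloaded

# Hypothetical daily profile: (loads of 3 small cells, macro cell load).
PROFILE = [
    ((0.05, 0.05, 0.10), 0.2),  # night
    ((0.30, 0.20, 0.25), 0.4),  # morning
    ((0.50, 0.40, 0.45), 0.7),  # peak
    ((0.20, 0.15, 0.10), 0.6),  # evening
]
ACTIONS = list(itertools.product([False, True], repeat=3))  # True = SC on


def evaluate(slot, on_mask):
    """Return (total power, QoS satisfied) for one on/off configuration."""
    sc_loads, macro_load = PROFILE[slot]
    # Traffic of sleeping small cells is offloaded to the macro cell.
    offloaded = sum(load for load, on in zip(sc_loads, on_mask) if not on)
    qos_ok = macro_load + offloaded <= MACRO_CAPACITY
    power = P_MACRO + sum(P_SC_ON if on else P_SC_SLEEP for on in on_mask)
    return power, qos_ok


def reward(slot, on_mask):
    """Negative power, with an extra penalty for a QoS violation."""
    power, qos_ok = evaluate(slot, on_mask)
    return -power - (0.0 if qos_ok else QOS_PENALTY)


def train(steps=40000, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning over (time slot, on/off configuration)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in PROFILE]
    slot = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))  # explore
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[slot][i])  # exploit
        nxt = (slot + 1) % len(PROFILE)  # slots cycle regardless of action
        target = reward(slot, ACTIONS[a]) + gamma * max(q[nxt])
        q[slot][a] += alpha * (target - q[slot][a])
        slot = nxt
    return q


def learned_policy(q):
    """Greedy on/off configuration per slot from the learned Q-table."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


def exhaustive_policy():
    """Benchmark: cheapest QoS-feasible configuration per slot."""
    best = []
    for slot in range(len(PROFILE)):
        feasible = [m for m in ACTIONS if evaluate(slot, m)[1]]
        best.append(min(feasible, key=lambda m: evaluate(slot, m)[0]))
    return best
```

Calling `learned_policy(train())` and `exhaustive_policy()` and comparing the per-slot results of `evaluate` gives a small-scale version of the comparison described above. In this sketch the exhaustive benchmark enumerates all 2^N on/off configurations for N small cells, which illustrates why exhaustive search becomes impractical as the deployment grows.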