Federated learning (FL) is a distributed learning methodology that allows multiple nodes to cooperatively train a deep learning model, without the need to share their local data. It is a promising solution for telemonitoring systems that demand intensive data collection, for detection, classification, and prediction of future events, from different locations while maintaining a strict privacy constraint. Due to privacy concerns and critical communication bottlenecks, it can become impractical to send the FL updated models to a centralized server. Thus, this paper studies the potential of hierarchical FL in IoT heterogeneous systems and propose an optimized solution for user assignment and resource allocation on multiple edge nodes. In particular, this work focuses on a generic class of machine learning models that are trained using gradient-descent-based schemes while considering the practical constraints of non-uniformly distributed data across different users. We evaluate the proposed system using two real-world datasets, and we show that it outperforms state-of-the-art FL solutions. In particular, our numerical results highlight the effectiveness of our approach and its ability to provide 4-6% increase in the classification accuracy, with respect to hierarchical FL schemes that consider distance-based user assignment. Furthermore, the proposed approach could significantly accelerate FL training and reduce communication overhead by providing 75-85% reduction in the communication rounds between edge nodes and the centralized server, for the same model accuracy.