Federated learning (FL) is a distributed learning paradigm that enables a large number of mobile devices to collaboratively learn a model under the coordination of a central server without sharing their raw data. Despite its practical efficiency and effectiveness, the iterative on-device learning process (e.g., local computations and global communications with the server) incurs a considerable cost in terms of learning time and energy consumption, which depends crucially on the number of selected clients and the number of local iterations in each training round. In this paper, we analyze how to design adaptive FL in mobile edge networks that optimally chooses these essential control variables to minimize the total cost while ensuring convergence. We establish the analytical relationship between the total cost and the control variables with the convergence upper bound. To efficiently solve the cost minimization problem, we develop a low-cost sampling-based algorithm to learn the convergence related unknown parameters. We derive important solution properties that effectively identify the design principles for different optimization metrics. Practically, we evaluate our theoretical results both in a simulated environment and on a hardware prototype. Experimental evidence verifies our derived properties and demonstrates that our proposed solution achieves near-optimal performance for different optimization metrics for various datasets and heterogeneous system and statistical settings.