In this work, we propose a globally optimal joint successive interference cancellation (SIC) ordering and power allocation (JSPA) algorithm for the sum-rate maximization problem in downlink multi-cell non-orthogonal multiple access (NOMA) systems. The proposed algorithm is based on exploring the power consumption of the base stations (BSs) and on the closed-form optimal powers obtained for each cell. Although the optimal JSPA algorithm scales well with the number of users, its complexity is still exponential in the number of cells. For any suboptimal decoding order, we propose a low-complexity, near-optimal joint rate and power allocation (JRPA) strategy in which the complete rate region of the users is exploited. Furthermore, we design a near-optimal semi-centralized JSPA framework for a two-tier heterogeneous network that scales well with the number of small BSs and users. Numerical results show that JRPA significantly outperforms the case in which the users are forced to achieve their channel capacity by imposing the well-known SIC necessary condition on power allocation. Moreover, the proposed semi-centralized JSPA framework significantly outperforms the fully distributed framework, in which all BSs operate at their maximum power budget. Therefore, the centralized JRPA and semi-centralized JSPA algorithms, with their near-optimal performance, are well suited to networks with large numbers of cells and users.