In this paper, we address inter-beam, inter-cell interference mitigation in 5G networks that employ millimeter-wave (mmWave) communication, beamforming, and non-orthogonal multiple access (NOMA). These techniques play a key role in improving network capacity and spectral efficiency by multiplexing users in both the spatial and power domains. In addition, the coverage areas of multiple beams from different cells can intersect, allowing more flexibility in user-cell association. However, intersecting coverage areas also imply increased inter-beam, inter-cell interference, i.e., interference among beams formed by nearby cells. Therefore, joint user-cell association and inter-beam power allocation stands as a promising solution to mitigate this interference. We consider a 5G mmWave network and propose a reinforcement learning algorithm that performs joint user-cell association and inter-beam power allocation to maximize the network sum rate. The proposed algorithm is compared to a uniform power allocation baseline that divides each cell's power equally among its beams. Simulation results show an improvement in network sum rate of 13% at the lowest traffic load and 30% at the highest.
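For illustration only, the uniform baseline and the sum-rate objective mentioned above can be sketched as follows. This is a minimal sketch, not the simulator used in this work: the function names `uniform_power_allocation` and `sum_rate` are hypothetical, and the sketch assumes that each user's rate follows the Shannon formula applied to a given linear SINR value.

```python
import math


def uniform_power_allocation(total_power_w, num_beams):
    """Split one cell's transmit power equally among its beams (baseline)."""
    if num_beams <= 0:
        raise ValueError("num_beams must be positive")
    return [total_power_w / num_beams] * num_beams


def sum_rate(sinrs_linear, bandwidth_hz=1.0):
    """Sum of Shannon rates, B * log2(1 + SINR), over all users.

    Assumption: SINR values are linear (not dB) and already account for
    inter-beam and inter-cell interference.
    """
    return sum(bandwidth_hz * math.log2(1.0 + s) for s in sinrs_linear)
```

With a unit bandwidth, `sum_rate` returns the spectral efficiency in bit/s/Hz; a learning-based allocation would replace `uniform_power_allocation` while keeping the same objective.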