This paper presents a Graphics Processing Unit (GPU) acceleration method for an iterative scheme for gas-kinetic model equations. Unlike previous GPU parallelizations of explicit kinetic schemes, this work features a fast-converging iterative scheme. The memory reduction techniques in this method enable a full three-dimensional (3D) solution of the kinetic model equations, which would otherwise require terabytes of memory, on contemporary GPUs with their typically limited memory capacity. The GPU algorithm is validated against direct simulation Monte Carlo (DSMC) simulations of the 3D lid-driven cavity flow and the supersonic rarefied gas flow past a cube, with grid sizes of up to 0.7 trillion points in the phase space. The performance of the GPU algorithm is assessed by comparison with a corresponding parallel CPU program using the Message Passing Interface (MPI). Profiling on several GPU models shows that the algorithm achieves medium to high utilization of the GPU's computing and memory resources. A $190\times$ speedup is achieved on a Tesla K40 GPU against a single core of an Intel Xeon E5-2680 v3 CPU for the 3D lid-driven cavity flow.
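For context, a back-of-the-envelope estimate (a sketch, assuming a single 8-byte double-precision distribution-function value is stored per phase-space point, with no reduction techniques applied) illustrates the scale of the memory requirement that motivates this work:
$$
0.7 \times 10^{12}\ \text{points} \times 8\ \text{B/point} \approx 5.6\ \text{TB},
$$
which is several orders of magnitude beyond the on-board memory of the GPUs considered here.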