The reliability of cardiovascular computational models depends on the accurate solution of the hemodynamics, on a realistic characterization of the hyperelastic and electrical properties of the tissues, and on a correct description of their interaction. The resulting fluid-structure-electrophysiology interaction (FSEI) problem therefore demands immense computational power, usually available only in large supercomputing centers. Even when the computation is distributed over many CPUs with MPI, results take a long time to obtain. In recent years, graphics processing units (GPUs) have emerged as a convenient platform for high-performance computing, as they allow considerable reductions of the time-to-solution. This approach is particularly appealing when the tool has to support medical decisions, which require results within short times and, possibly, on local computational resources. Accordingly, our multi-physics solver has been ported to GPU architectures using CUDA Fortran to enable fast and accurate hemodynamic simulations of the human heart without resorting to large-scale supercomputers. This work describes the use of CUDA to accelerate the FSEI solver on heterogeneous clusters, where CPUs and GPUs are used synergistically with only minor modifications of the original source code. The resulting GPU-accelerated code solves a single heartbeat within a few hours (from three to ten, depending on the grid resolution) on an on-premises computing facility made of a few GPU cards. Such a facility can be easily installed in a medical laboratory or in a hospital, thus paving the way towards systematic computational fluid dynamics (CFD)-aided diagnostics.