There is a growing interest in developing data-driven subgrid-scale (SGS) models for large-eddy simulation (LES) using machine learning (ML). In a priori (offline) tests, some recent studies have found ML-based data-driven SGS models that are trained on high-fidelity data (e.g., from direct numerical simulation, DNS) to outperform baseline physics-based models and accurately capture the inter-scale transfers, both forward (diffusion) and backscatter. While promising, instabilities in a posteriori (online) tests and inabilities to generalize to a different flow (e.g., with a higher Reynolds number, Re) remain as major obstacles in broadening the applications of such data-driven SGS models. For example, many of the same aforementioned studies have found instabilities that required often ad-hoc remedies to stabilize the LES at the expense of reducing accuracy. Here, using 2D decaying turbulence as the testbed, we show that deep fully convolutional neural networks (CNNs) can accurately predict the SGS forcing terms and the inter-scale transfers in a priori tests, and if trained with enough samples, lead to stable and accurate a posteriori LES-CNN. Further analysis attributes these instabilities to the disproportionally lower accuracy of the CNNs in capturing backscattering when the training set is small. We also show that transfer learning, which involves re-training the CNN with a small amount of data (e.g., 1%) from the new flow, enables accurate and stable a posteriori LES-CNN for flows with 16x higher Re (as well as higher grid resolution if needed). These results show the promise of CNNs with transfer learning to provide stable, accurate, and generalizable LES for practical use.