Learning time-dependent partial differential equations (PDEs) that govern evolutionary observations is one of the core challenges for data-driven inference in many fields. In this work, we propose to capture the essential dynamics of a class of numerically challenging PDEs arising in multiscale modeling and simulation: kinetic equations. These equations are usually nonlocal and contain scales and parameters that vary by several orders of magnitude. We introduce an efficient framework, Densely Connected Recurrent Neural Networks (DC-RNNs), which incorporates a multiscale ansatz and high-order implicit-explicit (IMEX) schemes into the RNN structure design to identify analytic representations of multiscale and nonlocal PDEs from discrete-time observations generated from heterogeneous experiments. If present in the observed data, our DC-RNN can capture transport operators, nonlocal projection or collision operators, the macroscopic diffusion limit, and other dynamics. We provide numerical results that demonstrate the advantages of the proposed framework and compare it with existing methods.