Small metal clusters are of fundamental scientific interest and of tremendous significance in catalysis. These nanoscale clusters display diverse geometries and structural motifs depending on the cluster size; a knowledge of this size-dependent structural motifs and their dynamical evolution has been of longstanding interest. Classical MD typically employ predefined functional forms which limits their ability to capture such complex size-dependent structural and dynamical transformation. Neural Network (NN) based potentials represent flexible alternatives and in principle, well-trained NN potentials can provide high level of flexibility, transferability and accuracy on-par with the reference model used for training. A major challenge, however, is that NN models are interpolative and requires large quantities of training data to ensure that the model adequately samples the energy landscape both near and far-from-equilibrium. Here, we introduce an active learning (AL) scheme that trains a NN model on-the-fly with minimal amount of first-principles based training data. Our AL workflow is initiated with a sparse training dataset (1 to 5 data points) and is updated on-the-fly via a Nested Ensemble Monte Carlo scheme that iteratively queries the energy landscape in regions of failure and updates the training pool to improve the network performance. Using a representative system of gold clusters, we demonstrate that our AL workflow can train a NN with ~500 total reference calculations. Our NN predictions are within 30 meV/atom and 40 meV/AA of the reference DFT calculations. Moreover, our AL-NN model also adequately captures the various size-dependent structural and dynamical properties of gold clusters in excellent agreement with DFT calculations and available experiments.