We study the spatio-temporal prediction problem and introduce a novel point-process-based prediction algorithm. Spatio-temporal prediction is extensively studied in the machine learning literature because of its critical real-life applications, such as crime, earthquake, and social event prediction. Despite this body of work, some problems inherent to these application domains are not yet fully explored. Here, we address non-stationary spatio-temporal prediction on both densely and sparsely distributed sequences. We introduce a probabilistic approach that partitions the spatial domain into subregions and models the event arrivals in each subregion with interacting point processes. Our algorithm jointly learns the spatial partitioning and the interactions between the subregions through gradient-based optimization. We evaluate the algorithm on simulated data and two real-life datasets, and compare it with baseline and state-of-the-art deep-learning-based approaches, over which it achieves significant performance improvements. We also show empirically how different parameter choices affect overall performance and explain the procedure for choosing the parameters.
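
As a rough illustration of what interacting point processes over subregions can look like, the sketch below evaluates a multivariate Hawkes-style conditional intensity with an exponential kernel. The kernel choice, the function name `region_intensities`, and the parameters `mu`, `alpha`, and `beta` are assumptions made for illustration and are not taken from this work. The spatial partition is also assumed to be fixed here, rather than learned jointly with the interactions.

```python
import numpy as np


def region_intensities(t, event_times, event_regions, mu, alpha, beta):
    """Conditional intensity of every subregion at time t.

    Hypothetical helper (illustrative assumption, not the paper's model):
      lambda_k(t) = mu[k] + sum_{t_i < t} alpha[k, r_i] * beta * exp(-beta * (t - t_i))
    where r_i is the subregion index of the i-th past event.
    """
    event_times = np.asarray(event_times, dtype=float)
    event_regions = np.asarray(event_regions, dtype=int)
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)

    # Only events strictly before t contribute to the intensity.
    past = event_times < t

    # Exponentially decaying influence of each past event, shape (n_past,).
    decay = beta * np.exp(-beta * (t - event_times[past]))

    # alpha[k, j] is the excitation that an event in subregion j adds to
    # subregion k; select one column per past event and sum over events.
    excitation = alpha[:, event_regions[past]] @ decay

    return mu + excitation


# Two subregions; a single past event in subregion 0 at time 0.
mu = [0.1, 0.2]
alpha = [[0.5, 0.0],
         [0.3, 0.4]]
lam = region_intensities(1.0, [0.0], [0], mu, alpha, beta=1.0)
```

In this toy setup the event in subregion 0 raises the intensity of both subregions, by an amount scaled by the first column of `alpha`. In a learned model, both `alpha` and the partition itself would be fitted to data, for example by gradient-based maximization of the log-likelihood.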