Interactional synchrony refers to the way the speech and behavior of two or more people in a conversation become finely coordinated with one another, to the point that they can appear to respond directly to each other. Studies have shown that interactional synchrony is a hallmark of relationships and arises as a result of rapport. Research has also shown that up to two-thirds of human communication occurs through nonverbal channels such as gestures (body movements) and facial expressions. In this work, we use computer-vision-based methods to extract nonverbal cues, specifically from the face, and develop a model to measure interactional synchrony based on those cues. This paper presents a novel method for constructing a dynamic deep neural architecture, built from intermediary long short-term memory networks (LSTMs), for learning and predicting the extent of synchrony between two or more processes by modeling the nonlinear dependencies between them. On a synthetic dataset, in which pairs of sequences were generated from a Gaussian process with known covariances, the architecture recovered the covariance values of the generating process to within 0.5% error when tested on 100 pairs of interacting signals. On a real-life dataset involving groups of three people, the model estimated the extent of synchrony of each group on a scale of 1 to 5, with an overall mean prediction error of 2.96% under 5-fold cross-validation, compared to 26.1% for random permutations serving as the control baseline.
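
As a rough illustration of the kind of architecture described above, the sketch below pairs two LSTM encoders (one per interacting signal), concatenates their final hidden states, and regresses a scalar synchrony score. This is a minimal sketch under assumed design choices, not the authors' exact model; all class names, layer sizes, and the toy coupled-sequence check at the end are hypothetical.

\begin{verbatim}
# Hedged sketch (PyTorch): LSTM-based synchrony estimator between two signals.
import torch
import torch.nn as nn


class PairwiseSynchronyLSTM(nn.Module):
    """Hypothetical illustration: one LSTM per process, joint regressor."""

    def __init__(self, feat_dim: int = 16, hidden_dim: int = 32):
        super().__init__()
        # One intermediary LSTM per interacting process (assumed choice;
        # weight sharing between encoders is another plausible option).
        self.encoder_a = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.encoder_b = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # Small regressor mapping the joint representation to a synchrony score.
        self.regressor = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, seq_a: torch.Tensor, seq_b: torch.Tensor) -> torch.Tensor:
        # seq_a, seq_b: (batch, time, feat_dim) nonverbal-cue sequences.
        _, (h_a, _) = self.encoder_a(seq_a)
        _, (h_b, _) = self.encoder_b(seq_b)
        joint = torch.cat([h_a[-1], h_b[-1]], dim=-1)
        return self.regressor(joint).squeeze(-1)


if __name__ == "__main__":
    # Toy check on correlated sequences with a known coupling strength,
    # loosely mimicking the synthetic Gaussian setup (simplified here).
    torch.manual_seed(0)
    model = PairwiseSynchronyLSTM(feat_dim=1)
    base = torch.randn(8, 50, 1)
    noise = torch.randn(8, 50, 1)
    seq_a, seq_b = base, 0.7 * base + 0.3 * noise
    print(model(seq_a, seq_b).shape)  # torch.Size([8])
\end{verbatim}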