This study demonstrates the feasibility of proactive received power prediction that leverages spatiotemporal visual sensing information, toward reliable millimeter-wave (mmWave) networks. Because the received power on a mmWave link can attenuate aperiodically due to human blockage, a long-term time series of future received power cannot be predicted by analyzing only the signals received before the blockage occurs. We propose a novel mechanism that predicts a time series of the received power from the next moment up to several hundred milliseconds ahead. The key idea is to combine camera imagery with machine learning (ML). Time-sequential images capture the spatial geometry and mobility of the obstacles that affect mmWave signal propagation. ML is used to build the prediction model from a dataset of sequential images, each labeled with the received power measured up to several hundred milliseconds after the image is obtained. Simulation and experimental evaluations using IEEE 802.11ad devices and a depth camera show that the proposed mechanism, employing a convolutional LSTM, predicted a time series of the received power up to 500 ms ahead with an inference time of less than 3 ms and a root-mean-square error of 3.5 dB.
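As a rough illustration of the dataset labeling described above, the following minimal sketch pairs each run of consecutive camera frames with the received-power samples that follow it, and computes a root-mean-square error in dB. It is not the authors' implementation: the function names `make_windows` and `rmse_db`, the parameters `n_in` and `n_out`, and the assumption that frames and power samples share one time-aligned sampling grid are all hypothetical, and the convolutional LSTM itself is omitted.

```python
import numpy as np


def make_windows(frames, power_dbm, n_in, n_out):
    """Pair each run of n_in frames with the next n_out power samples.

    Assumption (not from the source): frames[t] and power_dbm[t] were
    captured at the same instant on a common, uniform sampling grid.
    """
    frames = np.asarray(frames)
    power_dbm = np.asarray(power_dbm)
    if len(frames) != len(power_dbm):
        raise ValueError("frames and power_dbm must be time-aligned")
    n_samples = len(frames) - n_in - n_out + 1
    if n_samples <= 0:
        raise ValueError("sequence too short for the requested window sizes")
    # Input: frames i .. i+n_in-1 (the observed image sequence).
    x = np.stack([frames[i:i + n_in] for i in range(n_samples)])
    # Label: power at steps i+n_in .. i+n_in+n_out-1 (the future series).
    y = np.stack([power_dbm[i + n_in:i + n_in + n_out]
                  for i in range(n_samples)])
    return x, y


def rmse_db(predicted, measured):
    """Root-mean-square error between two power series given in dB."""
    diff = np.asarray(predicted, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

With a hypothetical frame period of about 33 ms, for example, `n_out = 15` would correspond to labels extending roughly 500 ms past the last input frame; the actual frame rate and window lengths are not stated here.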