In this paper, we propose a quality enhancement network for Versatile Video Coding (VVC) compressed videos that jointly exploits spatial details and temporal structure (SDTS). The proposed network consists of a temporal structure fusion subnet and a spatial detail enhancement subnet. The former estimates and compensates for temporal motion across frames, and the latter reduces compression artifacts and enhances the reconstruction quality of the compressed video. Experimental results demonstrate the effectiveness of our SDTS-based method.
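
The two-stage structure described above can be illustrated with a minimal sketch. It is not the proposed network: the learned subnets are replaced by hand-crafted stand-ins so that the data flow is easy to follow. The temporal stage is assumed here to be an exhaustive search for a global integer shift followed by frame averaging, and the spatial stage is assumed to be a residual correction from a 3x3 box filter. All function names and parameters (`estimate_global_shift`, `temporal_fusion`, `spatial_enhancement`, `enhance`, `max_shift`, `strength`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def estimate_global_shift(ref, tgt, max_shift=2):
    """Exhaustively search for the integer (dy, dx) that best aligns ref to tgt.

    Stand-in for learned motion estimation; assumes a single global,
    circular shift rather than a dense motion field.
    """
    best, best_err = (0, 0), np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(ref, (dy, dx), axis=(0, 1))
            err = np.mean((shifted - tgt) ** 2)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def temporal_fusion(frames, center):
    """Stand-in for the temporal structure fusion subnet.

    Each frame is motion-compensated toward the center frame, then the
    aligned frames are averaged.
    """
    tgt = frames[center]
    aligned = []
    for frame in frames:
        dy, dx = estimate_global_shift(frame, tgt)
        aligned.append(np.roll(frame, (dy, dx), axis=(0, 1)))
    return np.mean(aligned, axis=0)


def spatial_enhancement(fused, strength=0.5):
    """Stand-in for the spatial detail enhancement subnet.

    Adds a residual that pulls each pixel toward its 3x3 neighborhood mean;
    a learned network would predict this residual instead.
    """
    h, w = fused.shape
    padded = np.pad(fused, 1, mode="edge")
    smooth = sum(
        padded[i:i + h, j:j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return fused + strength * (smooth - fused)


def enhance(frames, center):
    """Temporal fusion followed by spatial enhancement of the center frame."""
    return spatial_enhancement(temporal_fusion(frames, center))
```

In this sketch, `frames` is a list of 2-D grayscale arrays of equal shape and `center` is the index of the frame to be enhanced. The ordering, temporal fusion first and spatial enhancement second, follows the description above; everything inside each stage is an assumption.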