Smart video summarization, i.e., displaying only the essential content of a video, is one of the central problems in the computer vision literature: it reduces the storage that video requires on various media, especially on mobile phones and surveillance cameras, and shortens the time needed to watch a video. The summarization task is to build software capable of displaying and saving the content that is important to the viewer, namely the scenes that contain new detail, either in the image or in the accompanying audio, while deleting scenes with repeated content.

This research introduces a new methodology for extracting scenes that are new in both image and sound, without disrupting the continuity of motion within the video, so that playback remains smooth. The methodology relies on two basic algorithms. The first extracts scenes whose image details change, based on the eigenvalues of the scenes, which exhibit a significant change when the scene details change. The second extracts audio segments with changing detail, based on the algorithm introduced in 1985 in [1], which encodes the audio signal as a binary frame sequence: regions of the signal that contain detail take the value 1, while regions without detail take the value 0. The two algorithms run synchronously, so the changed scenes and the audio adjacent to them are extracted together. Applied to large video clips with substantial object motion, the methodology achieved very good effectiveness and high accuracy in the synchronization between the scenes and the audio adjacent to them.
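The two algorithms described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the eigenvalue signature, the relative-distance threshold, and all function names are assumptions, and since the 1985 coder of [1] is not specified here, the audio part uses a generic RMS-energy rule as a stand-in for its binary (1/0) framing.

```python
import numpy as np

def frame_signature(frame, k=8):
    """k dominant eigenvalues of the frame's row-covariance matrix.

    The covariance matrix is symmetric, so eigvalsh returns real
    eigenvalues in ascending order; the largest ones summarize the
    spatial structure of the frame.
    """
    cov = np.cov(frame.astype(float))
    return np.linalg.eigvalsh(cov)[-k:]

def detect_changed_scenes(frames, threshold=0.25):
    """Indices of frames whose eigenvalue signature differs from the
    last kept frame by more than `threshold` (relative L2 distance)."""
    kept = [0]                          # always keep the opening scene
    ref = frame_signature(frames[0])
    for i in range(1, len(frames)):
        sig = frame_signature(frames[i])
        change = np.linalg.norm(sig - ref) / (np.linalg.norm(ref) + 1e-12)
        if change > threshold:
            kept.append(i)
            ref = sig                   # new reference scene
    return kept

def encode_audio_activity(signal, frame_len=160, threshold=0.02):
    """Binary frame sequence over the audio: 1 where the frame's RMS
    energy exceeds `threshold` (detail present), 0 otherwise."""
    bits = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        bits.append(1 if np.sqrt(np.mean(frame ** 2)) > threshold else 0)
    return bits

# Synthetic demo: ten near-identical frames, then an abrupt scene change,
# with silence under the first scene and a tone under the second.
rng = np.random.default_rng(0)
scene_a = rng.random((32, 32))
scene_b = 5.0 * rng.random((32, 32))
frames = [scene_a + 0.01 * rng.random((32, 32)) for _ in range(10)]
frames += [scene_b + 0.01 * rng.random((32, 32)) for _ in range(10)]

t = np.arange(1600) / 8000.0
audio = np.concatenate([np.zeros(1600), 0.5 * np.sin(2 * np.pi * 440 * t)])

keyframes = detect_changed_scenes(frames)
bits = encode_audio_activity(audio)
print(keyframes, bits)
```

Running both detectors over the same timeline, as here, is what allows the kept scenes and their adjacent audio frames to be aligned; in this toy clip the image change and the onset of audio activity coincide at the same point in the stream.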