Audio-visual speech recognition systems, which rely on both the speech and the lip movements of the speaker, are among the most important speech recognition systems. Many different techniques have been developed, differing in the methods used for feature extraction and classification. This research proposes a system for recognizing isolated words based on audio features extracted from videos of Arabic word pronunciations recorded in a noise-free environment, and then adds energy and temporal derivative components at the Mel Frequency Cepstral Coefficient (MFCC) feature extraction stage.
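
As an illustration of what adding energy and temporal derivative components to MFCC features can look like, the following is a minimal sketch and not the implementation used in this research. It assumes the static cepstral coefficients have already been computed per frame, uses the common regression formula for the derivatives with a window of two frames on each side, and appends first and second derivatives; the function names, the window width, and the frame sizes are all hypothetical choices made for the example.

```python
import numpy as np


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (no padding of the tail)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def log_energy(frames, eps=1e-10):
    """Log of the sum of squared samples in each frame."""
    return np.log(np.sum(frames ** 2, axis=1) + eps)


def delta(features, width=2):
    """Temporal derivative via the regression formula
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2), n = 1..width.
    Edge frames are handled by repeating the first and last frame."""
    n_frames = features.shape[0]
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n: width + n + n_frames]
        behind = padded[width - n: width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def add_energy_and_deltas(cepstra, frames):
    """Append log energy to the static coefficients, then stack
    static, first-derivative, and second-derivative features."""
    static = np.column_stack([cepstra, log_energy(frames)])
    d1 = delta(static)
    d2 = delta(d1)
    return np.hstack([static, d1, d2])
```

With 12 static coefficients per frame (an assumed, though common, choice), this yields 13 static values after the energy term is appended and 39 values per frame once both derivative sets are stacked.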