Audio-visual speech recognition systems, which rely on both the speech
signal and the movement of the speaker's lips, are among the most
important speech recognition systems. Many different techniques have
been developed for both the feature extraction and the classification
stages.
This research proposes a system for recognizing isolated Arabic words
from audio features extracted from video recordings of the words being
pronounced in a noise-free environment. Energy and temporal derivative
components are then added at the feature extraction stage of the Mel
Frequency Cepstral Coefficient (MFCC) method.
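The sketch below shows one common way to add energy and temporal derivative components to MFCC features. It is a minimal illustration under stated assumptions, not the authors' implementation. It assumes the MFCC matrix (one row per frame) and the matching framed signal are already available from some other step. It uses log frame energy and the standard regression-based delta formula over a window of ±`N` frames. The function names, the window `N=2`, and the choice to also compute second-order (delta-delta) terms via `order=2` are hypothetical and do not come from the paper.

```python
import numpy as np


def log_energy(frames: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Log energy of each frame; `frames` has shape (num_frames, frame_len)."""
    # eps avoids log(0) on silent frames.
    return np.log(np.sum(frames.astype(float) ** 2, axis=1) + eps)


def deltas(features: np.ndarray, N: int = 2) -> np.ndarray:
    """Regression-based temporal derivative over +/-N frames.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Edge frames are repeated so the output has the same number of frames.
    """
    T = features.shape[0]
    denom = 2 * sum(n * n for n in range(1, N + 1))
    # Row t of `features` sits at row t + N of `padded`.
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n : N + n + T] - padded[N - n : N - n + T])
    return out / denom


def augment_mfcc(mfcc: np.ndarray, frames: np.ndarray,
                 N: int = 2, order: int = 2) -> np.ndarray:
    """Append log energy to the MFCCs, then stack `order` levels of deltas."""
    # Static features: MFCCs plus one log-energy column per frame.
    blocks = [np.hstack([mfcc, log_energy(frames)[:, None]])]
    # Each derivative level is computed from the previous level.
    for _ in range(order):
        blocks.append(deltas(blocks[-1], N))
    return np.hstack(blocks)
```

Take 13 MFCCs per frame as an example (this count is also an assumption). Appending energy gives 14 static features, and `order=2` then produces 42 features per frame. Setting `order=1` keeps only the first-order temporal derivatives.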