In this research, several audio signal properties were studied in relation to the shape of the
speaker's vocal tract. A database of audio files was recorded from 57 men aged between 35 and 45.
All speakers came from the same academic and social background, and none had any hearing
or speech impairment.
The vowel database was created under ideal recording conditions. Recording took about five
minutes per speaker, during which each speaker said the Arabic word " سألتمُونِيهَا " three times.
This word is rich in vowels: it contains all of the Arabic long vowels.
Based on an analysis of the recorded audio signals, the relationship between the formant
frequencies and the length of the speaker's vocal tract was studied. The results show an inverse
proportion for the first three formants F1, F2, and F3, and no clear relationship for the
remaining two, F4 and F5.
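The inverse proportion reported for F1 to F3 matches what the textbook uniform-tube approximation predicts. That approximation treats the vocal tract as a tube of length L, closed at the glottis and open at the lips, which resonates at F_n = (2n - 1)c / (4L). The sketch below implements that idealized model only and is not the paper's measurement procedure. The speed of sound c = 350 m/s, the example tract lengths, and the function name are assumptions.

```python
# Idealized quarter-wave (uniform tube) model of vocal tract resonances.
SPEED_OF_SOUND = 350.0  # m/s; assumed value for warm, humid air in the vocal tract


def tube_formants(length_m, n_formants=5, c=SPEED_OF_SOUND):
    """Resonances F_n = (2n - 1) c / (4 L) of a uniform tube closed at one end."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]


for length_cm in (15.0, 17.5, 20.0):
    formants = tube_formants(length_cm / 100.0)
    print(f"L = {length_cm:4.1f} cm:",
          ", ".join(f"F{i + 1}={f:.0f} Hz" for i, f in enumerate(formants)))
```

In this model every formant scales as 1/L, so the model alone does not account for the lack of a clear trend for F4 and F5 reported above.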
Audio-visual speech recognition systems, which rely on both the speech signal and the
movement of the speaker's lips, are among the most important speech recognition systems.
Many different techniques have been developed for them, differing in their feature
extraction and classification methods.
This research proposes a system for recognizing isolated Arabic words from audio features
extracted from videos of the words being pronounced in a noise-free environment. Energy and
temporal derivative components are then added to the features at the Mel Frequency Cepstral
Coefficient (MFCC) extraction stage.
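Appending energy and temporal-derivative (delta) terms to the cepstral coefficients can be sketched as follows. This is a minimal NumPy illustration, not the authors' code. It assumes the static MFCC matrix and the corresponding framed signal are already available. It uses the common regression formula for deltas with a window of N = 2 frames, and the function names are hypothetical.

```python
import numpy as np


def log_energy(frames, eps=1e-10):
    """Log of the per-frame energy; frames has shape (n_frames, frame_len)."""
    return np.log(np.sum(np.asarray(frames, dtype=float) ** 2, axis=1) + eps)


def deltas(features, N=2):
    """Regression-based temporal derivative along the frame axis (axis 0):
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)."""
    n_frames = features.shape[0]
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")  # repeat first/last frame
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        out += n * (padded[N + n:N + n + n_frames] - padded[N - n:N - n + n_frames])
    return out / denom


def augment(mfcc, frames, N=2):
    """Static MFCCs + log-energy, followed by their deltas: (n_frames, 2 * (n_ceps + 1))."""
    static = np.hstack([mfcc, log_energy(frames)[:, None]])
    return np.hstack([static, deltas(static, N)])


rng = np.random.default_rng(0)
features = augment(rng.standard_normal((50, 13)), rng.standard_normal((50, 400)))
print(features.shape)
```

Edge padding is one common way to handle the first and last frames. Other toolkits instead truncate or mirror at the boundaries.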
Speech recognition is one of the most modern technologies and has entered many fields of life,
including medical, security, and industrial applications. Accordingly, many related systems
have been developed that differ from each other in their feature extraction and classification
methods.
In this research, three speech recognition systems were created. They differ from each other in
the method used at the feature extraction stage: the first system uses the MFCC algorithm, the
second the LPCC algorithm, and the third the PLP algorithm. All three systems use an HMM as
the classifier.
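Of the three front ends, LPCC is the easiest to show compactly. LPC coefficients are estimated from a frame's autocorrelation with the Levinson-Durbin recursion and then converted to cepstral coefficients with the standard LPC-to-cepstrum recursion. The sketch below assumes a single pre-windowed frame, a prediction order of p = 12, and 12 cepstral coefficients. These values and the function names are illustrative and are not taken from the study. MFCC and PLP are not shown.

```python
import numpy as np


def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one (already windowed) frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - lag], frame[lag:]) for lag in range(order + 1)])


def levinson_durbin(r, order):
    """Predictor a[1..p] with s[n] ~ sum_k a[k] s[n-k]; returns (a, residual energy)."""
    a = np.zeros(order + 1)  # a[0] unused so indices match the math
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for step i.
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def lpc_to_cepstrum(a, n_ceps):
    """c_n = a_n + sum_{k} (k/n) c_k a_{n-k}; a_n is taken as 0 for n > p. Gain term c_0 omitted."""
    p = len(a)
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=12):
    r = autocorrelation(frame, order)
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_ceps)
```

As a quick check, an AR(1) predictor with a_1 = 0.5 should give c_n = 0.5^n / n, which is the cepstrum of 1 / (1 - 0.5 z^-1).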
First, the speech recognition performance of each proposed system was studied and evaluated
separately. The combination algorithm was then applied to each pair of the studied systems in
order to study how combining them affects speech recognition performance.
Two kinds of errors (simultaneous errors and dependent errors) were used to evaluate the
complementarity of each pair of the studied systems and to study how effectively combination
improves speech recognition performance. The comparison shows that the best improvement
was obtained by combining the MFCC and PLP algorithms, with a recognition rate of 93.4%.
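The simultaneous-error measure can be computed directly from the per-utterance decisions of two recognizers. It is the fraction of test utterances that both systems get wrong, and a low value suggests the pair is complementary. The sketch below also reports the "oracle" accuracy, which counts an utterance as correct if either system is correct. That is an upper bound for any combiner that picks one of the two systems' outputs. The abstract does not define the dependent-error measure or the combination algorithm, so neither is reproduced here. The function name and example labels are hypothetical.

```python
import numpy as np


def pairwise_error_stats(truth, pred_a, pred_b):
    """Individual error rates, simultaneous error rate, and oracle accuracy for two systems."""
    truth, pred_a, pred_b = map(np.asarray, (truth, pred_a, pred_b))
    wrong_a = pred_a != truth
    wrong_b = pred_b != truth
    both_wrong = wrong_a & wrong_b  # utterances misrecognized by both systems
    return {
        "error_a": wrong_a.mean(),
        "error_b": wrong_b.mean(),
        "simultaneous_error": both_wrong.mean(),
        "oracle_accuracy": 1.0 - both_wrong.mean(),  # at least one system is right
    }


# Hypothetical word labels for five test utterances.
stats = pairwise_error_stats([0, 1, 2, 3, 4], [0, 1, 9, 9, 4], [0, 9, 2, 9, 4])
print(stats)
```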
In this research, a new comparison criterion was proposed to study the properties of the
audio signals of smokers and non-smokers. For this purpose, a smoker database was created.
It contains 12 native Syrian speakers, six of them smokers and the others non-smokers. The
smokers had been smoking for more than 10 years. All speakers were men aged between 35
and 42 years. They live in rural towns and speak the same dialect.
Syrian vowels can be classified into long vowels and short vowels. The long vowels are /AA/,
/UU/, and /II/, written with the letters ا, و, and ي respectively, and the short vowels are /A/,
/U/, and /I/, written with the diacritics fatha (فتحة), damma (ضمة), and kasra (كسرة)
respectively. In this study, the speakers pronounced the sentence /I love Syria/
( أَنَاْ أَحَبُّ سُوْرِيْة ), which is rich in vowels. The sentence was spoken over a period of
three hours.
For each speaker, a long-vowel triangle and a short-vowel triangle were generated and analyzed
in each of ten planes. A new criterion was proposed to determine the vowel triangle most
suitable for distinguishing smokers. The criterion calculates the distances between the centers
of all vowel triangles in each plane and takes the minimum distance, called d. Based on this
criterion, the most suitable vowel triangles were the AIU35 short-vowel triangle and the
AAIIUU45 long-vowel triangle.
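The distance criterion can be sketched as follows. Each vowel triangle is treated as three points in a plane whose axes are assumed to be two formant frequencies in Hz. Its center is taken to be the vertex centroid, and d is the smallest Euclidean distance between any two centers in that plane. The abstract does not fix these details, so the centroid definition, the units, which triangles are compared, the example coordinates, and the function names are all assumptions made for illustration.

```python
import itertools
import math


def centroid(triangle):
    """Vertex centroid of a triangle given as three (x, y) points."""
    xs, ys = zip(*triangle)
    return (sum(xs) / 3.0, sum(ys) / 3.0)


def min_center_distance(triangles):
    """Criterion d: smallest Euclidean distance between the centers of any two triangles."""
    centers = [centroid(t) for t in triangles]
    return min(math.dist(p, q) for p, q in itertools.combinations(centers, 2))


# Hypothetical triangles (one per speaker) in a single plane.
plane = [
    [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)],
    [(3.0, 3.0), (6.0, 3.0), (3.0, 6.0)],
    [(1.0, 1.0), (4.0, 1.0), (1.0, 4.0)],
]
print(min_center_distance(plane))
```

Repeating this for every plane and comparing the resulting d values is one way to rank planes and triangle types. The exact ranking rule used in the study is not stated in the abstract.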
The aim of this research is to build a system for classifying spoken English digits, using Hidden Markov Models for classification and the signal spectrum for extracting the signal features.
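Classification with one HMM per digit amounts to scoring an utterance against every model with the forward algorithm and picking the model with the highest likelihood. The sketch below assumes discrete HMMs whose observation symbols come from vector-quantizing the spectral features. The quantization step, the model parameters, and the function names are hypothetical. The computation is done in the log domain to avoid numerical underflow on long utterances.

```python
import numpy as np


def log_forward(log_pi, log_A, log_B, obs):
    """Log-likelihood log P(obs | model) of a discrete HMM via the forward algorithm.

    log_pi: (N,) initial-state log-probabilities
    log_A:  (N, N) transition log-probabilities, row = from-state, column = to-state
    log_B:  (N, M) emission log-probabilities over M codebook symbols
    obs:    sequence of symbol indices (e.g. quantized spectral frames)
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        # alpha_j(t) = logsumexp_i(alpha_i(t-1) + log a_ij) + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)


def classify(obs, models):
    """Return the label whose HMM gives the highest forward log-likelihood, plus all scores."""
    scores = {label: log_forward(*params, obs) for label, params in models.items()}
    return max(scores, key=scores.get), scores


# Two toy 2-state models over a 2-symbol codebook (hypothetical parameters).
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
models = {
    "zero": (log_pi, log_A, np.log([[0.9, 0.1], [0.8, 0.2]])),
    "one": (log_pi, log_A, np.log([[0.1, 0.9], [0.2, 0.8]])),
}
label, scores = classify([0, 0, 1, 0], models)
print(label, scores)
```

In a full system, each model would be trained (for example with Baum-Welch) on quantized examples of its digit. That step is not shown here.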
Voice recognition comprises two basic parts: speech recognition and speaker recognition.
These recognition processes are considered among the most important processes in modern
technology, and many systems have been developed that differ in their feature extraction and
classification methods.
This research addresses that subject: the system is designed to recognize both the speaker
and his voice commands, and it relies on several complementary algorithms. We conducted an
analytical study of the MFCC feature extraction algorithm. It examined two parameters, the
number of filters in the filter bank and the number of features taken from each frame. It also
examined their impact on the recognition rate and their relationship to each other. A
feed-forward back-propagation neural network was used, and its performance was analyzed
to identify the best features and components for achieving recognition. Finally, the endpoint
detection algorithm used to remove periods of silence was studied, along with its impact on
voice recognition rates.
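A simple energy-based endpoint detector of the kind used to strip silence can be sketched as follows. The signal is cut into frames, and the short-time energy of each frame is compared with a threshold set relative to the loudest frame. Only the span between the first and last frames above that threshold is kept. The 25 ms frame length, 10 ms hop, -40 dB relative threshold, and the function names are assumptions. The study's own endpoint algorithm may differ; many practical detectors also use the zero-crossing rate.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split x into overlapping frames of frame_len samples, hop samples apart."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))  # zero-pad very short inputs to one frame
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def trim_silence(x, sr, frame_ms=25.0, hop_ms=10.0, threshold_db=-40.0):
    """Keep the span from the first to the last frame whose energy exceeds max + threshold_db."""
    x = np.asarray(x, dtype=float)
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(x, frame_len, hop)
    energy_db = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    active = np.flatnonzero(energy_db > energy_db.max() + threshold_db)  # never empty
    start = active[0] * hop
    end = min(len(x), active[-1] * hop + frame_len)
    return x[start:end]


# Usage: half a second of silence, one second of a 440 Hz tone, half a second of silence.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
x = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
print(len(x), len(trim_silence(x, sr)))
```

A threshold relative to the loudest frame adapts to recording level. An absolute threshold or one estimated from the leading silence is a common alternative.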
Speech databases are the main foundation for building automatic speech production, speaker
recognition, and speech recognition systems in different languages and dialects. The elements
of a speech database are audio files recording people's voices in the required language or
dialect. The richer and more comprehensive a speech database is, the more it contributes to
producing high-performing systems for human-machine communication. Given the lack of
speech databases for Syrian dialects, this research created one. The database contains sixteen
volunteers speaking different Syrian dialects. The volunteers' voices were recorded under
different recording conditions in order to study the effect of dialect, gender, and recording
conditions on the vowel polygons. The research then used this database to generate and
analyze vowel polygons. A vowel polygon is a geometric polygon whose vertices represent
formant frequency values and whose area represents the output acoustic space.
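Once the vertices of a vowel polygon are listed in order around the boundary, its area follows from the shoelace formula. In the sketch below each vertex is an (F1, F2) pair in Hz. The choice of the F1-F2 plane, the example formant values, and the function name are assumptions for illustration, not data from the study.

```python
def polygon_area(vertices):
    """Shoelace formula; vertices are (x, y) pairs ordered around the polygon (either direction)."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Hypothetical (F1, F2) vertices in Hz for a three-vowel triangle, for illustration only.
triangle = [(300.0, 2200.0), (700.0, 1200.0), (300.0, 800.0)]
print(f"area = {polygon_area(triangle):.0f} Hz^2")  # area = 280000 Hz^2
```

The same function works for polygons with more vertices, such as one built from both long and short vowels, provided the vertices are ordered around the boundary and the polygon does not self-intersect.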