Audio-Visual Biometric Recognition and Presentation Attack Detection: A Comprehensive Survey

67 0 0.0 ( 0 )

Download Cite

Added by Hareesh Mandalapu

Publication date 2021

fields Informatics Engineering

and research's language is English

Authors Hareesh Mandalapu - P N Aravinda Reddy - Raghavendra Ramachandra

Cryptography and Security

visit our facebook page

‎Shamra Academia - شمرا أكاديميا‎

Ask ChatGPT about the research

Abstract in Arabic Abstract in English

Biometric recognition is a trending technology that uses unique characteristics data to identify or verify/authenticate security applications. Amidst the classically used biometrics, voice and face attributes are the most propitious for prevalent applications in day-to-day life because they are easy to obtain through restrained and user-friendly procedures. The pervasiveness of low-cost audio and face capture sensors in smartphones, laptops, and tablets has made the advantage of voice and face biometrics more exceptional when compared to other biometrics. For many years, acoustic information alone has been a great success in automatic speaker verification applications. Meantime, the last decade or two has also witnessed a remarkable ascent in face recognition technologies. Nonetheless, in adverse unconstrained environments, neither of these techniques achieves optimal performance. Since audio-visual information carries correlated and complementary information, integrating them into one recognition system can increase the systems performance. The vulnerability of biometrics towards presentation attacks and audio-visual data usage for the detection of such attacks is also a hot topic of research. This paper made a comprehensive survey on existing state-of-the-art audio-visual recognition techniques, publicly available databases for benchmarking, and Presentation Attack Detection (PAD) algorithms. Further, a detailed discussion on challenges and open problems is presented in this field of biometrics.

rate research

Face Morphing Attack Generation & Detection: A Comprehensive Survey

160 - Sushma Venkatesh , Raghavendra Ramachandra , Kiran Raja 2020

The vulnerability of Face Recognition System (FRS) to various kind of attacks (both direct and in-direct attacks) and face morphing attacks has received a great interest from the biometric community. The goal of a morphing attack is to subvert the FRS at Automatic Border Control (ABC) gates by presenting the Electronic Machine Readable Travel Document (eMRTD) or e-passport that is obtained based on the morphed face image. Since the application process for the e-passport in the majority countries requires a passport photo to be presented by the applicant, a malicious actor and the accomplice can generate the morphed face image and to obtain the e-passport. An e-passport with a morphed face images can be used by both the malicious actor and the accomplice to cross the border as the morphed face image can be verified against both of them. This can result in a significant threat as a malicious actor can cross the border without revealing the track of his/her criminal background while the details of accomplice are recorded in the log of the access control system. This survey aims to present a systematic overview of the progress made in the area of face morphing in terms of both morph generation and morph detection. In this paper, we describe and illustrate various aspects of face morphing attacks, including different techniques for generating morphed face images but also the state-of-the-art regarding Morph Attack Detection (MAD) algorithms based on a stringent taxonomy and finally the availability of public databases, which allow to benchmark new MAD algorithms in a reproducible manner. The outcomes of competitions/benchmarking, vulnerability assessments and performance evaluation metrics are also provided in a comprehensive manner. Furthermore, we discuss the open challenges and potential future works that need to be addressed in this evolving field of biometrics.

Computer Vision and Pattern Recognition Cryptography and Security Computers and Society

Demographic Bias in Presentation Attack Detection of Iris Recognition Systems

315 - Meiling Fang , Naser Damer , Florian Kirchbuchner 2020

With the widespread use of biometric systems, the demographic bias problem raises more attention. Although many studies addressed bias issues in biometric verification, there are no works that analyze the bias in presentation attack detection (PAD) decisions. Hence, we investigate and analyze the demographic bias in iris PAD algorithms in this paper. To enable a clear discussion, we adapt the notions of differential performance and differential outcome to the PAD problem. We study the bias in iris PAD using three baselines (hand-crafted, transfer-learning, and training from scratch) using the NDCLD-2013 database. The experimental results point out that female users will be significantly less protected by the PAD, in comparison to males.

Computer Vision and Pattern Recognition

A Survey on Audio Synthesis and Audio-Visual Multimodal Processing

86 - Zhaofeng Shi 2021

With the development of deep learning and artificial intelligence, audio synthesis has a pivotal role in the area of machine learning and shows strong applicability in the industry. Meanwhile, significant efforts have been dedicated by researchers to handle multimodal tasks at present such as audio-visual multimodal processing. In this paper, we conduct a survey on audio synthesis and audio-visual multimodal processing, which helps understand current research and future trends. This review focuses on text to speech(TTS), music generation and some tasks that combine visual and acoustic information. The corresponding technical methods are comprehensively classified and introduced, and their future development trends are prospected. This survey can provide some guidance for researchers who are interested in the areas like audio synthesis and audio-visual multimodal processing.

Audio and Speech Processing Sound

A Zero-Shot based Fingerprint Presentation Attack Detection System

126 - Haozhe Liu , Wentian Zhang , Guojie Liu 2020

With the development of presentation attacks, Automated Fingerprint Recognition Systems(AFRSs) are vulnerable to presentation attack. Thus, numerous methods of presentation attack detection(PAD) have been proposed to ensure the normal utilization of AFRS. However, the demand of large-scale presentation attack images and the low-level generalization ability always astrict existing PAD methods actual performances. Therefore, we propose a novel Zero-Shot Presentation Attack Detection Model to guarantee the generalization of the PAD model. The proposed ZSPAD-Model based on generative model does not utilize any negative samples in the process of establishment, which ensures the robustness for various types or materials based presentation attack. Different from other auto-encoder based model, the Fine-grained Map architecture is proposed to refine the reconstruction error of the auto-encoder networks and a task-specific gaussian model is utilized to improve the quality of clustering. Meanwhile, in order to improve the performance of the proposed model, 9 confidence scores are discussed in this article. Experimental results showed that the ZSPAD-Model is the state of the art for ZSPAD, and the MS-Score is the best confidence score. Compared with existing methods, the proposed ZSPAD-Model performs better than the feature-based method and under the multi-shot setting, the proposed method overperforms the learning based method with little training data. When large training data is available, their results are similar.

Computer Vision and Pattern Recognition Machine Learning Image and Video Processing

Multilingual Audio-Visual Smartphone Dataset And Evaluation

97 - Hareesh Mandalapu , Aravinda Reddy P N , Raghavendra Ramachandra 2021

Smartphones have been employed with biometric-based verification systems to provide security in highly sensitive applications. Audio-visual biometrics are getting popular due to the usability and also it will be challenging to spoof because of multi-modal nature. In this work, we present an audio-visual smartphone dataset captured in five different recent smartphones. This new dataset contains 103 subjects captured in three different sessions considering the different real-world scenarios. Three different languages are acquired in this dataset to include the problem of language dependency of the speaker recognition systems. These unique characteristics of this dataset will pave the way to implement novel state-of-the-art unimodal or audio-visual speaker recognition systems. We also report the performance of the bench-marked biometric verification systems on our dataset. The robustness of biometric algorithms is evaluated towards multiple dependencies like signal noise, device, language and presentation attacks like replay and synthesized signals with extensive experiments. The obtained results raised many concerns about the generalization properties of state-of-the-art biometrics methods in smartphones.

Cryptography and Security Computer Vision and Pattern Recognition