Face presentation attack detection: Research opportunities and perspectives

Abstract

The rapid development of biometric methods and their implementation in practice has led to widespread attacks called spoofing, which are purely biometric vulnerabilities not shared with other IT security solutions. Although biometric recognition as a branch of computer science dates back to the 1960s, attacks on biometric systems have become more sophisticated since the 2010s due to great advances in pattern recognition. Face recognition is the most attractive target for deceiving recognition systems. Popular presentation attacks, such as print, replay and mask attacks, have demonstrated a high security risk for state-of-the-art (SOTA) face recognition systems. Many Presentation Attack Detection (PAD) methods (also known as face anti-spoofing methods or countermeasures) have been proposed that can automatically detect and mitigate such targeted attacks. This article presents a systematic survey of face anti-spoofing with prognostic trends in this research area. A brief description of 16 outstanding previous surveys on the face PAD field is given, from which it is possible to trace how this scientific topic has developed. The SOTA section analyzes a wide range of PAD methods, which are categorized into two unbalanced groups: digital (feature-based) and physical (sensor-based) methods. Generalization of deep learning methods, a recent trend aimed at improving recognition results, requires special attention. This survey presents five types of generalization: transfer learning, anomaly detection, few-shot and zero-shot learning, auxiliary supervision, and multi-spectral methods. A summary of more than 40 existing 2D/3D face spoofing databases serves as a guideline for those who want to select databases for experiments. One can also find a description of performance evaluation metrics and testing protocols. In addition, we discuss trends and perspectives in the emerging field of facial biometrics.

Keywords

Face biometrics, face anti-spoofing, print attack, replay attack, 3D mask attack, plastic surgery attack, texture analysis, motion analysis, liveness detection, image quality analysis, database

1. Introduction

Automated face recognition systems have found use in various applications due to their obvious advantages over other biometric modalities, especially in applications such as check-in and mobile payments. There are many commercial face recognition systems, for example, SmartGate from the Australian Border Force and New Zealand Customs Service, which compares the face of the traveller with the data in the e-passport microchip; the Skynet system, capable of scanning the entire Chinese population in one second and the world population in two seconds; and the Indian National Automated Facial Recognition System (AFRS), designed for security purposes. Such advances have been made possible by the enormous efforts of researchers in the field of facial recognition. As a result, SaaS (Software as a Service)-based facial recognition engines, self-hosted REST (REpresentational State Transfer) API solutions and open-source frameworks and libraries are available to users as facial recognition services. However, the issue of how to protect any face recognition system from intruders remains open, since the motivation for intruders is high as long as it is possible to create facial artefacts inexpensively and carry out an attack easily.

Vulnerabilities in face recognition systems can affect various components. According to the International Standard ISO/IEC 30107-1:2016 [1], nine types of attacks have been reported on Data Capture Module (sensor), Signal Processing Module, Data Store, Comparison Module, and Decision Module, as well as transmission channels between them: (1) attack on sensor, (2) modify biometric sample, (3) override signal processor, (4) modify probe, (5) override comparator, (6) override or modify database, (7) modify biometric reference, (8) modify score, and (9) override decision. Capture device, data transmission and data store are the three main points for a targeted attack. Currently, attacks on the sensor, called presentation attacks, are classified as “direct attacks” (human attacks) and “indirect attacks” (artificial attacks).

The ISO/IEC 30107 standard consists of three parts. Part 1 (Framework) provides the foundation for PAD by defining the terms and establishing a framework with which presentation attacks can be specified and detected. Part 2 (Data Formats) defines data formats for conveying the type of approach used in biometric presentation attack detection and for conveying the results of presentation attack detection methods. Part 3 (Test Methodology) establishes principles and methods for performance assessment of presentation attack detection algorithms. As one can see, the field of research is very wide and cannot be covered in a single article. Here we will focus on the PAD field.

The main terms from the ISO/IEC DIS 30107-3 standard [2] are listed below:

•
“Artefact” is an artificial object or representation presenting a copy of biometric characteristics or synthetic biometric patterns.
•
“Liveness” is the quality or state of being alive, made evident by anatomical characteristics, involuntary reactions or physiological functions, or voluntary reactions or subject behaviours.
•
“Liveness detection” is the measurement and analysis of anatomical characteristics or involuntary or voluntary reactions, in order to determine if a biometric sample is being captured from a living subject present at the point of capture. Liveness detection methods are a subset of presentation attack detection methods.
•
“Normal presentation” is an interaction of the biometric capture subject and the biometric data capture subsystem in the fashion intended by the policy of the biometric system. Any type of presentation that is not an attack is considered a “normal presentation”.
•
“Presentation attack” is a presentation to the biometric data capture subsystem with the goal of interfering with the operation of the biometric system. A presentation attack can be implemented through a number of methods, e.g. artefact, mutilations, replay, etc. Presentation attacks may have a number of goals, e.g. impersonation or not being recognized. Biometric systems may not be able to differentiate between biometric presentation attacks with the goal of interfering with the system’s operation and non-conformant presentations.
•
“Presentation attack detection” is an automated determination of a presentation attack. PAD cannot infer the subject’s intent. In fact it may be impossible to derive that difference from the data capture process or acquired sample.
•
“Presentation attack instrument” is a biometric characteristic or object used in a presentation attack. The set of presentation attack instruments includes artefacts but would also include lifeless biometric characteristics or altered biometric characteristics that are used in an attack.

A presentation attack is a process in which a user attacks a facial recognition system by masquerading as a registered user, thereby gaining illegal access and benefits, or by trying to avoid being recognized. Presentation attacks are classified into print, replay and mask attacks. The most difficult cases are attacks that rely on professional masks or professional makeup. Print attacks are based on a single printed photo or a photo shown on a smartphone or laptop. Sometimes the user cuts holes in the printed photo for the eyes, nose and mouth, providing partial liveness of his/her face. This is the easiest way to fool a facial recognition system for many users. The main difficulty in detecting a fake image is the lack of temporal information. Replay attacks present a short clip with a “live” face. In general, many algorithms focus on determining the liveness of a face using various cues.

Depending on the sensors used, the PAD methods are divided into digital and physical domains. Methods in the physical domain explore remote physiological signals (e.g., remote photoplethysmography (rPPG) [3, 4, 5]), which makes them less attractive in practice. Currently, the digital domain occupies the largest area of research. Methods in both domains have evolved from traditional methods based on hand-crafted features [6, 7, 8, 9, 10, 11] to deep learning methods [12, 13, 14, 15, 16, 17] as well as hybrid methods [18, 19, 20, 21]. Human liveness cues play an important role in PAD. Many algorithms are based on eye-blinking detection [22, 23], face and head movement [24, 25] (e.g., nodding and smiling), or gaze tracking [26, 27, 28]. An original interpretation was proposed in [29], where the authors applied appropriate nonlinear adjustment and hair geometry to amplify the contrast between real faces and attacks using a simple Convolutional Neural Network (CNN).

Since 2013, it has been shown that adversarial attacks lead to incorrect classification by deep learning models. This serious weakness of deep learning models prompted the appearance of specific toolboxes (such as Foolbox), which contain many SOTA adversarial attacks. Adversarial attacks are usually divided into digital domain attacks and physical domain attacks. Digital domain attacks generate adversarial facial images using queries, geometric transformations, and examples created by generative adversarial networks. As a rule, intruders cannot feed a distorted image directly into the system, and digital images with adversarial perturbation are always captured by cameras or sensors. Year after year, adversarial attacks operate in more realistic settings, exacerbating the security problem. Adversarial attacks in the physical domain, such as printed glasses or skin stickers, take up less space but are just as successful. It should be noted that face spoofing protection is a multi-task problem that includes face recognition, emotion assessment, and aging assessment.

Since 2014, many outstanding surveys have been published. Some of them are presented in Table 1.

Table 1
Previous surveys on the PAD field (multi-modalities include vision, near-infrared, thermal, depth, multispectral short wave infrared, and polarized imaging or imaging received by specialized light field-based or flash sensors)

| Title and reference | Year | Deep learning | Modality (vision/multi-modality) | Databases | Testing protocol ${}^{*}$ |
| --- | --- | --- | --- | --- | --- |
| Biometric anti-spoofing methods: A survey in face recognition [30] | 2014 | No | Multi-modality | 6 | Intra-database |
| Face spoofing and counter-spoofing: A survey of state-of-the-art algorithms [31] | 2017 | No | Vision | 6 | Intra-database |
| Presentation attack detection methods for face recognition systems: A comprehensive survey [32] | 2017 | No | Vision | 11 | Inter-database |
| Deeply vulnerable: A study of the robustness of face recognition to presentation attacks [33] | 2017 | Yes | Vision | 4 | Inter-database |
| How far did we get in face spoofing detection? [34] | 2018 | Yes | Vision | 13 | Inter-database |
| A survey of mobile face biometrics [35] | 2018 | Yes | Vision | 6 | Inter-database |
| Insight on face liveness detection: A systematic literature review [36] | 2019 | No | Vision | 14 | Intra-database |
| An extensive review on spectral imaging in biometric systems: Challenges and advancements [37] | 2019 | Yes | Multi-modality | 6 | Inter-database |
| Recent advances in face presentation attack detection [38] | 2019 | Yes | Multi-modality | 12 | Inter-database |
| Deep convolutional neural networks for face and iris presentation attack detection: survey and case study [39] | 2020 | Yes | Vision | 8 | Intra-database, Inter-database |
| A survey on anti-spoofing methods for facial recognition with RGB cameras of generic consumer devices [40] | 2020 | Yes | Multi-modality | 12 | Intra-database, Inter-database |
| A survey on 3D mask presentation attack detection and countermeasures [41] | 2020 | Yes | Vision | 10 | Intra-database, Inter-database |
| Face presentation attack detection in mobile scenarios: A comprehensive evaluation [42] | 2020 | Yes | Vision | 3 | Intra-database |
| Cross-ethnicity face anti-spoofing recognition challenge: A review [43] | 2020 | Yes | Multi-modality | 1 | Intra-database |
| Deep learning for face anti-spoofing: A survey [44] | 2021 | Yes | Multi-modality | 35 | Intra-database, Inter-database |
| A review of state-of-the-art in face presentation attack detection: From early development to advanced deep learning and multi-modal fusion methods [45] | 2021 | Yes | Multi-modality | 47 | Intra-database |
| Face image quality assessment: A literature survey [46] | 2022 | Yes | Multi-modality | 34 | Intra-database |

${}^{*}$ Intra-database variation refers to bias in expression, head pose, etc. within a single dataset, while inter-database (cross-database) variation refers to different biases across different datasets [47].

With the dominance of deep learning models, evaluation protocols are an important component of learning and a source of improved accuracy. The two traditional protocols, intra-database and inter-database (or cross-database), have been extensively explored in previous surveys (see Table 1). However, PAD methods face uncertain gaps between training and testing conditions. This means that, in practice, trained models may be used in several specific domains and must be robust to various types of attacks. Recently, two additional protocols, called unseen domain generalization [48, 49, 50, 51, 52, 53] and unknown PAD [54, 55, 56, 57, 58], have been actively researched.
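The difference between the two traditional protocols can be illustrated with a minimal sketch. The record layout (dictionaries with hypothetical `database` and `subject` keys), the subject-disjoint split, and the function names are assumptions made for illustration only; actual databases define their own official splits.

```python
def intra_database_split(samples, database, test_fraction=0.2):
    """Train and test on the same database (here with disjoint subjects)."""
    subset = [s for s in samples if s["database"] == database]
    subjects = sorted({s["subject"] for s in subset})
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[-n_test:])
    train = [s for s in subset if s["subject"] not in test_subjects]
    test = [s for s in subset if s["subject"] in test_subjects]
    return train, test


def inter_database_split(samples, train_database, test_database):
    """Train on one database and test on a different one (cross-database)."""
    train = [s for s in samples if s["database"] == train_database]
    test = [s for s in samples if s["database"] == test_database]
    return train, test


# Hypothetical toy records: 5 subjects in database "A", 3 in database "B".
samples = ([{"database": "A", "subject": i} for i in range(5)]
           + [{"database": "B", "subject": i} for i in range(3)])
intra_train, intra_test = intra_database_split(samples, "A")
inter_train, inter_test = inter_database_split(samples, "A", "B")
```

In the inter-database case, the capture devices, lighting and attack instruments of the test database are never seen during training, which is why this protocol is usually the harder of the two.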

The major contributions of this study can be summarized as follows:

•
Extended taxonomy of presentation attacks is introduced.
•
Lambertian modeling for PAD supports future research within a sensor-based approach.
•
Because deep learning has dramatically improved the SOTA performance for many computer vision tasks, this survey examines the evolution and research opportunities of deep learning and hybrid (hand-crafted $+$ deep) methods for both single- and multi-modal PAD.
•
A comparison of more than 40 public databases classified by various types of presentation attacks is presented.
•
Perspectives on learning are assessed over two main practical protocols (i.e., intra-database and inter-database). The SOTA methods are considered with various application scenarios (e.g., unseen domain generalization and unknown attack detection).

The structure of this survey is as follows. Section 2 introduces the research background, including the taxonomy of presentation attacks and Lambertian modeling. Section 3 addresses SOTA for both digital (feature-based) and physical (sensor-based) PAD methods. A generalization of deep learning methods is discussed in Section 4. Section 5 presents more than 40 databases with normal and attacked images and videos. Performance evaluation metrics and testing protocols are considered in Section 6. Section 7 includes a discussion about trends and perspectives for the PAD field. Finally, conclusions are given in Section 8.

2. Background

As mentioned in [30], the new security biometric paradigm “forget about cards and passwords, you are your own key” is very attractive to end users, but requires highly accurate image processing, computer vision and machine learning techniques to improve the performance of biometric systems. However, subsequent implementations have demonstrated the major drawback of biometrics: “Biometric traits are not secrets”. In the last decade, intensive investigations have focused on the study of direct or spoofing attacks. Spoofing deals with fingerprints, face, iris, voice, or even DNA previously obtained from public data. Spoofing is a purely biometric vulnerability, involving artificially produced artefacts or mimicking the behaviour of genuine individuals. Spoofing is not shared with other IT security solutions and is studied in disciplines such as machine learning, pattern recognition and image processing. Currently, the vulnerability of biometric systems is no longer in question; it is accepted. The main issue is the robustness of the biometric system to spoofing attacks and the necessary countermeasures. Hereinafter, we consider the taxonomy of presentation attacks (Section 2.1) and Lambertian modeling (Section 2.2).

2.1 Taxonomy of presentation attacks

Any presentation attack is aimed at creating a facial artefact. According to ISO/IEC 30107-1, the biometric characteristic or object used in a presentation attack is termed the Presentation Attack Instrument (PAI) [1]. PAIs are broadly classified into two types: artificial and human. Artificial PAIs are generated by artificial means and are classified as complete artificial PAIs (print photograph, video, 3D mask, etc.) and partial artificial PAIs (for example, a partially visible face or sunglasses). Human PAIs involve human characteristics, such as lifeless, altered (by cosmetic surgery), non-conformant (facial expression), coerced (the use of the face of an unconscious human), and conformant (zero-effort impostor attempts).

Jain et al. [59] classified artificial attacks against biometric systems into two large groups:

1.
Zero-effort (indirect or accidental) attacks, when the biometric traits of an intruder may be sufficiently similar to a legitimately enrolled individual. Zero-effort attacks try to take advantage of the False Acceptance Rate (FAR). This situation is largely a problem of discriminative methods.
2.
Adversarial (direct or intentional) attacks, when an intruder is able to masquerade as a registered individual, using physical or digital artefacts of a legitimately registered user. Another scenario is when an individual deliberately manipulates his/her biometric trait in order to avoid detection by an automated biometric system.
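Since zero-effort attacks exploit the False Acceptance Rate, it may help to recall how FAR is estimated at a given decision threshold. The sketch below assumes that a higher comparison score means greater similarity and that a probe is accepted when its score reaches the threshold; the function name and the toy scores are hypothetical.

```python
def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor comparisons whose score reaches the threshold."""
    if not impostor_scores:
        raise ValueError("at least one impostor score is required")
    accepted = sum(1 for score in impostor_scores if score >= threshold)
    return accepted / len(impostor_scores)


# Toy impostor scores: two of the four reach the threshold of 0.6.
far = false_acceptance_rate([0.10, 0.35, 0.62, 0.80], threshold=0.6)
```

Raising the threshold lowers FAR and therefore the success chance of a zero-effort attack, at the cost of rejecting more genuine users.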

Also, when supporting a biometric system, it is necessary to take into account some objective attacks: circumvention of system, repudiation in access, collusion between individuals, coercion, Denial of Service (DoS), etc.

The ISO/IEC 30107 standard introduced the term “presentation attacks” instead of direct attacks as: “Presentation of an artefact or human characteristic to the biometric capture subsystem in a fashion that could interfere with the intended policy of the biometric system” [60].

The most ancient practice to avoid being recognized is the use of masks or facial disguises, which can be associated with 3D attacks. However, automated facial recognition systems often capture an image or video that is interpreted as a 2D space rather than a 3D representation of a person. That is why relatively cheap 2D spoofing attacks are more popular than 3D spoofing attacks, which has drawn much attention to the development of 2D PAD methods.

If an intruder tries to trick a face recognition system by impersonating a legitimate user, such attacks are called impersonation attacks. If a person uses tricks to avoid being recognized by the system, but not necessarily by impersonating a legitimate user, then this type of presentation attack is called an obfuscation attack. Whether the attack is impersonation or obfuscation depends on the intruder’s goal: an intruder can use any type of presentation attack.

Presentation attacks may be classified into two groups: 2D spoofs and 3D spoofs, as shown in Fig. 1. 2D spoofs include print (image) attacks, replay (video) attacks and adversarial attacks, while 3D spoofs involve mask attacks and disguises.

Figure 1.
Taxonomy of face spoofing attacks.

A print attack means presenting a photograph of the genuine user to a face recognition system. The photograph may be taken from social networks or the Internet, or captured by a digital camera without the consent of the genuine user. It can be printed on paper (printed copy) or displayed on the screen of a mobile phone or tablet (digital-photo attacks). An advanced type of image attack is the use of high-resolution printed photographs with holes for the eyes and mouth (photographic masks), through which the intruder can blink and move his/her lips.

Replay attacks present a video clip of the genuine user captured by any digital device (mobile phone, tablet or laptop). This type of attack is a more sophisticated version of the print attack and involves richer spatio-temporal information instead of the spatial information of a still image. Objectively, replay attacks are harder to detect because of the genuine temporal information. Adversarial attacks are a more recent type, in which wearable items become the source of the spoof. They aim to deceive deep neural networks.

Mask attacks are among the most complex, as they provide the complete 3D structure of the face. Thus, depth cues, which help to prevent print and replay attacks presented on flat surfaces, cannot be a solution here. In the previous decade, this type of attack was much less common than 2D spoofs due to expensive silicone masks. Nowadays, the situation has changed: some companies provide 3D face models at a reasonable price, and self-manufacturing a face mask is also becoming more feasible and easier thanks to relatively cheap 3D scanners, scanning software and 3D printers.

The PAD methods can be implemented at the level of sensors (hardware-based) and at the level of features (software-based). Currently, research has shifted towards software-based methods. The PAD methods often rely on well-known facial cues [61], such as motion (blinking of eyes), involuntary movements of parts of the face and head, surface texture of the skin, and the depth information of the head. Ming et al. [40] proposed a taxonomy of facial PAD methods based on cues, including liveness cue-based methods (motion cue-based and rPPG-based), texture cue-based methods (static and dynamic), 3D geometric cue-based methods (3D shape and pseudo-depth map), multiple cues-based methods (different combinations of liveness, texture and 3D geometry), and methods using new trends (Neural Architecture Search (NAS), zero-shot learning, domain adaptation, etc.). The latter methods are driven by the development of deep learning models. All groups of methods mentioned above are effective against print and replay attacks, while rPPG-based, dynamic texture-based and liveness $+$ 3D geometry (rPPG $+$ pseudo-depth map) based methods are recommended against 3D mask attacks.

In the literature, 2D PAD methods are more prominent and better developed due to the large number of publicly available databases. Mask attacks began to be systematically studied only after the appearance of the first mask-specific database [62]. The COVID-19 pandemic has added a further challenge: recognition of faces wearing medical masks, as well as mask attacks. It is also worth noting that a special case is the recognition of twins, which may be regarded as a zero-effort attack.

2.2 Lambertian modeling

Many PAD methods are based on the intuitive idea that the fake image passes twice through the camera system and once through the print system. This implies more serious distortion of the fake image relative to the real image, or in other words, a lower quality of the fake image. It should be noted that the first databases included obviously distorted fake images, and relatively simple algorithms based on hand-crafted features could distinguish them. Nowadays, the situation has changed fundamentally, and databases collect only high-quality fake images and video clips. Even printed images have good resolution thanks to modern printers.

Tan et al. [7] were the first to identify the importance of Lambertian modeling for face anti-spoofing. They obtained rough approximations of illuminance and reflectance parts. Further, in some publications, attention was paid to the mathematical formulation of the PAD task [63, 64, 65]. However, most authors prefer to deal with the consequences of the physical representation and do not take into account the Lambertian modeling. This can be explained by the fact that the difference in reflectance between real and fake images or frames can be captured by special sensors, but not by the most common RGB cameras. Moreover, modern databases provide high-quality fake images and video clips. This has led to the failure of many algorithms with hand-crafted features and prompts the search for multiple cues in various modalities.

According to the Lambertian reflectance assumption, the face surface is modeled as an ideal diffuse reflector [18]. In other words, Lambert’s cosine law describes the intensity of a face image $I(x,y)$ as follows:

$\displaystyle I({x,y})=f_{c}({x,y})\rho({x,y})I_{\textit{light}}\cos\theta,$ (1)

where $f_{c}(x,y)$ is the coefficient depending on the camera used at the point with coordinates $(x,y)$, $\rho(x,y)$ is the reflectance coefficient, which represents the diffuse reflectivity of the surface at that wavelength, $I_{\textit{light}}$ is the intensity of the incoming light at a particular wavelength, and $\theta$ is the angle between the surface normal $\textbf{n}$ and the incoming light ray $\textbf{s}$, so that $\cos\theta=\textbf{n}\cdot\textbf{s}$.
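A minimal numerical sketch of Eq. (1) for a single pixel is given below. The function name and sample values are illustrative assumptions; the vectors are normalized inside the function, and the clamping of negative cosines to zero (surfaces facing away from the light) is a common convention added here, not part of Eq. (1) itself.

```python
import math


def lambertian_intensity(f_c, rho, i_light, normal, light_dir):
    """Eq. (1): I = f_c * rho * I_light * cos(theta), with cos(theta) = n . s."""
    def unit(vector):
        norm = math.sqrt(sum(c * c for c in vector))
        if norm == 0:
            raise ValueError("zero-length vector")
        return [c / norm for c in vector]

    n, s = unit(normal), unit(light_dir)
    cos_theta = sum(a * b for a, b in zip(n, s))
    # Assumption: a surface facing away from the light receives no direct light.
    return f_c * rho * i_light * max(0.0, cos_theta)


# Light along the surface normal, then tilted by 60 degrees (cos = 0.5).
frontal = lambertian_intensity(1.0, 0.5, 2.0, (0, 0, 1), (0, 0, 1))
angle = math.radians(60)
tilted = lambertian_intensity(1.0, 0.5, 2.0, (0, 0, 1),
                              (math.sin(angle), 0, math.cos(angle)))
```

With identical camera, reflectance and light intensity, the tilted case yields half the frontal intensity, which is the cosine dependence that Lambertian PAD models exploit.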

Equation (1) can be rewritten for real image $I_{r}(x,y)$ and fake image $I_{f}(x,y)$ under assumption that $f_{c}(x,y)$ is a constant:

$\displaystyle I_{r}({x,y})=\rho_{r}({x,y})I_{\textit{light}}(\textbf{n}_{r}\cdot\textbf{s}),$ (2)

$\displaystyle I_{f}({x,y})=\rho_{f}({x,y})I_{\textit{light}}(\textbf{n}_{f}\cdot\textbf{s}).$ (3)
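As a hedged illustration (not taken from the cited works), suppose that both images are captured under the same light source, so that $I_{\textit{light}}$ and $\textbf{s}$ are shared as written in Eqs. (2) and (3). Dividing Eq. (2) by Eq. (3) then cancels the illumination intensity:

$\displaystyle \frac{I_{r}({x,y})}{I_{f}({x,y})}=\frac{\rho_{r}({x,y})\,(\textbf{n}_{r}\cdot\textbf{s})}{\rho_{f}({x,y})\,(\textbf{n}_{f}\cdot\textbf{s})}.$

Under this assumption, the remaining differences between a real and a fake image come from the reflectance ratio and from the surface normals. For a flat print, $\textbf{n}_{f}$ is approximately the same at every pixel, whereas $\textbf{n}_{r}$ varies over the face.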

Because the lighting conditions during the capture of real and fake images or video sequences are unknown, various authors make additional assumptions. Thus, Tan et al. [7] supposed that the differences between the human skin and a photograph (or a video clip) can be made evident by comparing their surface properties under the same lighting conditions ($\textbf{s}=\text{const}$). They estimated the surface reflectance property $\rho_{r}/\rho_{f}$ and the surface normal $\textbf{n}_{r}/\textbf{n}_{f}$ for a real face and a fake image using a variational Retinex-based method and a Difference of Gaussian (DoG)-based method. Chan et al. [63] adopted the Lambertian reflection model to extract the standard deviation and mean as features. Liu et al. [66] extracted the normal cues from facial reflection frames based on the Lambertian model using the random light CAPTCHA and then trained an end-to-end multi-task CNN to perform liveness classification and light CAPTCHA regression simultaneously. Ebihara et al. [64] proposed the SpecDiff descriptor constructed by leveraging two types of reflection: specular reflections from the iris region and diffuse reflections from the entire face region. Specular reflections had a specific intensity distribution depending on liveness (with or without a flash) due to the curved human iris, which has a structure resembling glass beads. Diffuse reflections represented the 3D structure of a subject’s face against 3D masking attacks. The SpecDiff descriptor was used as a classifier in the ResNet4 deep network. Face PAD models for print and replay attacks were designed in [67]. In this work, the intensity at a given pixel $(x,y)$ of a given bona fide face is considered as the linear combination of luminance $I_{r}^{L}({x,y})$ and chrominance $I_{r}^{C}({x,y})$ components:

$\displaystyle I_{r}({x,y})=I_{r}^{L}({x,y})+I_{r}^{C}({x,y})=\rho_{r}({x,y})I_{\textit{light}}+C_{r}({x,y})D_{c},$ (4)

where $C_{r}(x,y)$ represents the color gamut of human skin and $D_{c}$ is the color distortion parameter of the capture device.

The color distortions of the print and display (or replay) artefacts were modeled in the same manner as Eq. (4), but with additional color distortion factors for printing and digital display conversion. Then chromatic co-occurrences of Local Binary Patterns (LBPs) were calculated based on the chrominance and luminance components of bona fide faces and facial artefacts under various luminance conditions and cameras. Such estimates were used as sub-classifiers for the ensemble learning method.
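For readers unfamiliar with LBPs, the following sketch computes the basic 8-neighbour LBP code and its histogram for a single image channel stored as a list of rows. It is only the elementary operator, not the chromatic co-occurrence descriptor of [67]; applying it separately to a luminance or chrominance channel is an assumption made for illustration, and the function names are hypothetical.

```python
def lbp_code(channel, r, c):
    """8-bit LBP code of the pixel at (r, c) from its 3x3 neighbourhood."""
    center = channel[r][c]
    # Neighbours visited clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if channel[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(channel):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(channel) - 1):
        for c in range(1, len(channel[0]) - 1):
            hist[lbp_code(channel, r, c)] += 1
    return hist


flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
corner_patch = [[9, 1, 1], [1, 5, 1], [1, 1, 1]]
flat_hist = lbp_histogram(flat_patch)
```

The resulting histogram is a texture feature vector that can be passed to any classifier; differences between such histograms of bona fide faces and artefacts are what texture-based PAD methods rely on.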

It should be noted that adversarial attacks are modeled differently, using an adversarial example perturbation by adding random noise to image (digital adversarial attack) [68] or showing special glasses, stickers, hats, etc., on the photo (objective adversarial attack) [69].

3. State-of-the-art in PAD

The PAD field has a short but colorful history since the 1990s. The current common terminology has been formed by relevant standards, image processing, machine learning, and available devices. This is not to say that this process is close to completion, because the usual competition between developers and intruders continues. It is also worth noting that PAD is a challenging area of research owing to the aspects listed below:

•
Complex outdoor illumination.
•
Aging of individuals.
•
Difficulties in classifying the liveness features of genuine and fake videos.
•
Some objects, such as glasses that cause reflections, printed glasses, and stickers, are a challenge for face spoof detection.
•
Make-up can be successfully used to perform direct attacks.
•
Recognition of a person after plastic surgery is still an open problem for facial authentication.
•
Recognition of twins is a very specific and unsolved problem.

Currently, PAD methods can be categorized into two unbalanced groups: digital (feature-based) and physical (sensor-based) PAD methods. The latter require specific devices and provide other biometric features in addition to facial features, which serve as cues of liveness.

3.1 Digital PAD methods

Digital PAD methods provide a wide variety of approaches: traditional, deep learning and hybrid. In this survey, we will focus on the last two approaches due to their growing superiority since the 2010s, discussing methods for print attack detection (Section 3.1.1), replay attack detection (Section 3.1.2), adversarial attack detection (Section 3.1.3), 3D mask PAD methods (Section 3.1.4), and detection of disguise, makeup, plastic surgery, and morphing attacks (Section 3.1.5).

3.1.1 Print attack detection

Methods for print attack detection fall into dynamic and static approaches. The dynamic approach assumes that a still photograph is recorded as a short video, while the static approach analyzes a still photograph as a single snapshot.

Printed photos were the first instrument used to fool face recognition systems. The first algorithms analyzed a still photograph in the temporal domain in order to confirm or reject a person’s identification. Some of these methods were based on motion detection in specific face regions, such as eye blinking, changes of mimic wrinkles near the mouth or involuntary lip movement [23]. Other methods detected face and head gestures (for example, nodding, smiling, changing gaze directions) [27]. The third group of methods analyzed foreground/background motion correlation using optical flow between consecutive still images [70, 71]. These methods are highly effective for detecting print attacks, but useless against replay attacks.

To overcome this shortcoming, some dynamic liveness detection techniques have been proposed to detect replay attacks, such as using 3D facial structure by analyzing multiple 2D images [72], context-based analysis (background and foreground motion) [73], evaluating noise produced during the recapture [74], or employing face texture dynamic analysis [75].

Dynamic-based PAD methods cannot be applied in systems where only a single face image of the user is available (passport-related applications). In addition, not all frames are suitable for analysis, and real-time application is questionable. Such shortcomings limit the application of this approach in practice, although it sometimes provides better accuracy than static-based PAD methods.

Analysis of a still photograph is more attractive because it is faster overall. Most static methods analyze the texture of the face using various image processing algorithms, both traditional and deep learning-based. Traditional methods have been based on the Fourier spectrum, multiple difference-of-Gaussian filters, LBPs, Gabor wavelets, histograms of oriented gradients, and so on.
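To illustrate the kind of texture descriptor used by the traditional static methods, the basic 3×3 local binary pattern can be sketched as follows. This is a minimal sketch of the classic 8-neighbour operator on a grayscale image stored as nested lists; the function names `lbp_code` and `lbp_histogram` are illustrative assumptions, and the surveyed methods may use multi-scale or other LBP variants instead.

```python
def lbp_code(img, r, c):
    """Return the 8-bit LBP code of the pixel at (r, c).

    img is a 2D list of gray levels; (r, c) must be an interior pixel.
    """
    center = img[r][c]
    # Clockwise neighbour offsets, starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        # A neighbour at least as bright as the center sets its bit.
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """Return the 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist
```

The histogram, usually computed per face block and concatenated, would then serve as the feature vector passed to a classifier.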

Nguyen et al. [76] proposed using hybrid features, a combination of handcrafted and deep features, against print attacks including cut photos. Multi-level local binary patterns were used to measure skin details in the face region, such as edges, corners, and blobs, by extracting a 3732-component feature vector. The VGG-19 model provided a binary classification into “real” and “presentation attack” images, forming a 4096-component feature vector. The high dimensionality of the combined feature vector necessitated the use of principal component analysis followed by a Support Vector Machine (SVM) classifier. For samples from the NUAA database and the CASIA database, the ACER values reached 11.247% and 2.174%, respectively.

However, the accuracy of these methods can suffer due to different illumination conditions, and other cues are usually used in conjunction with them.

3.1.2 Replay attack detection

A replay attack means playing a short video of an authorized user on a digital device such as a mobile phone, laptop or tablet, while the intruder stands in front of the camera of the face biometric system. Thus, the replay attack provides dynamic biometric features, in other words, the motion of the user. This type of attack is also facilitated by the easy access to a person’s video in social networks.

Patel et al. [77] integrated deep texture features and an eye-blink cue as countermeasures against replay and print presentation attacks. The CaffeNet model (with 5 convolution layers and 3 fully connected layers) was used as a backbone in an architecture with two branches, one for face texture analysis and the other for whole-frame texture analysis. A generic-to-specific transfer learning scheme was designed to train the network because of the small sizes of existing face spoof databases. This approach resulted in a Half Total Error Rate (HTER) of 13.8%. Eye difference images between successive frames were calculated by cropping the regions of the left and right eyes separately to 40–50 pixels. A voting scheme was then used to generate a decision per video clip. After the eye-blink cue was fused with the deep texture features, the HTER on replay attacks reached 12.4%.
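The two ingredients mentioned above, a per-clip voting scheme and the HTER, can be sketched in a few lines. This is a minimal sketch under stated assumptions: the exact voting rule of [77] is not given here, so a simple majority vote with ties resolved as "attack" is assumed, and the function names are illustrative. The HTER itself is the mean of the false acceptance rate (attacks accepted as live) and the false rejection rate (genuine users rejected).

```python
def video_decision(frame_is_live):
    """Majority vote over per-frame decisions (True = live).

    Assumption: ties are resolved as 'attack' (returns False).
    """
    live_votes = sum(1 for d in frame_is_live if d)
    return live_votes * 2 > len(frame_is_live)


def hter(predicted_live, actual_live):
    """Half Total Error Rate: mean of FAR and FRR.

    Both arguments are equally long lists of booleans (True = live);
    the data must contain at least one attack and one genuine sample.
    """
    on_attacks = [p for p, a in zip(predicted_live, actual_live) if not a]
    on_genuine = [p for p, a in zip(predicted_live, actual_live) if a]
    far = sum(1 for p in on_attacks if p) / len(on_attacks)      # attacks accepted
    frr = sum(1 for p in on_genuine if not p) / len(on_genuine)  # genuine rejected
    return (far + frr) / 2
```

In practice, the decision threshold that produces the per-frame decisions is typically chosen on a development set before the HTER is reported on the test set.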

Feng et al. [78] estimated image quality cues provided by shearlet decomposition and averaged motion cues, computed as the dense optical flows of the face video and the scene video, to detect replay and 3D mask attacks. These three types of cues were then fused using a hierarchical neural network to improve the generalization ability for both 2D and 3D spoofing detection. The REPLAY-ATTACK database and the 3D-MAD database were used in the experiments. With a bottleneck feature fusion, this method achieved an HTER of 0% on the 3D-MAD database.

The FASNet, which followed the VGG-16 architecture, was pre-trained with transfer learning to detect replay and mask attacks [79]. The top layers of the VGG-16 model were changed to perform the binary classification for the anti-spoofing task. The FASNet reached an HTER of 1.2% and 0.00% on the REPLAY-ATTACK database and the 3D-MAD database, respectively.

Dynamic liveness cues are often used in practice. A multiple-motion-cue-based method using eye blinking, chin and lip movement was developed in [80]. Bekhouche et al. [81] proposed a spatiotemporal CNN with pyramid bottleneck blocks for eye blinking detection in a wild scenario. Killioğlu et al. [82] presented a liveness detection method based on pupil tracking. Application of rPPG signals helps to estimate pulse or heart rate [3, 83, 84], or skin blood flow [85].

Chen et al. [86] suggested a three-way classification to distinguish real face, fake face and background for detecting print and replay attacks. They designed a face anti-spoofing region-based convolutional neural network based on the Faster region-based CNN, with the addition of a Crystal Loss function to the original multi-task loss function. Also, Retinex-based LBPs were presented to handle the different illumination conditions in face spoofing detection. These two detectors were further cascaded and achieved promising performance on the CASIA-FASD, REPLAY-ATTACK and OULU-NPU databases. In continuation of this direction, Peng et al. [87] proposed a two-stream vision transformer framework based on transfer learning, where the outputs of the RGB stream and the multi-scale Retinex with color restoration stream were fused by a self-attention fusion module. The fused features were then classified by the softmax function as a live or spoof face in the video. Experiments on the OULU-NPU, CASIA-MFSD and REPLAY-ATTACK databases have shown that this approach outperforms many methods in intra-database testing and provides good generalization performance in cross-database testing. Yu et al. [20] extended the central difference CNN to a multimodal version, exploring three modalities (RGB, depth and infrared). The CASIA-SURF CeFA, a large-scale cross-ethnicity face anti-spoofing database, was used in the experiments.

Rehman et al. [88] enhanced the discriminative ability of deep features in face PAD, adding a perturbation layer (a learnable preprocessing layer for low-level deep features). In other words, the authors induced the information of the hand-crafted features into the deep features of a selected layer in CNN, called perturbation layer. Their experiments showed that the face PAD performance was improved in both intra-database and cross-database scenarios using CASIA, REPLAY-ATTACK and OULU-NPU databases.

Cai et al. [89] proposed a two-branch framework based on a CNN and a Recurrent Neural Network (RNN) to extract global and local information, respectively, from a single frame. For patch selection for RNN processing, they adopted deep reinforcement learning, which outperformed selection by the max SoftMax scores (denoted as the MAX-SCORES method) and the method of selecting patches randomly (denoted as the RANDOM method). Six publicly available face presentation attack databases, including the CASIA, IDIAP REPLAY-ATTACK, MSU Mobile, OULU-NPU, SiW, and Rose-Youtu databases, were utilized in the experiments.

Roy et al. [90] employed a bi-directional feature pyramid network based on the EfficientDet detection architecture. In addition, they leveraged the properties of 2D Fourier spectra by adding an auxiliary branch to their baseline network. The results were tested on the OULU-NPU and REPLAY-Mobile databases. Karmakar et al. [91] proposed detecting spontaneous facial micro-expressions by analyzing the higher frequencies of a multidimensional Fourier transform. This method has been applied to replay attack detection to distinguish between live and fake facial video streams.

Li et al. [92] detected replay and 3D mask attacks based on differences in motion direction and intensity, as well as differences in texture information between live and fake faces. The proposed architecture included an optical flow and texture module, two attention modules (channel attention and region attention), and a CNN. The attention modules generated the corresponding maps, which were fed into the CNN for feature extraction and image sequence classification. Yu et al. [93] developed a novel deep forgery detector called Patch-DFD based on facial patch mapping and bilinear interpolation with a max-pooling operation. Then five fixed-size feature maps were fused by a local voting strategy. The method has been tested against replay attacks on the DeepfakeTIMIT, Celeb-DF (v2), and FaceForensics++ databases.

Muhammad et al. [94] attempted to generalize the detection of print and replay attacks by proposing a temporal sequence sampling method that accumulates appearance and dynamic information of video sequences into single RGB images. First, the input video was divided into $N$ equal video clips, and the inter-frame 2D affine motion was estimated based on the trajectories of SURF (Speeded Up Robust Features) keypoint matches. Second, the estimated inter-frame 2D affine motion was removed from the video frames, converting the resulting clips into single RGB images. The proposed self-supervised learning scheme helped in fine-tuning a 2D CNN to learn more meaningful representations from the video clips. The method has been tested on the REPLAY-ATTACK, CASIA-FAS, MSU Mobile, and OULU-NPU databases. The main drawback was the lack of real-time response caused by processing a two-second video.

3.1.3 Adversarial attack detection

Szegedy et al. [95] were the first to demonstrate that deep learning models are vulnerable to adversarial examples (samples with small, often quasi-imperceptible perturbations), when the original samples are taken from the data distribution. Deep learning-based face recognition systems are no exception: they are vulnerable to adversarial attacks from both the digital and physical domains. Adversarial attacks have recently emerged that aim to mislead a classifier in a deep learning-based face recognition system [96]. The perturbations are imperceptible to human vision and are used to generate synthetic images by adversarial methods [97]. Perturbations are utilized in two modes: “no target” (hiding the identity of the user) and “dodging” (accessing the identity of the target user).

Creating physical attacks with 3D-printed glasses is a widespread adversarial attack [69, 98]. Zhou et al. [99] proposed an infrared-based stealthy facial morphing of the subject using an “infrared lighting cap”. Nguyen et al. [100] developed a more convenient method to create adversarial attacks with light projections, using a commercially available webcam and a projector. The proposed method generated a digital adversarial template using one or more target images available to the intruder, and this digital sample was then projected onto the adversary’s face in the physical domain to either impersonate the target (impersonation) or avoid recognition (obfuscation).

Countermeasures against adversarial attacks can be roughly divided into two groups: rectification methods and methods for adversarial detection. Rectification methods increase the robustness of the system [101], while methods for adversarial detection analyze the behavior of the model, detecting anomalous events [102]. The robustness of a model can be increased via adversarial training or regularization methods that train deep learning models. Detection subsystems are often implemented as binary detectors that distinguish genuine and adversarial inputs. A recent detection method based on predictions has shown a good trade-off between detection efficiency and attack resistance [103]. It is based on a k-NN scheme or several RBF-SVMs applied to specific layers of a deep neural network, in other words, to intermediate representations. Massoli et al. [104] captured the evolution of the features extracted from the input at intermediate layers, created a trajectory in feature space, and analyzed the behavior of the original samples and the manipulated samples. A series of experiments, including a realistic scenario with a face recognition system, showed a high degree of resilience against adversarial attacks.
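The idea of a k-NN detector operating on intermediate representations can be sketched as follows. This is only a minimal sketch, not the method of [103]: it assumes that feature vectors from one chosen layer have already been extracted for a labelled reference set (genuine and adversarial), and it flags a query as adversarial when most of its `k` nearest reference vectors are adversarial. The function name and the Euclidean metric are assumptions made for illustration.

```python
import math


def knn_is_adversarial(query, reference_feats, reference_is_adv, k=3):
    """Flag `query` as adversarial by majority vote among its k nearest
    reference feature vectors (Euclidean distance).

    reference_feats: list of feature vectors from one intermediate layer.
    reference_is_adv: list of booleans, True for adversarial references.
    """
    # Sort reference indices by distance to the query representation.
    order = sorted(range(len(reference_feats)),
                   key=lambda i: math.dist(query, reference_feats[i]))
    votes = sum(1 for i in order[:k] if reference_is_adv[i])
    return votes * 2 > k
```

A detector of this kind could be attached to several layers, with the per-layer decisions combined afterwards; how the layers are chosen and combined is left open here.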

Deb et al. [105] proposed a new self-supervised adversarial defense framework called FaceGuard that can automatically detect, localize, and purify a wide variety of adversarial faces without using pre-computed adversarial training samples.

3.1.4 3D mask PAD methods

Most existing PAD methods proposed for 2D fake surfaces are useless for detecting 3D masks because the 3D mask usually looks more like human skin [61]. Even photographic masks with cut out areas of the eyes and mouth can deceive the recognition system.

Face mask materials can be divided into rigid/non-rigid or soft/flexible [106]. Relatively inexpensive rigid masks can be made from paper, resin or plastic. Soft masks often use latex and silicone materials. They come close to reproducing the color and texture of human skin and can adapt to different sizes, shapes and movements of the face. Currently, some materials used in the manufacture of 3D masks can accurately match the topology of the face, making 3D mask attack detection increasingly difficult and sometimes impossible [107]. For example, the production of a ThatsMyFace mask requires 3D reconstruction and 3D printing techniques to create a custom face mask. At the same time, the RealF mask based on the three-Dimension Photo Form (3DPF) technique looks very similar to the genuine face. Advances in manufacturing 3D masks make the 3D mask attack very popular in practice.

Existing 3D mask PAD methods find the difference between real face skin and mask materials. They work at the digital level (texture-based, shape-based, and deep feature-based methods) and at the sensor level (methods based on reflectance/multispectral properties and methods based on other cues or liveness, see Section 3.2).

Texture-based methods utilize various types of LBPs in conjunction with Linear Discriminant Analysis (LDA) [61], linear SVM [108], a Euclidean distance classifier [109] and other similar classifiers. Agarwal et al. [110] suggested using Haralick texture features and obtained good results on 2D/3D mask spoofing databases, including the 3DMAD database.

Shape-based methods extract discriminative features using image transformation, shape descriptors or reconstructing geometry cues. Kose et al. [111] described 3D face shape with warping parameters and extracted LBP features on 2D images and 2.5D depth maps for mask attack detection. Tang and Chen [112] used principal curvature measures and meshedSIFT-based features [113] to describe the 3D triangular mesh surface of the face. Hamdan and Mokhtar [114] applied the angular radial transformation to extract shape features from the RGB images, which were then passed to a Maximum Likelihood (ML) classifier. The same authors combined the Legendre moment invariant decomposition of the RGB image with the LDA projection, also using the ML classifier [115]. Wang et al. [116] reconstructed depth cues from RGB images using a 3D morphable model and then looked for the geometry differences between real faces and masks based on the extracted normal features.

One of the first works on deep feature-based methods investigated iris, face and fingerprint modalities, trying to optimize a three-layered CNN architecture and its filters [117]. The 3DMAD database was one of nine databases (3 for iris, 2 for face and 4 for fingerprints) used in the experiments. This approach was too broad and inefficient, despite the high reported results. Lucena et al. [79] presented a pre-trained modified VGG-16 model (called FASNet) for recognizing photo, video or mask attacks using only static features. The approach was tested on the REPLAY-ATTACK and 3DMAD public databases. On the REPLAY-ATTACK database, the accuracy reached 99.04% and the HTER 1.20%. For the 3DMAD database, the accuracy was 100.00% and the HTER 0.00%. Manjani et al. [118] collected a Silicone Mask Attack Database (SMAD) (see Section 3.3) and proposed a multilevel deep dictionary for face PAD. The feature learning was independent of knowledge of attack types and did not exploit particular distinguishing attributes. The algorithm encoded the features of the presented samples from low to high level using an SVM as a classifier. Experiments were performed on the SMAD, 3DMAD, CASIA-FASD, UVAD, and REPLAY-ATTACK databases, showing promising results in both intra-database and cross-database protocols.

Liu and Kumar [119] tested 5 CNN architectures and a Siamese CNN on their own dataset, which included a total of 13 different 3D face masked subjects and 9 different real subjects. Visible and Near-InfraRed (NIR) images were collected. Experimental results have shown that NIR images can provide better performance than visible images. At the same time, multispectral imaging provided better performance than using visible and NIR images separately. Shao et al. [120] developed a novel feature learning model to learn discriminative deep dynamic textures for 3D mask face anti-spoofing. The main assumption is that a real face displays different facial motion patterns compared to a 3D mask. These subtle facial motions can be captured by convolutional layers, forming multiple deep dynamic textures. The proposed joint discriminative learning strategy was further incorporated in the learning model to jointly learn the spatial- and channel-discriminability of the deep dynamic textures. Intra-database and cross-database evaluation on the 3DMAD database and their own supplementary dataset indicated the efficiency and robustness of the proposed method.

Chen et al. [121] developed a multi-modal dynamics fusion network for 3D face mask anti-spoofing. Dynamic texture and shape clues were encoded with a two-branch deep CNN model at different rates and intervals for a more comprehensive description. Various poses were also considered. This approach has been evaluated on the 3DMAD, HKBU-MARs V1 and SMAD public benchmarks. Birla and Gupta [122] developed the PATRON system, which utilized the respiration rate extracted from video and face alignment to detect 3D mask attacks. The authors applied multi-kurtosis optimization to extract the respiratory signal from the resultant temporal signals. The experiments with samples from the public database HKBU-MARsV1+ gave an error of 14.7%.

Furthermore, some methods combined deep learning-based features with hand-crafted features and achieved outstanding results in mask spoofing detection [120]. This strategy can adaptively weight the learned features so that the more discriminative deep dynamic textures contribute more to the decision.

3.1.5 Detection of disguised, makeup, plastic surgery attacks, and morphing attacks

Disguise, makeup and plastic surgery are direct attacks that intentionally or unintentionally impersonate or obfuscate an identity. Unintentional disguises include sunglasses, hats, or scarves. Disguised parts of the face can significantly interfere with recognition [123]. Disguise attacks are used in border crossing and airport security applications [124]. Easily available and cheaper makeup is similar to disguise attacks, but is harder to identify as it closely resembles the real face [125].

Facial plastic surgery is divided into two categories: reconstructive and cosmetic. Reconstructive facial plastic surgery corrects anomalies in facial features, while cosmetic facial plastic surgery improves the visual aspect of facial structures and characteristics. After undergoing cosmetic surgery, a person may attempt to fool a facial biometric system, which would be a plastic surgery attack. After plastic surgery, regions of the face, including the nose, eyes, lips, ears, or bone structure, are reshaped to obtain the desired appearance or as a result of surgery for the treatment of some disease [31]. In this case, biometric data must be updated in the database of the recognition system.

Facial automated border control systems at airports and seaports, which are widely deployed in many countries, compare the gate image with the biometric reference stored on the chip of the passenger’s electronic machine readable travel document, computing a similarity score [126]. The fact that the original biometric reference is not stored in a special database gives rise to a new type of presentation attack called a morphing attack, which involves two or more persons. First, an intruder (e.g., a wanted fugitive) and an accomplice (e.g., a citizen) take their own face photos and create a morphed face image from the two captured photos. Second, they improve the visual quality of the morphed face image by optimizing the morphing parameters and using some post-processing operations to reduce the blending artefacts. Third, they evaluate the morphed face image using publicly available face verification engines, and the accomplice applies for a passport using the morphed image and the accomplice’s own personal information. As a result, the intruder may cross an automated border control system using the accomplice’s electronic machine readable travel document. Peng et al. [127] proposed detecting face morphing using a watchlist as reference. This recognition scheme is traditional and is aimed at comparing facial feature vectors from a suspect image and the biometric reference contained in the watchlist. The authors reported that their approach generalizes better to unseen types of face morphing attacks and morphing parameters than existing methods. They investigated the variations in facial expressions, face angles, and image qualities using their own collected database for face morphing attacks.

3.2 Physical PAD methods

The number of physical PAD methods is far smaller than the number of software approaches. They analyze Lambertian reflections, infrared or near infrared images, depth, thermal imaging [128], the facial vein pattern [129], or other biometric cues such as heartbeat, pulse, etc., which require special devices at the hardware level and data fusion at the software level. Special hardware relies on structured-light 3D sensors, Time of Flight (ToF) sensors, NIR sensors, thermal sensors, etc. 3D sensors provide depth maps that help distinguish between a 3D face and 2D planar attacks [130]. NIR sensors easily detect replay attacks because electronic displays appear almost uniformly dark under NIR illumination [83], while thermal sensors detect the characteristic temperature distribution of living faces [131]. Virtually all special sensors have higher performance, but they have not yet been widely adopted by the public due to their high cost.

Ciftci et al. [4] introduced FakeCatcher, a fake portrait video detector based on biological signals. FakeCatcher used the biological signals hidden in portrait videos as a descriptor of authenticity. G channel-based PPG signals from different facial parts (left cheek, right cheek, and mid-region) were applied to generate signal maps and then fed into CNN inputs.
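The raw signal behind such G channel-based PPG descriptors can be illustrated with a short sketch. It is not the FakeCatcher pipeline: it only averages the green channel inside a fixed rectangular region of interest for every frame and removes the mean, and it assumes that frames are given as 2D lists of `(R, G, B)` tuples and that the region coordinates (hypothetical here) come from a face landmark detector.

```python
def green_signal(frames, roi):
    """Return the mean green value inside `roi` for each frame.

    frames: list of frames, each a 2D list of (R, G, B) tuples.
    roi: (top, left, bottom, right) with bottom and right exclusive.
    """
    top, left, bottom, right = roi
    signal = []
    for frame in frames:
        values = [frame[r][c][1]
                  for r in range(top, bottom)
                  for c in range(left, right)]
        signal.append(sum(values) / len(values))
    return signal


def remove_mean(signal):
    """Subtract the temporal mean so that only the variation remains."""
    mean = sum(signal) / len(signal)
    return [s - mean for s in signal]
```

In a fuller pipeline, one such signal per facial region (for example, left cheek, right cheek, and mid-region) would typically be band-pass filtered to the plausible heart-rate range before being arranged into a signal map.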

Most physical PAD methods are used against 3D mask attacks. Detection methods based on reflectance/multispectral properties were the earliest studies in 3D mask spoofing detection. Kim et al. [132] compared the distributions of albedo values under illumination of various wavelengths for different facial skins and mask materials (silicone, latex, and skin jell), classifying 2D feature vectors by Fisher’s linear discriminant. Zhang et al. [133] measured the albedo curves of skin and non-skin materials using two discriminative wavelengths (1450 nm and 850 nm) to detect fake photos, videos or masks. Steiner et al. [134] integrated multispectral Short-Wave InfraRed (SWIR) skin authentication into a face verification system and created a public database called BRSU Skin/Face/Spoof, which includes RGB and SWIR images.

SWIR data has not been widely used in face recognition tasks, but this technique can be very useful for detecting 3D mask attacks. In [135], it was shown that water absorption peaks at around 1430 nm, and this behavior is particularly suitable for the detection of non-skin material. Because of this, the skin and eyes of a person appear very dark in the SWIR spectrum. Heusch et al. [136] demonstrated that a combination of different modalities (visible, NIR, thermal and depth), as well as an SVM-based classifier acting on SWIR image differences, provides superior performance. Two different models (the multi-channel CNN and the multi-channel deep pixel-wise binary supervision network) were used, corresponding to early fusion and late fusion strategies. It is worth noting that these authors have collected the new High-Quality Wide Multi-Channel Attack (HQ-WMCA) database, using five sensors for five modalities (color, NIR, SWIR, thermal, and depth).

Generally speaking, methods based on other cues of real faces for liveness detection use three types of cues: thermal signatures [131], pulse or heartbeat signals [137] and gaze information [28].

4. Generalization of deep learning methods

As mentioned by Abdullakutty et al. [45], generalization of deep learning methods is a recent trend aimed at improving recognition results on samples that the deep learning model did not see during training. Generalization methods follow single-class classification, as opposed to earlier PAD models that followed binary classification, and include five types: transfer learning (Section 4.1), anomaly detection (Section 4.2), few-shot and zero-shot learning (Section 4.3), auxiliary supervision (Section 4.4), and multi-spectral methods (Section 4.5).

4.1 Transfer learning with domain generalization and domain adaptation

Domain generalization methods for PAD extract common differentiation features to improve generalization. However, it is difficult to find a compact and generalized feature space for fake faces due to the wide variety of fake face distributions from different domains. Jia et al. [51] proposed an end-to-end single-side domain generalization framework to improve the generalization ability of PAD by making the feature distribution of the real faces compact, while the feature distributions of the fake faces from different domains became more separated. Spatial and temporal auxiliary supervision was based on face depth as spatial cues and rPPG signals (pulse) as temporal cues. Autoencoders can be used to align the distributions of source domains for generalized features [138]. Shao et al. [48] exploited the shared and discriminative data across multiple PAD domains to automatically search and learn a generalized feature space. Their multi-adversarial deep domain generalization was performed under a dual-force triplet-mining constraint, but not as an end-to-end solution.

Chen et al. [139] introduced a novel camera invariant face anti-spoofing model, which consists of two branches: the feature invariant branch and the feature discrimination augmentation branch. This model was evaluated on four face anti-spoofing databases: CASIA-FASD, REPLAY-ATTACK, OULU-NPU and MSU-MFSD with good intra-database and cross-database results.

Liu et al. [140] proposed a framework for extracting domain-invariant features via adversarial learning and combining low-rank decomposition methods to detect replay attacks. This approach improved the model’s generalization ability to unseen scenarios. The extensive experiments demonstrated that the proposed method achieved SOTA results on four public databases, including CASIA-MFSD, MSU-MFSD, REPLAY-ATTACK, and OULU-NPU.

Domain adaptation methods minimize the distribution discrepancy between source and target domain by using unlabeled target data [141, 142, 143]. These methods explicitly explore the relationship among multiple source domains without gaining access to any target data through unseen attacks. Most of the traditional domain adaptation methods focus on minimizing the distribution discrepancies across multiple source domains to extract domain-invariant features. Thus, Wang et al. [144] proposed an unsupervised domain adaptation with disentangled representation approach to improve the generalization capability of PAD in new scenarios. El-Din et al. [145] proposed a novel end-to-end domain adaptation-based architecture utilizing deep embedding clustering of the target domain to generalize face PAD. Jia et al. [146] proposed a unified unsupervised and semi-supervised domain adaptation network for cross-scenario face anti-spoofing. Two modules were developed: one module to minimize the marginal distribution discrepancy between the source and the target domains to extract domain-invariant features, and another module to make the features of the same class compact.

4.2 Anomaly detection

The PAD problem can be formulated in such a way that genuine face images are considered normal samples, while all possible attacks form the anomalous sample space. The main hypothesis is that the genuine face class has lower variance in the feature distribution and forms a tight cluster. At the same time, attacks have a higher variance and can be considered as anomalies in the feature space. Any samples outside the margin of the genuine cluster are considered anomalies, i.e., attacks. Thus, unseen attacks can be detected more accurately.
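The hypothesis of a tight genuine cluster with attacks lying outside its margin can be illustrated with a deliberately simple one-class detector. This is a minimal sketch, not any of the surveyed methods: it assumes that feature vectors have already been extracted, models the genuine class by its centroid, and sets the margin to the mean distance plus `margin` standard deviations, where the default of 3.0 and the function names are illustrative assumptions.

```python
import math


def fit_genuine_cluster(genuine_features, margin=3.0):
    """Fit a centroid and radius using genuine samples only.

    radius = mean distance to the centroid + margin * standard deviation.
    """
    n = len(genuine_features)
    dim = len(genuine_features[0])
    centroid = [sum(f[i] for f in genuine_features) / n for i in range(dim)]
    dists = [math.dist(f, centroid) for f in genuine_features]
    mean_d = sum(dists) / n
    std_d = math.sqrt(sum((d - mean_d) ** 2 for d in dists) / n)
    return centroid, mean_d + margin * std_d


def is_attack(feature, centroid, radius):
    """A sample outside the margin of the genuine cluster is an anomaly."""
    return math.dist(feature, centroid) > radius
```

No attack samples are needed for fitting, which is why detectors of this family can, in principle, react to attack types never seen during training.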

Even though many PAD methods are based on binary classifiers, they have some disadvantages, listed below, compared with the one-class classifier approach:

• Building a decision boundary between classes is a non-trivial task.
• The generalization of two-class classifiers in the presence of novel or unseen attacks is low.
• Difficulties in collecting training databases for binary classifiers, especially 3D mask databases [54].
• Spoofing attacks are not predictable due to their nature or capture devices.
• Client-specific solutions cannot be applied to methods with binary classifiers [147].

Nikisins et al. [148] proposed a face PAD system interpreting print and replay attacks as anomaly detection. A One-Class Classification (OCC) problem was solved using image quality measures and a Gaussian Mixture Model trained to represent the probability distribution of bona-fide samples. A challenging aggregated database composed of three publicly available databases (REPLAY-ATTACK, REPLAY-Mobile and MSU-MFSD) was introduced.

Fatemifar et al. [149] investigated client-specific models in the context of anomaly detection, as opposed to the first study of using client-specific information for spoofing detection proposed in [150]. They used four one-class anomaly detectors based on support vector data description, a sparse representation-based classifier, the Mahalanobis distance, and a Gaussian mixture model. Experiments involving three spoofing databases (REPLAY-ATTACK, REPLAY-Mobile, and Rose-Youtu) confirmed the merits of the client-specific anomaly detection approach. In a further study [151], a new score normalisation method was proposed to normalise the output of individual outlier detectors before fusion. Li et al. [152] proposed a CNN-based method for face anti-spoofing against unseen types of attacks as a one-class classification. They developed a distance-based loss function (hypersphere loss function) and organized the training process as an end-to-end supervision. The hypersphere loss function identified the attacks directly without using an additional classifier. Feng et al. [153] proposed a framework consisting of a spoof cue generator and an auxiliary classifier. The generator minimized the spoof cues of live samples, while the auxiliary classifier served as a spoof cue amplifier to make the spoof cues more discriminative.

Anomaly-based detectors based on genuine-access data only have generalization performance advantages over binary spoofing attack detection methods. Fatemifar et al. [154, 155] leveraged the fusion of multiple anomaly classifiers using weighted averaging. Their novel three-stage optimisation method combined a hybrid optimisation method using genetic algorithm and pattern search to explore the weight space, a two-sided score normalisation method to improve the anomaly detection performance, and an ensemble pruning method to improve the generalisation performance. In addition, client-specific information was incorporated to train the proposed model. Experiments with REPLAY-ATTACK, REPLAY-Mobile and Rose-Youtu databases have demonstrated the effectiveness of SOTA anomaly-based and multiclass approaches.

Pérez-Cabo et al. [156] introduced an anomaly detection strategy based on deep metric learning using just still images. For regularization, they used a triplet focal loss in the form of a “metric softmax” loss and obtained promising experimental results. Baweja et al. [157] presented another deep-learning solution for anomaly detection, in which both the classifier and the feature representations are learned together end-to-end. For this purpose, they introduced a pseudo-negative class, which was modeled using a Gaussian distribution, followed by the application of a pair-wise confusion loss to further regularize the training process. This approach was tested on four publicly available databases: REPLAY-ATTACK, Rose-Youtu, OULU-NPU, and SiW. Favorskaya and Pakhirka [158] built visual maps aimed at finding specific visual anomalies caused by print and replay attacks, using a hybrid approach in which the visual maps were created by traditional methods and deep learning was applied for better generalization of the final result.
4.3 Few-shot and zero-shot learning

Few-shot learning is the process of learning from a few labeled target samples. It is especially useful when conventional training would require large-scale databases. When the number of labeled samples for the target class is zero, few-shot learning becomes zero-shot learning. Therefore, zero-shot learning is well suited for detecting unseen or novel attacks. Few-shot learning aims at learning from very few instances of unseen categories, while zero-shot learning requires recognizing unseen categories by learning only from the description or semantic information of these categories.

However, few-shot and zero-shot learning for the PAD methods are currently underdeveloped. It should be noted that existing few-shot learning methods often use prior knowledge from a single modality, while zero-shot learning methods frequently explore multi-modality data [159].

Liu et al. [55] investigated the problem of zero-shot face anti-spoofing for 13 types of spoof attacks, including print, replay, 3D mask, makeup, and so on. The proposed deep tree network divided the spoof samples into the most similar spoof clusters (semantic sub-groups) without supervision and also studied the features in a hierarchical way. Each tree node consists of a convolutional residual unit, which is a block with convolutional layers and the short-cut connection. Tree routing unit defines a node routing function to route a data sample to one of the child nodes according to the largest data variation. The leaf node consists of a supervised feature learning module, which concatenates classification supervision and pixel-wise supervision to learn spoofing features.

Qin et al. [160] proposed a novel adaptive inner-update meta face anti-spoofing method to tackle the zero- and few-shot learning problem through meta-learning inspired by model-agnostic meta-learning [161]. The meta-learner focuses on unseen attack detection by training on predefined genuine and fake facial images, which provides better discrimination. First, the meta-learner was trained on zero- and few-shot PAD training tasks simultaneously. Second, an adaptive inner-update strategy improved the performance. The authors also proposed three zero- and few-shot PAD benchmarks and conducted experiments on both the proposed benchmarks and existing zero-shot protocols.

4.4 Auxiliary methods

The auxiliary estimation models are trained with some reliable auxiliary information, which is no longer a simple binary label. Auxiliary information has the advantage that it can be better generalized to the PAD task, and it does help to distinguish fake images from bona fide images. Depth, rPPG, reflection, noise, and disparity are the main auxiliary features.

Atoum et al. [13] were the first to consider the estimated face depth as a training signal. They suggested a two-stream CNN-based face anti-spoofing method for detecting print and replay attacks. A deep end-to-end network was used as the patch-based CNN stream, producing scores for randomly extracted facial patches. A fully convolutional network estimated the depth of a face image, assuming that a print or replay presentation attack has a flat depth map, while live faces contain a normal face depth. The outputs of both streams were fused for the final decision. The proposed method was evaluated using three benchmark databases: CASIA-MFSD, MSU-USSA, and REPLAY-ATTACK, achieving HTER values of 2.27%, 0.21%, and 0.72%, respectively. Later these authors proposed a method for estimating the dense depth map for a live or spoof face image for mobile PAD scenarios [162]. Rehman et al. [163] studied the dynamic disparity features for depth estimation by introducing a disparity layer within a CNN. This approach was tested on a collected stereo camera-based face anti-spoofing database.

Liu et al. [14] focused on the known spoof patterns across spatial and temporal domains rather than on cue extraction to detect print and replay attacks. Their network combined the CNN and RNN architectures in a coherent way. The CNN part was trained with depth map supervision and produced depth maps and feature maps in parallel. Then the depth map and the feature map were aligned with the proposed non-rigid registration layer. The RNN part was trained with the aligned maps and the rPPG supervision. The Average Classification Error Rate (ACER) in cross testing reached 10–14%.

Niu et al. [164] introduced the real-time rPPG method for continuous heart rate measurement from facial videos. The proposed multi-patch ROI strategy removed outlier signals followed by spatio-temporal analysis, replacing binary labels to supervise the CNN and RNN, respectively. Such auxiliary information is often used for reliable classification.

Lin et al. [84] proposed a generalized method exploiting both rPPG and texture features in terms of 2D and 3D mask attacks. The rPPG information in the form of multi-scale long-term statistical spectral features with variant granularities was incorporated with contextual patch-based CNN. This method has been tested to detect all three attack types using 3DMAD, HKBU-MARs V1, MSU-MFSD, CASIA-FASD, and OULU-NPU databases.

Jourabloo et al. [165] simulated noise patterns as auxiliary information by decomposing a spoof face into a spoof noise and a live face, and then using the spoof noise for classification. Such decomposition is based on the hypothesis that a spoof image can be generated by a bona fide image with multiplicative noise and additive noise. However, noise modeling is inefficient due to the lack of ground truth information. Xu et al. [166] proposed using the bona fide image of the corresponding subject as a type of ground truth in the training set. For this purpose, a metric-learned end-to-end network was developed to simulate face spoofing noise. Yang et al. [16] developed a spatio-temporal anti-spoofing network that extracted both global temporal and local spatial cues to distinguish live faces versus spoof faces.

Wang et al. [167] developed a new method to estimate depth information from multiple RGB frames, efficiently encoding spatiotemporal information for replay attack detection. Optical flow guided feature blocks take single-frame features from two consecutive frames to estimate short-term motion. The extracted features are then fed into the convolution gated recurrent units to obtain long-term motion information and output the residual of single-frame facial depth. The combined estimated multi-frame depth maps were supervised by the depth loss and binary loss. Four databases OULU-NPU, SiW, CASIA-MFSD, and REPLAY-ATTACK were used in the experiments. Intra-testing of the OULU-NPU database and the SiW database reached 9.2% and 3.1% ACER, respectively, while cross-testing between the CASIA-MFSD database and the REPLAY-ATTACK database yielded 17.5%, and 24.0% HTER, respectively.

George and Marcel [168] used pixel-wise binary supervision instead of using synthesized depth values as auxiliary features. Furthermore, both binary and pixel-wise binary supervision were used by adding a fully connected layer on top of the pixel-wise map to protect against replay attacks. Kim et al. [169] proposed to apply a depth map and a reflection map as bipartite auxiliary supervision for the PAD task. The developed bipartite auxiliary supervision network extracted and fused these auxiliary features to detect presentation attacks, in particular print and replay attacks.

Yu et al. [17] introduced a Central Difference Convolutional Network (CDCN), in which central difference convolution was able to extract the invariant detailed spoofing features, such as lattice artefacts. They proposed an extended version of CDCN consisting of the searched backbone network and multi-scale attention fusion module for aggregating the multilevel central difference convolutional features. These authors also presented a multi-modal variant using two fusion strategies for three modalities (RGB, depth and infrared) [20]. Yu et al. [19] proposed a bilateral convolutional network capturing intrinsic material-based patterns via aggregating multi-level bilateral macro- and micro-information. Thus, traditional bilateral filtering was integrated with deep network. Multilevel feature refinement module in the form of a three-head supervision combined depth (shape), reflection and patch (texture) information.
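The central difference convolution mentioned above can be sketched in a few lines. The formulation used here is an assumption based on how the operator is commonly written, not a detail given in this survey: a vanilla convolution is blended with a term that aggregates differences between each neighbor and the central pixel, $y(p_0)=\sum_n w(p_n)\,x(p_0+p_n)-\theta\,x(p_0)\sum_n w(p_n)$. The function name and the default value of `theta` are illustrative.

```python
def central_difference_conv2d(image, kernel, theta=0.7):
    """Single-channel central difference convolution, stride 1, no padding.

    theta = 0 gives a vanilla convolution; theta = 1 uses only the
    differences between each neighbor and the central pixel.
    """
    kh, kw = len(kernel), len(kernel[0])
    ch, cw = kh // 2, kw // 2                      # position of the central tap
    kernel_sum = sum(sum(row) for row in kernel)
    output = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            vanilla = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(kh)
                for b in range(kw)
            )
            centre = image[i + ch][j + cw]
            row.append(vanilla - theta * centre * kernel_sum)
        output.append(row)
    return output


flat = [[5.0] * 4 for _ in range(4)]               # constant patch, no texture
ones = [[1.0] * 3 for _ in range(3)]
print(central_difference_conv2d(flat, ones, theta=1.0))  # differences vanish
print(central_difference_conv2d(flat, ones, theta=0.0))  # plain local sums
```

With `theta=1.0`, a constant patch produces a zero response. The operator therefore reacts to local intensity changes such as lattice artefacts rather than to absolute brightness.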

4.5 Multi-spectral methods

Most direct attacks are carried out using visible range sensors as the easiest way to fool any facial recognition system. This means that a sensor operating in a different range can provide more cues for the PAD methods. For example, the SWIR camera can be successfully used to detect 3D mask attacks [134]. As mentioned in [37], four different modalities are often used for this purpose, i.e. the visible range, SWIR, Mid-Wave InfraRed (MWIR), and Long-Wave InfraRed (LWIR). As is well known, the infrared spectrum consists of active (NIR and SWIR) and passive, also called thermal, (MWIR and LWIR) IR ranges. The NIR face images detect illumination direction changes and help in low-light conditions, while the SWIR range is suited for surveillance. The passive or thermal IR range is emission dominant, and thermal images capture thermal signatures of the skin tissue.

Nikisins et al. [170] proposed a multi-channel face PAD approach based on three types of devices. They created a stack of gray-scale, NIR, and depth facial images or a stack of images of facial features and then applied a set of pre-trained multi-channel encoders and a multi-layer perceptron for classification. For training multi-channel CNN with three modalities, they used the domain adaptation technique, transferring the knowledge of facial appearance from RGB to multi-channel domain.

Jiang et al. [171] presented a dataset of paired visible and NIR images with distance, pose, expression and session variations for print and 3D mask attacks. They also developed a multilevel CNN processing paired visible and NIR images. Multispectral data (color imagery, NIR imagery and thermal imagery) were used against custom silicone mask attacks in [172]. Wang et al. [173] demonstrated that their end-to-end CNN with four branches (RGB, depth, IR, and fusion modal inputs), exploiting a multi-modal fusion approach via spatial and channel attention, outperformed other CNNs in detecting print attacks. Jiang et al. [174] proposed a novel multiple categories image translation generative adversarial network that generated corresponding NIR images for visible live and spoof face images instead of using near-infrared equipment. Liveness detection using NIR and RGB images is also a popular approach to detect replay attacks [175].

George et al. [176] developed a multi-channel CNN (color, infrared, depth, and thermal channels) using transfer learning from a pre-trained face recognition network LightCNN. They also created a Wide Multi-Channel presentation Attack (WMCA) database that was captured by multiple devices and channels and contains a wide variety of 2D and 3D presentation attacks. A two-stream CNN that worked in RGB space and in multi-scale Retinex space, as an illumination-invariant space, has been proposed in [177] to detect print and replay attacks over both intra-database and inter-database protocols.

Castelblanco et al. [178] investigated two dynamic face-authentication challenges (the camera close-up and head-rotation) to solve the task of liveness detection and face verification. A set of neural-network models based on the CNN and Siamese neural network architectures has been proposed. Their own dataset of 177 live videos recorded by 41 different subjects included multiple types of media-based attacks (printed attacks, screen attacks, 3D masks, videos acquired from public social media, deep fakes), for a total of 243 attacks in uncontrolled scenarios.

5. Databases

Databases have been collected since 2010, and are generally classified as 2D and 3D databases. Only a few databases contain data for one type of presentation attack (print or make-up). Thus, Tables 2 and 3 provide a summary of the existing and best-known 2D face spoofing databases (including print and replay attacks) and 3D face spoofing databases (including mask, make-up, and disguise attacks), respectively.

6. Performance evaluation metrics and testing protocols

A brief discussion of evaluation metrics and testing protocols appropriate for the PAD task is presented in Section 6.1 and Section 6.2, respectively.

6.1 Evaluation metrics

Anjos et al. [207] proposed to use Half Total Error Rate (HTER) as an evaluation metric for various PAD methods in 2011. Like any recognition system, the PAD system has two types of errors: false rejection, when the real accesses are rejected, and false acceptance, when the attacks are accepted. The HTER metric combines the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) as follows:

$\displaystyle\textit{HTER}=\frac{\textit{FRR}+\textit{FAR}}{2},$ (5)

where FAR and FRR are defined as

$\displaystyle\textit{FAR}=\frac{\textit{FP}}{\textit{FP}+\textit{TN}},\quad\textit{FRR}=\frac{\textit{FN}}{\textit{FN}+\textit{TP}}.$ (6)

Table 2

A summary of the existing 2D face spoofing databases (print, replay, mask as additional)

Database	Year	#Subject	#Sample real/spoof	Sensor	Attack	Description
NUAA [61]	2010	15	5105/7509 images	VIS	Print (flat, wrapped)	Data: 2D color images Characteristics: controlled environment
CASIA-MFSD [179]	2012	50	150/450 videos	VIS	Print (flat, wrapped, cut), replay (tablet)	Data: videos Characteristics: controlled environment, 7 scenarios and 3 image quality
REPLAY-ATTACK [180]	2012	50	200/1000 videos	VIS	Print (flat), replay (tablet, phone)	Data: videos Characteristics: controlled environment
GUC-LiFFAD [181]	2015	80	1798/3028 videos	Light field	Print (inkjet paper, laser jet paper), replay (tablet)	Data: videos Characteristics: controlled environment
MSU-MFSD [182]	2015	35	70/210 videos	VIS	Print (flat), replay (tablet, phone)	Data: videos Characteristics: controlled environment, indoor scenario, 2 types of cameras
UVAD [183]	2015	404	808/16268 videos	VIS	Replay (monitor)	Data: videos Characteristics: different lighting, background and places in two sections
MSSPOOF [184]	2016	21	1470/3024 images	VIS, NIR	Black & white print (flat)	Data: grayscale images Characteristics: controlled environment, 7 environment conditions
REPLAY-mobile [185]	2016	40	390/640 videos	VIS	Print (flat), Replay (monitor)	Data: videos Characteristics: controlled environment, 5 lighting conditions
EMSPAD [186]	2017	50	3500 multispectral, 500 VIS/1000 images	Multispectral, VIS	Inkjet print, laser print	Data: 2D images, multispectral images Characteristics: seven different spectral bands, indoor environment
OULU-NPU [187]	2017	55	720/2880 videos	VIS	Print (flat), replay (phone)	Data: videos Characteristics: lighting & background in 3 sections
Rose-Youtu [141]	2018	20	500/2850 videos	VIS	Print (flat), replay (monitor, laptop), mask (paper, crop-paper)	Data: videos Characteristics: 5 front-facing phone camera, 5 different illumination conditions
SiW [14]	2018	165	1320/3300 videos	VIS	Print (flat, wrapped), replay (phone, tablet, monitor)	Data: videos Characteristics: 4 sessions with variations of distance, pose, illumination and expression
WMCA [176]	2019	72	347/1332 videos	VIS, Depth, NIR, Thermal	Print (flat), Replay (tablet), Partial (glasses), Mask (plastic, silicone, and paper, mannequin)	Data: videos Characteristics: 6 sessions with different backgrounds and illumination, pulse data for bonafide recordings

CIGIT-PPM [171]	2019	72	47,779/32,494 print/13,085 3D mask images	VIS, NIR	Print, 3D mask	Data: 2D color images Characteristics: session, spoofing medium, pose, expression, distance, glasses/no glasses, resolution, indoor environment
CASIA-SURF [21]	2019	1000	3000/18000 videos	VIS, Depth, NIR	Print (flat, wrapped, cut)	Data: videos $+$ 2.5D depth maps Characteristics: controlled environment, background removed, randomly cut eyes, nose or mouth areas
CASIA-SURF CeFA [188]	2019	1607	6300/27900 videos	RGB, Depth, NIR	Print, replay, 3D print mask, 3D silica gel mask	Data: videos Materials: 3D print, silica gel Characteristics: uncontrolled (indoor and outdoor) environment, 3 ethnicities, 3 modalities, 2D plus 3D attack types, 5 protocols
LCC FASD [189]	2019	243	1942/16885 images	VIS	Print	Data: 2D color images Characteristics: uncontrolled environment, 83 devices
PR-FSAD [190]	2019	30	42/480 images 84/960 videos	VIS	Print, replay	Data: 2D images Characteristics: uncontrolled environment, variations in both the distance and angle
SiW-M [55]	2019	493	1,630 videos in total	VIS	Print, replay, 3D mask, makeup	Characteristics: uncontrolled environment, zero-shot face anti-spoofing, 5 types of 3D mask attacks, 3 types of makeup attacks, and 3 partial attacks
CelebA-Spoof [191]	2020	10,177	625,537 images	VIS	Print, replay, 3D, 3 paper cut	Data: images Characteristics: uncontrolled environment, 8 scenes (2 environments * 4 illumination conditions) with more than 10 sensors
HQ-WMCA [136]	2020	51	555/2349 videos	VIS, Depth, NIR, SWIR, Thermal	Print (flat), replay (tablet, phone), mask, makeup, partial	Data: videos Materials: Laser or inkjet print, plastic, silicon, paper, mannequin, glasses, wigs, tattoo Characteristics: indoor environment, 14 modalities

Table 3

A summary of the existing 3D face spoofing databases (mask, makeup, disguise)

Database	Year	#Subject	#Sample real/spoof	Sensor	Attack	Description
YMU [192]	2012	99	99/297 images	VIS	Makeup	Data: 2D color images Characteristics: controlled environment
Morpho [61]	2013	20	207/199 images	VIS, 3D scanner	Mask	Data: 2D grayscale images $+$ 3D scans Material: hard resin Characteristics: controlled environment Type: non-public
MIW [193]	2013	125	77/77 images	VIS	Makeup	Data: 2D color images Material: light makeup, heavy makeup Characteristics: controlled environment
EURECOM MASK-ATTACK DB [111]	2013	20	200/199 images	VIS, 3D scanner	Mask	Data: 2D grayscale images $+$ 3D scans Material: hard resin Characteristics: controlled environment
3DMAD [194]	2013	17	170/85 videos	VIS, Depth RGB-D Kinect	Mask	Data: 2D color images $+$ 2.5D depth maps Material: paper, hard resin Characteristics: controlled environment
I$^{2}$BVSD [123]	2013	75	150/1212 images	VIS, Thermal	Disguise	Data: 2D color images Characteristics: controlled environment, variations in hair styles, beard and mustache, glasses, cap and hat, mask
3DFS-DB [195]	2016	26	260/260 videos	VIS, 3D scanner	Mask	Data: 2D, 2.5D images $+$ 3D information Material: plastic Characteristics: indoor, varying background Type: indirect access
BRSU [134]	2016	137	404 images in total	VIS, SWIR	Mask	Data: multispectral SWIR, color images Material: silicone, plastic, hard resin, latex Characteristics:
HKBU-MARs [196]	2016	12	504/504 videos	VIS	Mask	Data: color images Material: hard resin, silicone Characteristics: varying lighting, real world scenario
SMAD [118]	2017	65	65/65 videos	VIS	Mask	Data: color images Material: silicone Characteristics: varying lighting and background Type: from online resources
MLFP [106]	2017	10	150/1200 videos	VIS, NIR, Thermal	Mask	Data: VIS, NIR, thermal images Material: latex, paper Characteristics: indoor and outdoor
ERPA [197]	2017	5	86 videos in total	VIS, NIR, Thermal, 3D scanner	Mask	Data: VIS, thermal, NIR images $+$ depth Material: resin-coated, silicone Characteristics: variety of sensors
MIFS [198]	2017	107	214/214 images	VIS	Makeup	Data: 2D color images Characteristics: controlled environment

CS-MAD [199]	2018	14	88/160 videos 60 images	VIS, NIR, Depth, LWIR	Mask	Data: videos, high quality still color images Material: 3D silicon mask
DFW [200]	2019	1000	11,155 images in total	VIS	Disguise	Data: collected from Internet, normal, validation, disguised, and impersonator images for a given subject in the range [5, 26] Characteristics: uncontrolled environment
3DMA [201]	2019	67	920 videos in total (67 genuine subjects and 48 masks)	VIS, NIR	Mask	Data: videos Characteristics: uncontrolled environment
WFFD [202]	2019	745	2300/2300 images	VIS	Mask	Data: images Materials: wax figures Characteristics: uncontrolled environment
AIM [125]	2020	72	200+/200+ videos	VIS	Makeup	Data: videos Characteristics: uncontrolled environment
CASIA-SURF 3DMask [203]	2020	48	288/864 videos	VIS	Mask	Data: videos Characteristics: mannequin with 3D print, high-quality identity-preserved, 3 decorations and 6 environments
HiFiMask [204]	2021	75	13,650/40,950 videos	VIS	Mask	Data: videos Materials: transparent, plaster, resin Characteristics: 3 mask decorations, 7 recording devices, 6 lighting conditions (periodic/random), 6 scenes
Sejong face database [205]	2021	100	100/24500 images	VIS, VIS $+$ IR, IR, Thermal	Disguise	Data: images Characteristics: 4 modalities, 12–13 additional things (cap, cap – scarf, medical mask and so on), 5 ethnicities, different directions of gaze, publicly available
CRMA [206]	2022	47	423/12,690 videos	VIS	Medical mask	Data: high-resolution videos Material: medical mask against COVID-19 Characteristics: a large diversity in capture sensors, displays, and capture scales, three experimental protocols

Here and below, TP is the number of true positives (the accepted real accesses), TN is the number of true negatives (the rejected attacks), FP is the number of false positives (the accepted attacks), and FN is the number of false negatives (the rejected real accesses).

The TP, TN, FP and FN values are calculated at a decision threshold selected on the validation set. This threshold corresponds to the Equal Error Rate (EER), i.e., the operating point at which FRR $=$ FAR. The EER value itself is also often used to evaluate the performance of a model on the validation and training subsets.
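The procedure behind Eqs. (5) and (6) can be sketched as follows. The sketch assumes that a higher score means “more likely a real access”, that label 1 denotes a real access and label 0 an attack, and that the EER threshold is approximated by the observed score at which the gap between FAR and FRR is smallest. The scores are synthetic and the function names are illustrative.

```python
def confusion_counts(scores, labels, threshold):
    """Count TP, TN, FP, FN; a sample is accepted when score >= threshold."""
    tp = tn = fp = fn = 0
    for score, label in zip(scores, labels):
        accepted = score >= threshold
        if label == 1:        # real access
            if accepted:
                tp += 1
            else:
                fn += 1
        else:                 # attack
            if accepted:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn


def far_frr(scores, labels, threshold):
    """FAR and FRR as in Eq. (6); both classes must be present."""
    tp, tn, fp, fn = confusion_counts(scores, labels, threshold)
    return fp / (fp + tn), fn / (fn + tp)


def eer_threshold(scores, labels):
    """Observed score at which |FAR - FRR| is smallest (approximate EER point)."""
    def gap(threshold):
        far, frr = far_frr(scores, labels, threshold)
        return abs(far - frr)
    return min(sorted(set(scores)), key=gap)


def hter(scores, labels, threshold):
    """Half Total Error Rate as in Eq. (5)."""
    far, frr = far_frr(scores, labels, threshold)
    return (far + frr) / 2


# Synthetic validation and test scores (first four real, last four attacks).
val_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
val_labels = [1, 1, 1, 1, 0, 0, 0, 0]
test_scores = [0.95, 0.65, 0.5, 0.55, 0.7, 0.3, 0.2, 0.1]
test_labels = [1, 1, 1, 1, 0, 0, 0, 0]

threshold = eer_threshold(val_scores, val_labels)
print("threshold:", threshold)
print("test HTER:", hter(test_scores, test_labels, threshold))
```

The threshold is fixed on the validation scores and only then applied to the test scores, so the reported HTER does not depend on test data.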

However, the performance is now most often reported using the metrics defined in the standard ISO/IEC JTC1 SC37 “Biometrics. Information technology – biometric presentation attack detection – Part 3: testing and reporting” [208]. This standard introduced three levels of the PAD evaluation:

•

PAD subsystem evaluation.

•

Data capture subsystem evaluation.

•

Full system evaluation.

For the PAD subsystem evaluation, two different metrics are used [1]: Attack Presentation Classification Error Rate (APCER) and Bona fide Presentation Classification Error Rate (BPCER).

The APCER for a given Presentation Attack Instrument Species (PAIS) is defined as the proportion of attack presentations using the same PAIS incorrectly classified as bona fide presentations at the PAD subsystem in a specific scenario and calculated as follows:

$\displaystyle\textit{APCER}_{\textit{PAIS}}=1-\frac{1}{N_{\textit{PAIS}}}\sum\limits_{i=1}^{N_{\textit{PAIS}}}{\textit{RES}_{i}},$ (7)

where $N_{\textit{PAIS}}$ is the number of attack presentations for the given PAIS, and $\textit{RES}_{i}$ takes a value of 1 if the $i$th presentation is classified as an attack presentation and a value of 0 if it is classified as a bona fide presentation.

The BPCER is defined as the proportion of bona fide presentations incorrectly classified as presentation attacks at the PAD subsystem in a specific scenario and calculated as follows:

$\displaystyle\textit{BPCER}=\frac{1}{N_{\textit{BF}}}\sum\limits_{i=1}^{N_{\textit{BF}}}{\textit{RES}_{i}},$ (8)

where $N_{\textit{BF}}$ is the number of bona fide presentations.

The APCER and BPCER correspond to FAR and FRR, respectively. Similar to HTER, the Average Classification Error Rate (ACER) is defined as the mean of APCER and BPCER:

$\displaystyle\textit{ACER}=\frac{\textit{APCER}+\textit{BPCER}}{2}.$ (9)
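Eqs. (7)–(9) can be evaluated with a short sketch. It assumes synthetic decisions $\textit{RES}_{i}$ (1 for “classified as attack”, 0 for “classified as bona fide”) and hypothetical PAIS names. When several PAIS are present, the sketch takes the worst-case (maximum) APCER as the APCER term of Eq. (9). This is a common convention, but it is an assumption here, since the text does not fix this choice.

```python
from collections import defaultdict


def apcer_per_pais(attack_res, pais_labels):
    """APCER for each PAIS as in Eq. (7): share of attacks classified as bona fide."""
    totals = defaultdict(int)
    detected = defaultdict(int)
    for res, pais in zip(attack_res, pais_labels):
        totals[pais] += 1
        detected[pais] += res
    return {pais: 1 - detected[pais] / totals[pais] for pais in totals}


def bpcer(bona_fide_res):
    """BPCER as in Eq. (8): share of bona fide presentations classified as attacks."""
    return sum(bona_fide_res) / len(bona_fide_res)


def acer(attack_res, pais_labels, bona_fide_res):
    """ACER as in Eq. (9), using the worst-case APCER over PAIS (an assumption)."""
    worst_apcer = max(apcer_per_pais(attack_res, pais_labels).values())
    return (worst_apcer + bpcer(bona_fide_res)) / 2


# Synthetic PAD decisions: 4 print attacks, 4 replay attacks, 10 bona fide.
attack_res = [1, 1, 1, 0, 1, 0, 0, 1]
pais_labels = ["print"] * 4 + ["replay"] * 4
bona_fide_res = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(apcer_per_pais(attack_res, pais_labels))
print(bpcer(bona_fide_res))
print(acer(attack_res, pais_labels, bona_fide_res))
```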

In addition to the scalar HTER and ACER values, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are also commonly used to evaluate the performance of the PAD methods. The ROC and AUC characteristics have the advantage that they provide a global estimate of the performance of the model over the whole range of decision thresholds rather than at a single operating point.
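A minimal sketch of how the ROC curve and the AUC can be obtained from raw scores is given below. It assumes that higher scores indicate real accesses, sweeps the threshold over the observed scores, and integrates with the trapezoidal rule. The data are synthetic and the function names are illustrative.

```python
def roc_points(scores, labels):
    """Return (FAR, 1 - FRR) points for thresholds swept from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]                     # threshold above every score
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Synthetic scores: four real accesses (label 1) and four attacks (label 0).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print("AUC:", auc(curve))
```

In this toy example one attack (score 0.6) is ranked above one real access (score 0.4), so 15 of the 16 real/attack pairs are ordered correctly and the AUC equals 15/16.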

Data capture subsystem evaluations are based on biometric sensors that may or may not include a PAD subsystem. These performance metrics include:

•

Data Capture attack presentation classification error rate (Data Capture-APCER) as the proportion of attack presentations using the same PAIS that are incorrectly classified as bona fide presentations at the data capture subsystem in a specific scenario.

•

Data Capture bona fide presentation classification error rate (Data Capture-BPCER) as the proportion of bona fide presentations that are incorrectly classified as presentation attacks at the data capture subsystem in a specific scenario.

Full system evaluations include comparison subsystem results in addition to the two previous evaluations, which can be interpreted in both verification and identification scenarios. These evaluations are as follows:

•

Verification scenario. In the case of attack samples, the performance is measured using the Impostor Attack Presentation Match Rate (IAPMR), which is defined for a full system evaluation of a verification system as the proportion of impostor attack presentations using the same PAIS in which the target reference is matched.

•

Identification scenario. In the case of attack samples, the performance is measured using Impostor Attack Presentation Identification Rate (IAPIR). The IAPIR is defined as the proportion of impostor attack presentations using the same PAIS in which the targeted reference identifier is among the identifiers returned or at least one identifier is returned by the system.

6.2 Testing protocols

PAD based on traditional methods failed to perform consistently across databases, and even on the same database tested under different conditions. Also, the performance of the chosen method is highly dependent on the attacks and the case under consideration. A combination of several complementary algorithms helps to achieve the best results. PAD based on deep learning methods has the same problems, but at a different level, which has led to the introduction of testing “protocols” aimed at fundamentally improving generalizability.

The existing evaluation protocols in the literature [182] include two main protocols: intra-database and inter-database (cross-database) testing protocols. However, sometimes methods are assessed over four practical protocols (i.e., intra-database – intra-type, inter-database – intra-type, intra-database – inter-type, and inter-database – inter-type). The intra-database testing protocol splits all the available data of one database into training and test data, while the inter-database testing protocol uses the data of one database for the system design and other databases for testing. Thus, the inter-database protocol gives a more realistic estimate of generalization. These evaluation protocols are suitable for the binary classification task, but fail to address the one-class classification task. To assess generalization capacity, protocols for the one-class classification task can be modified in different ways. For example, intra-database performance evaluation involved training on the data associated with two spoofing attacks plus normal data, and then testing against the third type of attack [54]. The inter-database testing scenario can also train on data from two databases and test on a third database. Variations of such inter-database testing protocols are presented in [50, 57].
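The inter-database scenario in which a model is trained on two databases and tested on a third can be expressed as a leave-one-database-out split. The sketch below only generates the splits. The sample identifiers are hypothetical and the function name is illustrative.

```python
from collections import defaultdict


def leave_one_database_out(samples):
    """Yield (held_out, train_ids, test_ids) for every database in turn.

    `samples` is a list of (sample_id, database_name) pairs.
    """
    by_database = defaultdict(list)
    for sample_id, database in samples:
        by_database[database].append(sample_id)
    for held_out in sorted(by_database):
        train_ids = [
            sample_id
            for database in sorted(by_database)
            if database != held_out
            for sample_id in by_database[database]
        ]
        yield held_out, train_ids, list(by_database[held_out])


# Hypothetical sample identifiers grouped by source database.
samples = [
    ("ra_001", "REPLAY-ATTACK"),
    ("ra_002", "REPLAY-ATTACK"),
    ("rm_001", "REPLAY-Mobile"),
    ("ms_001", "MSU-MFSD"),
    ("ms_002", "MSU-MFSD"),
]

splits = list(leave_one_database_out(samples))
for held_out, train_ids, test_ids in splits:
    print(f"test on {held_out}: train={train_ids} test={test_ids}")
```

Grouping by attack type instead of by database would give the analogous unseen-attack split, with bona fide samples handled separately.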

A Generalization Representation over Aggregated Datasets for Generalized Presentation Attack Detection (GRAD-GPAD) framework has been proposed as a novel paradigm of data collection in [209]. These authors introduced several types of protocols, including Grandtest, Cross-Dataset, One-PAI, Unseen Attacks (Cross-PAI), and Unseen Capture Devices, with two additional protocols: Cross-Face Resolution and Cross-Conditions. The leave-one-out testing protocols were formulated in [55] and applied to data from the SiW-M database.

7. Trends and perspectives

Let us summarize the SOTA of the existing PAD methods and formulate some trends and perspectives.

1. Face recognition has become an integral part of biometrics, following iris and fingerprint recognition. Generally speaking, face recognition systems do not require special devices and official permission to take pictures or videos, which has led to the widespread use of these systems in practice. This, in turn, prompted attempts to deceive automatic face recognition systems. As a result, PAD methods have been developed since the 1990s. Advances in hardware and software are making face recognition systems, 2D/3D attacks, and PAD methods more and more sophisticated. At the same time, face recognition in the wild remains a fundamental problem given the challenges of illumination, personality and existing sensors.

The PAD methods have evolved as traditional, deep learning based and hybrid, just like other branches of computer vision. The PAD methods are categorized into two unbalanced groups: digital (feature-based) and physical (sensor-based) PAD methods. Many current PAD methods aim to prevent attacks of the same type, but using multiple cues of liveness. However, an intruder can use several types of attacks, and a face recognition system must be protected from any type of attack. This means that the PAD methods should be generalized in the future.

At present, attacks are divided into 2D and 3D attacks, which has led to the development of 2D/3D PAD methods. Experimental results show that sometimes 2D methods can be effectively used to prevent 3D attacks (mask attacks). Thus, the entire family of 2D, 2.5D (2D $+$ depth) and 3D PAD methods needs to be developed with new feature descriptors, new deep learning models or the fusion of multimodal biometrics. Most methods for detecting 3D mask spoofing focus on the PAD performance at the sensor level, whereas most methods for detecting 2D spoofing focus on the feature level. In any case, the possibility of using multiple sensors improves classification results.

The problem of the generalization abilities of data-driven models, which is common in the field of computer vision, remains relevant for facial PAD.

2. Databases play a significant role in training and testing deep learning models. Creating a face PAD database with sufficient samples and variability is still very time-consuming and costly, because collecting presentation attacks and PAIs in the wild is nearly impossible compared to collecting a face database. Databases have improved in both quality and size to such an extent that some early PAD methods cannot provide good accuracy when trained on modern databases. Also, some deep learning based methods achieve outstanding accuracy on one database, reaching 0% HTER values, but fail on another database or under different protocols. Typically, a database includes only several dozen subjects and 10–100 times as many samples captured in different sessions. Databases collected in 2016–2019 contain 3–5 thousand samples (images or short videos), which is not much for deep learning training. In this situation, two approaches are recommended: the first is to use augmentation techniques, which have become more intelligent in recent years, and the second is to collect very large databases with different types of attacks and real-world variations, as seen in 2020–2022. Due to limited training data and a large gap between different types of attacks, it is still necessary to study the generalization of PAD methods in cross-database protocols.
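The gap between intra-database and cross-database results can be illustrated with a minimal sketch. It assumes the usual definition of HTER as the mean of the false acceptance rate (FAR) and the false rejection rate (FRR), with the decision threshold fixed on a development set from one database and reused unchanged on another. All scores, variable names and helper functions below are hypothetical and serve only as an illustration; they are not taken from any surveyed method or database.

```python
from typing import Sequence, Tuple


def error_rates(genuine: Sequence[float], attacks: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR); a score >= threshold is accepted as genuine."""
    far = sum(s >= threshold for s in attacks) / len(attacks)  # attacks accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)   # genuine rejected
    return far, frr


def hter(genuine: Sequence[float], attacks: Sequence[float],
         threshold: float) -> float:
    """Half Total Error Rate: the mean of FAR and FRR at a fixed threshold."""
    far, frr = error_rates(genuine, attacks, threshold)
    return (far + frr) / 2.0


def pick_threshold(genuine: Sequence[float], attacks: Sequence[float]) -> float:
    """Pick the observed score where FAR and FRR are closest (near the EER point)."""
    candidates = sorted(set(genuine) | set(attacks))

    def gap(t: float) -> float:
        far, frr = error_rates(genuine, attacks, t)
        return abs(far - frr)

    return min(candidates, key=gap)


# Hypothetical liveness scores (higher = more likely genuine).
# "dev" plays the role of the training database, "test" of an unseen database.
dev_genuine, dev_attacks = [0.90, 0.80, 0.85, 0.70], [0.10, 0.20, 0.30, 0.40]
test_genuine, test_attacks = [0.75, 0.60, 0.65, 0.90], [0.72, 0.30, 0.80, 0.20]

threshold = pick_threshold(dev_genuine, dev_attacks)
intra_hter = hter(dev_genuine, dev_attacks, threshold)
cross_hter = hter(test_genuine, test_attacks, threshold)
print(f"threshold={threshold}, intra HTER={intra_hter:.2f}, cross HTER={cross_hter:.2f}")
```

With these toy scores the threshold chosen on the development data separates it perfectly (HTER of 0), while the same threshold yields an HTER of 0.5 on the second set, which mirrors the behaviour described above.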

Many databases have samples for various types of presentation attacks (multi-databases). At the same time, other databases contain samples for only one type of presentation attack (mono-databases). Databases that store information about print and replay presentation attacks are considered general databases, while 3D mask spoofing databases are regarded as special ones, which are more complicated and costly to build. Information on databases containing samples with physical attacks is scarce.

Some databases, especially 3D mask spoofing databases, contain data with less realistic attacks that reduce the false match rate to zero and are therefore of limited practical value [115]. This fact indicates the need to generalize the results obtained using appropriate testing protocols. The “unknown attack” type is especially challenging.

Another issue is that different data collection processes result in different qualities of spoofed samples. This fact is especially important for 3D mask spoofing databases, since mask production, sample recording process and training and testing protocols directly determine the quality of the data collected.

3. Deep learning has initiated progressive changes in the PAD methods because traditional methods do not take into account the intrinsic distribution relationship among different domains and extract discriminative features with bias, which leads to poor generalization to unseen domains. Various deep learning models have been proposed to detect unseen attacks. Recently, the binary classification problem has been reformulated as a one-class classification (OCC) task, with the goal of accurately classifying a genuine face and considering everything else an attack. The OCC-based methods include, in particular, domain generalization, anomaly detection, and zero-shot and few-shot learning, which improve generalization and unseen attack detection. Generally speaking, generalization in facial PAD is one of the main trends, but it is still at an early stage of research. It involves several techniques such as transfer learning with domain generalization and domain adaptation, anomaly detection, few-shot and zero-shot learning, auxiliary methods, and multi-spectral methods [45].
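The one-class formulation can be sketched as follows. This is a deliberately simple illustration, not any of the surveyed methods: a model is fitted on genuine samples only (per-feature mean and standard deviation), and any probe whose normalized distance from that model exceeds a threshold calibrated on the genuine data is treated as an attack. The two-dimensional feature vectors, the `margin` parameter and all function names are assumptions made for this example; a real system would use learned deep features.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Model = Tuple[List[float], List[float]]


def fit_genuine_model(genuine: Sequence[Sequence[float]]) -> Model:
    """Estimate per-dimension mean and standard deviation from genuine samples only."""
    dims = list(zip(*genuine))
    means = [mean(d) for d in dims]
    stds = [max(pstdev(d), 1e-6) for d in dims]  # guard against zero spread
    return means, stds


def anomaly_score(x: Sequence[float], model: Model) -> float:
    """Euclidean norm of the per-dimension z-scores of x under the genuine model."""
    means, stds = model
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, means, stds)))


def calibrate_threshold(genuine: Sequence[Sequence[float]], model: Model,
                        margin: float = 1.5) -> float:
    """Set the threshold to `margin` times the largest score seen on genuine data."""
    return margin * max(anomaly_score(x, model) for x in genuine)


def is_attack(x: Sequence[float], model: Model, threshold: float) -> bool:
    """Everything that does not look genuine is considered an attack."""
    return anomaly_score(x, model) > threshold


# Hypothetical 2-D feature vectors of genuine faces; no attack data is used for fitting.
genuine_train = [(0.50, 1.00), (0.52, 0.98), (0.48, 1.02), (0.51, 1.01), (0.49, 0.99)]
model = fit_genuine_model(genuine_train)
occ_threshold = calibrate_threshold(genuine_train, model)

genuine_probe = (0.505, 0.995)   # close to the genuine distribution
unseen_attack = (0.80, 0.60)     # an attack type never seen during fitting
print(is_attack(genuine_probe, model, occ_threshold),
      is_attack(unseen_attack, model, occ_threshold))
```

Because only genuine data shapes the decision boundary, the detector needs no examples of a particular attack type, which is the property that makes the OCC formulation attractive for unseen attacks. The price is sensitivity to shifts in the genuine distribution itself, for example under a new sensor or illumination.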

4. Recently, new types of face presentation attacks have emerged, among them adversarial attacks and morphing attacks. Adversarial examples are a serious threat to deep learning models because they impose significant restrictions, especially in sensitive applications. Despite the efforts of the scientific community to train robust neural networks, a well-informed intruder usually succeeds in finding ways to attack the model. Another major problem is the detection of manipulated inputs. Compared to adversarial learning, the detection of adversarial attacks has some advantages: for example, it requires neither retraining of the model nor the development of new learning strategies. Morphing attacks assume that an intruder (e.g., a wanted fugitive) and an accomplice (e.g., a citizen) can deceive automated border control systems at airports and seaports by manipulating images and the accomplice’s own personal information.

5. Early research in PAD was focused on one or two attack types (mostly print and replay attacks). However, intruders are continually creating new attacks, often joint attacks of different types, which requires more research into detecting joint attacks. For example, FaceGuard [105] proposed a generalizable defense against 6 adversarial attack types. The Diverse Fake Face Dataset (DFFD) [210] included 7 digital manipulation attack types from 4 categories. Deb et al. [211] detected 25 attack types known in the literature (6 adversarial, 6 digital manipulation, and 13 spoof attacks) using a novel unified face attack detection framework called UniFAD.

8. Conclusions

This systematic survey presents a taxonomy of presentation attacks in the digital and physical domains. Special attention is paid to deep learning methods for face PAD, since face recognition technology has shifted towards deep learning since 2013. This shift gave rise to additional types of attacks, called digital and objective adversarial attacks, against deep learning models. Recent research has led to another problem: the generalization of deep learning methods. Five types of generalization are discussed. More than 40 databases are briefly reviewed, including normal and attacked images and videos for all types of attacks. This survey also highlights metrics and testing protocols, as well as trends and perspectives in the PAD field.

References

1. ISO/IEC JTC1 SC37 Biometrics 2016. Available from: https://www.iso.org/committee/313770.html/.

2. ISO/IEC DIS 30107-3 standard. Available from: https://www.iso.org/obp/ui/#iso:std:iso-iec:30107:-1:ed-1:v1:en.

3. Komulainen, Zhao, Yuen P-C, Pietikainen. Generalized face anti-spoofing by detecting pulse from face videos. 23rd Int. Conf. Pattern Recognit. (ICPR). Cancun, Mexico: IEEE; 2016. p. 4244. doi: 10.1109/ICPR.2016.7900300.

4. Ciftci, Demir, Yin. FakeCatcher: Detection of synthetic portrait videos using biological signals. IEEE Trans. Pattern Anal. Machine Intel. 2020; 32750816: 1. doi: 10.1109/TPAMI.2020.3009287.

5. Kim S-H, Jeon S-M, Lee. Face biometric spoof detection method using a remote photoplethysmography signal. Sensors. 2022; 22: 3070. doi: 10.3390/s22083070.

6. Jee H-K, Jung S-U, Yoo J-H. Liveness detection for embedded face recognition system. Int. J. Computer and Information Engineering. 2008; 2(6): 2142. doi: 10.5281/zenodo.1060812.

7. Tan, Liu, Jiang. Face liveness detection from a single image with sparse low rank bilinear discriminative model. In: Daniilidis K, Maragos P, Paragios N, eds. Computer Vision – ECCV 2010. Berlin, Heidelberg: Springer. LNCS, Vol. 6316. 2010. p. 504. doi: 10.1007/978-3-642-15567-3_37.

8. de Freitas Pereira, Anjos, De Martino, Marcel. LBP – TOP based countermeasure against face spoofing attacks. In: Park JI, Kim J, eds. Computer Vision – ACCV 2012; Workshops. ACCV 2012. Berlin, Heidelberg: Springer. LNCS, Vol. 7728. 2013. p. 121. doi: 10.1007/978-3-642-37410-4_11.

9. Komulainen, Hadid, Pietikainen. Context based face anti-spoofing. 2013 IEEE Sixth Int. Conf. Biometrics: Theory, Applications and Systems (BTAS). Arlington, VA, USA: IEEE; 2013. p. 1. doi: 10.1109/BTAS.2013.6712690.

10. Patel, Han, Jain. Secure face unlock: Spoof detection on smartphones. IEEE Trans. Inf. Forensics Secur. 2016; 11(10): 2268. doi: 10.1109/TIFS.2016.2578288.

11. Zinelabidine, Jukka, Abdenour. Face antispoofing using speeded-up robust features and Fisher vector encoding. IEEE Signal Processing Lett. 2016; 24(2): 141. doi: 10.1109/LSP.2016.2630740.

12. Fang. Ultra-deep neural network for face anti-spoofing. In: Liu D, Xie S, Li Y, Zhao D, El-Alfy ES, eds. Neural Information Processing. ICONIP 2017. Cham: Springer. LNCS, Vol. 10635. 2017. p. 686. doi: 10.1007/978-3-319-70096-0_70.

13. Atoum, Liu, Jourabloo, Liu. Face anti-spoofing using patch and depth-based CNNs. 2017 IEEE Int. Joint Conf. Biometrics (IJCB). Denver, CO, USA: IEEE. 2017. p. 319. doi: 10.1109/BTAS.2017.8272713.

14. Liu, Jourabloo, Liu. Learning deep models for face anti-spoofing: Binary or auxiliary supervision. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Salt Lake City, UT, USA: IEEE. 2018. p. 389. doi: 10.1109/CVPR.2018.00048.

15. Chen, Lei, Chen, Robertson. Attention-based two-stream convolutional networks for face spoofing detection. IEEE Trans. Inf. Forensics Secur. 2019; 15(1): 578. doi: 10.1109/TIFS.2019.2922241.

16. Yang, Luo, Bao, Gao, Gong, Zheng, Liu. Face anti-spoofing: Model matters, so does data. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognit. (CVPR). Long Beach, CA, USA: IEEE. 2019; p. 3507. doi: 10.1109/CVPR.2019.00362.

17. Zhao, Wang, Qin, Zhou, Zhao. Searching central difference convolutional networks for face anti-spoofing. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognit. (CVPR). Seattle, WA, USA: IEEE. 2020. p. 5295. doi: 10.1109/CVPR42600.2020.00534.

18. Khammari. Robust face anti-spoofing using CNN with LBP and WLD. IET Image Proc. 2019; 13(11): 1880. doi: 10.1049/iet-ipr.2018.5560.

19. Niu, Shi, Zhao. Face anti-spoofing with human material perception. In: Vedaldi A, Bischof H, Brox T, Frahm JM, eds. Computer Vision – ECCV 2020. Cham: Springer. LNCS, Vol. 12352. 2020. p. 557. doi: 10.1007/978-3-030-58571-6_33.

20. Qin, Wang, Zhao, Lei, Zhao. Multi-modal face anti-spoofing based on central difference networks. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). Seattle, WA, USA: IEEE. 2020; p. 2766. doi: 10.1109/CVPRW50498.2020.00333.

21. Zhang, Liu, Wan, Liang, Guo, Escalera, Escalante. CASIA-SURF: A large-scale multi-modal benchmark for face anti-spoofing. IEEE Trans. Biometrics, Behavior, and Identity Science. 2020; 2(2): 182. doi: 10.1109/TBIOM.2020.2973001.

22. Pan, Sun, Lao. Eyeblink-based anti-spoofing in face recognition from a generic webcamera. 2007 IEEE 11th Int. Conf. Computer Vision. Rio de Janeiro, Brazil: IEEE. 2007. p. 1. doi: 10.1109/ICCV.2007.4409068.

23. J-W. Eye blink detection based on multiple Gabor response waves. 2008 Int. Conf. Machine Learning and Cybernetics. Kunming: IEEE. Vol. 5. 2008. p. 2852. doi: 10.1109/ICMLC.2008.4620894.

24. Wang, Ding, Fang. Face live detection method based on physiological motion analysis. Tsinghua Science & Technology. 2009; 14(6): 685. doi: 10.1016/S1007-0214(09)70135-X.

25. Bao, Jiang. A liveness detection method for face recognition based on optical flow field. 2009 Int. Conf. Image Analysis and Signal Process. Linhai, China: IEEE. 2009. p. 223. doi: 10.1109/IASP.2009.5054589.

26. Bigun, Fronthaler, Kollreider. Assuring liveness in biometric identity authentication by real-time face tracking. 2004 IEEE Int. Conf. Computational Intelligence for Homeland Security and Personal Safety (CIHSPS 2004). Venice, Italy: IEEE. 2004. p. 1. doi: 10.1109/CIHSPS.2004.1360218.

27. Ali, Deravi, Hoque. Liveness detection using gaze collinearity. 2012 Third Int. Conf. Emerging Security Technologies. Lisbon, Portugal: IEEE. 2012. p. 62. doi: 10.1109/EST.2012.12.

28. Ali, Hoque, Deravi. Gaze stability for liveness detection. Pattern Analysis and Applications. 2018; 21: 437. doi: 10.1007/s10044-016-0587-2.

29. Sun, Xiong, Yiu. Understanding deep face anti-spoofing: From the perspective of data. Vis. Comput. 2021; 37: 1015. doi: 10.1007/s00371-020-01849-x.

30. Galbally, Marcel, Fierrez. Biometric anti-spoofing methods: A survey in face recognition. IEEE Access. 2014; 2: 1530. doi: 10.1109/ACCESS.2014.2381273.

31. Kisku, Rakshit. Face spoofing and counter-spoofing: A survey of state-of-the-art algorithms. Trans. Machine Learning and Artificial Intell. 2017; 5(2): 31. doi: 10.14738/tmlai.52.3130.

32. Ramachandra, Busch. Presentation attack detection methods for face recognition systems: A comprehensive survey. ACM Computing Surveys (CSUR). 2017; 50(1): 8. doi: 10.1145/3038924.

33. Mohammadi, Bhattacharjee, Marcel. Deeply vulnerable: A study of the robustness of face recognition to presentation attacks. IET Biom. 2017; 7(1): 15. doi: 10.1049/iet-bmt.2017.0079.

34. Souza, Oliveira, Pamplona, Papa. How far did we get in face spoofing detection? Eng. Appl. Artif. Intell. 2018; 72: 368. doi: 10.1016/j.engappai.2018.04.013.

35. Rattani, Derakhshani. A survey of mobile face biometrics. Comput. Electr. Eng. 2018; 72: 39–52. doi: 10.1016/j.compeleceng.2018.09.005.

36. Raheem, Ahmad SMS, Adnan WAW. Insight on face liveness detection: A systematic literature review. Int. J. Electrical and Computer Engineering. 2019; 9(6): 5865. doi: 10.11591/ijece.v9i6.pp5165-5175.

37. Munir, Khan. An extensive review on spectral imaging in biometric systems: Challenges & advancements. J. Vis. Commun. Image Represent. 2019; 65: 102660. doi: 10.1016/j.jvcir.2019.102660.

38. Bhattacharjee, Mohammadi, Anjos, Marcel. Recent advances in face presentation attack detection. In: Marcel S, Nixon M, Fierrez J, Evans N, eds. Handbook of Biometric Anti-Spoofing: Presentation Attack Detection. 2nd edition. Cham: Springer. ACVPR. 2019. p. 207. doi: 10.1007/978-3-319-92627-8_10.

39. El-Din, Moustafa, Mahdi. Deep convolutional neural networks for face and iris presentation attack detection: Survey and case study. IET Biometrics. 2020; 9(5): 179. doi: 10.1049/iet-bmt.2020.0004.

40. Ming, Visani, Luqman, Burie J-C. A survey on anti-spoofing methods for facial recognition with RGB cameras of generic consumer devices. J. Imaging. 2020; 6(12): 139. doi: 10.3390/jimaging6120139.

41. Jia, Guo. A survey on 3D mask presentation attack detection and countermeasures. Pattern Recognit. 2020; 98: 107032. doi: 10.1016/j.patcog.2019.107032.

42. Jia, Guo, Wang. Face presentation attack detection in mobile scenarios: A comprehensive evaluation. Image and Vision Computing. 2020; 93: 103826. doi: 10.1016/j.imavis.2019.11.004.

43. Liu, Wan, Liang, Escalera, Escalante, Madadi, Jin, Tan, Yuan. Cross-ethnicity face anti-spoofing recognition challenge: A review. IET Biometrics. 2020; 10(1): 24. doi: 10.1049/bme2.12002.

44. Qin, Zhao, Lei, Zhao. Deep learning for face anti-spoofing: A Survey. CoRR arXiv preprint, arXiv:2106.14948v1. 2021.

45. Abdullakutty, Elyan, Johnston. A review of state-of-the-art in face presentation attack detection: From early development to advanced deep learning and multi-modal fusion methods. Information Fusion. 2021; 75: 55. doi: 10.1016/j.inffus.2021.04.015.

46. Schlett, Rathgeb, Henniger, Galbally, Fierrez, Busch. Face image quality assessment: A literature survey. ACM Computing Surveys. 2022; 1. doi: 10.1145/3507901.

47. Yang. Leveraging intra and inter-dataset variations for robust face alignment. 2017 IEEE Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). Honolulu, HI, USA: IEEE. 2017. p. 150. doi: 10.1109/CVPRW.2017.261.

48. Shao, Lan, Yuen. Multi-adversarial discriminative deep domain generalization for face presentation attack detection. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognit. (CVPR). Long Beach, CA, USA: IEEE. 2019. p. 10023. doi: 10.1109/CVPR.2019.01026.

49. Shao, Lan, Yuen. Regularized fine-grained meta face anti-spoofing. The AAAI Conf. Artificial Intelligence. New York, New York, USA: AAAI Press, Palo Alto, California USA. 2020; 34(07): 11974. doi: 10.1609/aaai.v34i07.6873.

50. Wang, Han, Shan, Chen. Cross-domain face presentation attack detection via multi-domain disentangled representation learning. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognit. (CVPR). Seattle, WA, USA: IEEE. 2020. p. 6678. doi: 10.1109/CVPR42600.2020.00671.

51. Jia, Zhang, Shan, Chen. Single-side domain generalization for face anti-spoofing. 2020 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Seattle, WA, USA: IEEE. 2020. p. 8481. doi: 10.1109/CVPR42600.2020.00851.

52. Kim, Lee. Domain Generalization with Pseudo-Domain Label for Face Anti-spoofing. In: Wallraven C, Liu Q, Nagahara H, eds. Asian Conference on Pattern Recognition: Pattern Recognition (ACPR 2021). Cham: Springer. LNCS, Vol. 13188. p. 431. doi: 10.1007/978-3-031-02375-0_32.

53. Zhou, Zhang, Yao, Ding. Adaptive mixture of experts learning for generalizable face anti-spoofing. CoRR arXiv preprint, arXiv:2207.09868v1; 2022.

54. Arashloo, Kittler, Christmas. An anomaly detection approach to face spoofing detection: A new formulation and evaluation protocol. IEEE Access. 2017; 5: 13868. doi: 10.1109/ACCESS.2017.2729161.

55. Liu, Stehouwer, Jourabloo, Liu. Deep tree learning for zero-shot face anti-spoofing. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognit. (CVPR). Long Beach, CA, USA: IEEE. 2019. p. 14680. doi: 10.1109/CVPR.2019.00481.

56. Qin, Zhao, Zhu, Wang, Zhou, Shi, Lei. Learning meta model for zero-and few-shot face anti-spoofing. The AAAI Conference on Artificial Intelligence. New York, New York, USA: AAAI Press, Palo Alto, California USA. 2020; 34(07): 11916. doi: 10.1609/aaai.v34i07.6866.

57. Qin, Zhang, Shi, Wang, Yan. One-class adaptation face anti-spoofing with loss function search. Neurocomputing. 2020; 417: 384. doi: 10.1016/j.neucom.2020.08.068.

58. Arashloo. Unknown face presentation attack detection via localised learning of multiple kernels. CoRR arXiv preprint, arXiv:2204.10675v1. 2022. 1–14.

59. Jain, Ross, Pankanti. Biometrics: A tool for information security. IEEE Trans. Inf. Forensics Security. 2006; 1(1): 125. doi: 10.1109/TIFS.2006.873653.

60. Information Technology – Biometrics – Presentation Attack Detection – Part 1: Framework, ISO/IEC Standard ISO/IEC CD 30107-1, 2016. Available from: https://www.iso.org/standard/53227.html.

61. Tan, Liu, Jiang. Face liveness detection from a single image with sparse low rank bilinear discriminative model. In: Daniilidis K, Maragos P, Paragios N, eds. Computer Vision – ECCV 2010. Berlin, Heidelberg: Springer. LNCS, Vol. 6316. 2010. p. 504. doi: 10.1007/978-3-642-15567-3_37.

62. Erdogmus, Marcel. Spoofing face recognition with 3D masks. IEEE Trans. Inf. Forensics Security. 2014; 9(7): 1084. doi: 10.1109/TIFS.2014.2322255.

63. Chan, Liu, Chen, Yeung, Zhang, Wang, Hsu. Face liveness detection using a flash against 2D spoofing attack. IEEE Trans. Inf. Forensics Secur. 2018; 13(2): 521. doi: 10.1109/TIFS.2017.2758748.

64. Ebihara, Sakurai, Imaoka. Specular- and diffuse-reflection-based face spoofing detection for mobile devices. 2020 IEEE Int. Joint Conf. Biometrics (IJCB). Houston, TX, USA: IEEE. 2020. p. 1. doi: 10.1109/IJCB48548.2020.930486.

65. Peng, Qin, Long. Face presentation attack detection based on chromatic co-occurrence of local binary pattern (LBP) and ensemble learning. J. Vis. Commun. Image R. 2020; 66: 102746. doi: 10.1016/j.jvcir.2019.102746.

66. Liu, Tai, Ding, Wang, Huang. Aurora Guard: Real-time face anti-spoofing via light reflection. CoRR arXiv preprint, arXiv:1902.10311v1; 2019.

67. Peng, Qin, Long. Face presentation attack detection based on chromatic co-occurrence of local binary pattern and ensemble learning. J. Vis. Commun. Image R. 2020; 66: 102746. doi: 10.1016/j.jvcir.2019.102746.

68. Liu. A novel face presentation attack detection scheme based on multi-regional convolutional neural networks. Pattern Recognit. Lett. 2020; 131: 261. doi: 10.1016/j.patrec.2020.01.002.

69. Sharif, Bhagavatula, Bauer, Reiter. A general framework for adversarial examples with objectives. ACM Trans. Priv. Secur. 2019; 22(3): 1. doi: 10.1145/3317611.

70. Anjos, Chakka, Marcel. Motion-based countermeasures to photo attacks in face recognition. IET Biometrics. 2014; 3(3): 147. doi: 10.1049/iet-bmt.2012.0071.

71. Edmunds, Caplier. Motion-based countermeasure against photo and video spoofing attacks in face recognition. J. Vis. Commun. Image R. 2018; 50: 314. doi: 10.1016/j.jvcir.2017.12.004.

72. de Marsico, Nappi, Riccio, Dugelay. Moving face spoofing detection via 3D projective invariants. 2012 5th IAPR Int. Conf. Biometrics (ICB). New Delhi, India: IEEE. 2012. p. 73. doi: 10.1109/ICB.2012.6199761.

73. Pan, Sun, Wang. Monocular camera-based face liveness detection by combining eyeblink and scene context. Telecommun. Syst. 2011; 47(3–4): 215. doi: 10.1007/s11235-010-9313-3.

74. da Silva Pinto, Pedrini, Schwartz, Rocha. Video-based face spoofing detection through visual rhythm analysis. 2012 25th SIBGRAPI Conf. Graphics, Patterns and Images (SIBGRAPI). Ouro Preto, Brazil: IEEE. 2012. p. 221. doi: 10.1109/SIBGRAPI.2012.38.

75. Komulainen, Hadid, Pietikäinen. Face spoofing detection using dynamic texture. In: Park JI, Kim J, eds. Computer Vision – ACCV 2012; Workshops. ACCV 2012. Berlin, Heidelberg: Springer. LNCS, Vol. 7728. 2012. p. 146. doi: 10.1007/978-3-642-37410-4_13.

76. Nguyen, Pham, Baek, Park. Combining deep and handcrafted image features for presentation attack detection in face recognition systems using visible-light camera sensors. Sensors. 2018; 18: 699. doi: 10.3390/s18030699.

77. Patel, Han, Jain. Cross-database face antispoofing with robust feature representation. In: You Z, Zhou J, Wang Y, Sun Z, Shan S, Zheng W, Feng J, Zhao Q, eds. Biometric Recognition. CCBR 2016; Cham: Springer. LNCS, Vol. 9967. 2016. p. 611. doi: 10.1007/978-3-319-46654-5_67.

78. Feng, L-M, Yuan, Cheung TC-H, Cheung K-W. Integration of image quality and motion cues for face anti-spoofing: A neural network approach. J. Vis. Commun. Image R. 2016; 38: 451. doi: 10.1016/j.jvcir.2016.03.019.

79. Lucena, Junior, Moia, Souza, Valle, Lotufo. Transfer learning using convolutional neural networks for face anti-spoofing. In: Karray F, Campilho A, Cheriet F, eds. Image Analysis and Recognition. ICIAR 2017; Cham: Springer. LNCS, Vol. 10317. 2017. p. 27. doi: 10.1007/978-3-319-59876-5_4.

80. Singh, Arora. A novel face liveness detection algorithm with multiple liveness indicators. Wirel. Pers. Commun. 2018; 100(4): 1677. doi: 10.1007/s11277-018-5661-1.

81. Bekhouche, Kajo, Ruichek, Dornaika. Spatiotemporal CNN with pyramid bottleneck blocks: Application to eye blinking detection. Neural Networks. 2022; 152: 150. doi: 10.1016/j.neunet.2022.04.010.

82. Killioğlu, Takiran, Kahraman. Anti-spoofing in face recognition with liveness detection using pupil tracking. 2017 IEEE 15th Int. Symp. Applied Machine Intelligence and Informatics (SAMI). Herlany, Slovakia: IEEE. 2017. p. 87. doi: 10.1109/SAMI.2017.7880281.

83. Hernandez-Ortega, Fierrez, Morales, Tome. Time analysis of pulse-based face anti-spoofing in visible and NIR. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Salt Lake City, UT, USA: IEEE. 2018. p. 657. doi: 10.1109/CVPRW.2018.00096.

84. Lin, Zhao. Face liveness detection by rPPG features and contextual patch-based CNN. 2019 3rd Int. Conf. Biometric Engineering and Applications (ICBEA 2019). Stockholm Sweden: ACM. 2019. p. 61. doi: 10.1145/3345336.3345345.

85. Wang S-Y, Yang S-H, Chen Y-P, Huang J-W. Face liveness detection based on skin blood flow analysis. Symmetry. 2017; 9(12): 305. doi: 10.3390/sym9120305.

86. Chen, Tian, Jiang. A cascade face spoofing detector based on face anti-spoofing R-CNN and improved Retinex LBP. IEEE Access. 2019; 7: 170116. doi: 10.1109/ACCESS.2019.2955383.

87. Peng, Meng, Long. Presentation attack detection based on two-stream vision transformers with self-attention fusion. J. Vis. Commun. Image R. 2022; 85: 103518. doi: 10.1016/j.jvcir.2022.103518.

88. Rehman YAU, Komulainen. Enhancing deep discriminative feature maps via perturbation for face presentation attack detection. Image and Vision Computing. 2020; 94: 103858. doi: 10.1016/j.imavis.2019.103858.

89. Cai, Wang, Chen, Kot. DRL-FAS: A novel framework based on deep reinforcement learning for face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 2021; 16: 937. doi: 10.1109/TIFS.2020.3026553.

90. Roy, Hasan, Rupty, Hossain, Sengupta, Taus, Mohammed. Bi-FPNFAS: Bi-directional feature pyramid network for pixel-wise face anti-spoofing by leveraging Fourier spectra. Sensors. 2021; 21: 2799. doi: 10.3390/s21082799.

91. Karmakar, Sarkar, Datta. Spoofed replay attack detection by multidimensional Fourier transform on facial micro-expression regions. Signal Process.: Image Commun. 2021; 93: 116164. doi: 10.1016/j.image.2021.116164.

92. Xia, Yang, Han. Face presentation attack detection based on optical flow and texture analysis. J. King Saud University – Computer and Information Sciences. 2022; 34: 1455. doi: 10.1016/j.jksuci.2022.02.019.

93. Zhang, Lei. Patch-DFD: Patch-based end-to-end DeepFake discriminator. Neurocomputing. 2022; 501: 583. doi: 10.1016/j.neucom.2022.06.013.

94. Muhammad, Komulainen. Self-supervised 2D face presentation attack detection via temporal sequence sampling. Pattern Recognit. Lett. 2022; 156: 15. doi: 10.1016/j.patrec.2022.03.001.

95. Szegedy, Zaremba, Sutskever, Bruna, Erhan, Goodfellow, Fergus. Intriguing properties of neural networks. 2nd Int. Conf. Learning Representations (ICLR 2014). Banff, AB, Canada. 2014. p. 1.

96. Zhang, Tondi, Barni. Adversarial examples for replay attacks against CNN-based face recognition with anti-spoofing capability. Comput. Vis. Image Underst. 2020; 197: 102988. doi: 10.1016/j.cviu.2020.102988.

97. Yang, Zhao, Wang. Evaluating facial recognition web services with adversarial and synthetic samples. Neurocomputing. 2020; 406: 378. doi: 10.1016/j.neucom.2019.11.117.

98. Sharif, Bhagavatula, Bauer, Reiter. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS’16). Vienna, Austria: ACM. 2016. p. 1528. doi: 10.1145/2976749.2978392.

99. Zhou, Tang, Wang, Han, Liu, Zhang. Invisible mask: Practical attacks on face recognition with infrared. CoRR arXiv preprint, arXiv:1803.04683v1. 2018.

100. Nguyen D-L, Arora, Yang. Adversarial light projection attacks on face recognition systems: A feasibility study. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Seattle, WA, USA: IEEE. 2020. p. 814. doi: 10.1109/CVPRW50498.2020.00415.

101. Liao, Liang, Dong, Pang, Zhu. Defense against adversarial attacks using high-level representation guided denoiser. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognit. (CVPR). Salt Lake City, UT, USA: IEEE. 2018. p. 1778. doi: 10.1109/CVPR.2018.00191.

102. Amirian, Schwenker, Stadelmann. Trace and detect adversarial attacks on CNNs using feature response maps. In: Pancioni L, Schwenker F, Trentin E, eds. Artificial Neural Networks in Pattern Recognition. ANNPR 2018; Cham: Springer. LNCS, Vol. 11081. 2018. p. 346. doi: 10.1007/978-3-319-99978-4_27.

103. Sitawarin, Wagner. On the robustness of deep K-nearest neighbors. CoRR arXiv preprint, arXiv:1903.08333v1. 2019.

104. Massoli, Carrara, Amato, Falchi. Detection of face recognition adversarial attacks. Computer Vision and Image Understand. 2021; 202: 103103. doi: 10.1016/j.cviu.2020.103103.

105. Deb, Liu, Jain. FaceGuard: A self-supervised defense against adversarial face images. CoRR arXiv preprint, arXiv:2011.14218v2. 2021.

106. Agarwal, Yadav, Kohli, Singh, Vatsa, Noore. Face presentation attack with latex masks in multispectral videos. 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Honolulu, HI, USA: IEEE. 2017. p. 275. doi: 10.1109/CVPRW.2017.40.

107. Sanders, Ueda, Minemoto, Noyes, Yoshikawa, Jenkins. Hyper-realistic face masks: A new challenge in person identification. Cogn Res Princ Implic. 2017; 2(1): 43. doi: 10.1186/s41235-017-0079-y.

108. Raghavendra, Busch. Robust 2D/3D face mask presentation attack detection scheme by exploring multiple features and comparison score level fusion. 17th Int. Conf. Information Fusion (FUSION). Salamanca, Spain: IEEE. 2014. p. 1.

109. Naveen, Fathima, Moni. Face recognition and authentication using LBP and BSIF mask detection and elimination. 2016 Int. Conf. Communication Systems and Networks (ComNet). Thiruvananthapuram, India: IEEE. 2016. p. 99. doi: 10.1109/CSN.2016.7823994.

110. Agarwal, Singh, Vatsa. Face anti-spoofing using Haralick features. 2016 IEEE 8th Int. Conf. Biometrics Theory, Applications and Systems (BTAS). Niagara Falls, NY, USA: IEEE. 2016. p. 1. doi: 10.1109/BTAS.2016.7791171.

111. Kose, Dugelay J-L. On the vulnerability of face recognition systems to spoofing mask attacks. 2013 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP). Vancouver, BC, Canada: IEEE. 2013. p. 2357.

112. Tang, Chen. 3D facial geometric attributes based anti-spoofing approach against mask attacks. 2017 12th IEEE Int. Conf. Automatic Face & Gesture Recognit. (FG 2017). Washington, DC, USA: IEEE. 2017. p. 589. doi: 10.1109/FG.2017.74.

113. Smeets, Keustermans, Vandermeulen, Suetens. meshSIFT: Local surface features for 3D face recognition under expression variations and partial data. Computer Vision and Image Understand. 2013; 117: 158. doi: 10.1016/j.cviu.2012.10.002.

114. Hamdan, Mokhtar. The detection of spoofing by 3D mask in a 2D identity recognition system. Egypt. Inf. J. 2018; 19(2): 75. doi: 10.1016/j.eij.2017.10.001.

115. Hamdan, Mokhtar. A self-immune to 3D masks attacks face recognition system. Signal Image Video Process. 2018; 12: 1053. doi: 10.1007/s11760-018-1253-5.

116. Wang, Chen, Huang, Wang. Face anti-spoofing to 3D masks by combining texture and geometry features. In: Zhou J, Wang Y, Sun Z, Jia Z, Feng J, Shan S, Ubul K, Guo Z, eds. Biometric Recognition: 13th Chinese Conference on Biometric Recognition (CCBR 2018). Cham: Springer. LNCS, Vol. 10996. 2018. p. 399. doi: 10.1007/978-3-319-97909-0_43.

117. Menotti, Chiachia, Pinto, Schwartz, Pedrini, Falcao, Rocha. Deep representations for iris, face, and fingerprint spoofing detection. IEEE Trans. Inf. Forensics Secur. 2015; 10(4): 864. doi: 10.1109/TIFS.2015.2398817.

118.

Manjani

Tariyal

Vatsa

Singh

Majumdar

. Detecting silicone mask-based presentation attack via deep dictionary learning. IEEE Trans. Inf. Forensics Secur. 2017; 12(7): 1713. doi: 10.1109/TIFS.2017.2676720.

119.

Liu

Kumar

. Detecting presentation attacks from 3D face masks under multispectral imaging. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). IEEE, Salt Lake City, UT, USA: IEEE. 2018. p. 47. doi: 10.1109/CVPRW.2018.00014.

120.

Shao

Lan

Yuen

. Joint discriminative learning of deep dynamic textures for 3D mask face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 2019; 14(4): 923. doi: 10.1109/TIFS.2018.2868230.

121.

Chen

Yang

Huang

Wang

. 3D face mask anti-spoofing via deep fusion of dynamic texture and shape clues. 2020 15th IEEE Int. Conf. Automatic Face and Gesture Recognition (FG 2020). IEEE, Buenos Aires, Argentina: IEEE. 2020. p. 314. doi: 10.1109/FG47880.2020.00019.

122.

Birla

Gupta

. PATRON: Exploring respiratory signal derived from non-contact face videos for face anti-spoofing. Expert Systems with Applications. 2022; 187: 115883. doi: 10.1016/j.eswa.2021.115883.

123.

Dhamecha

Nigam

Singh

Vatsa

. Disguise detection and face recognition in visible and thermal spectrums. 2013 Int. Conf. Biometrics (ICB). IEEE, Madrid, Spain: IEEE. 2013. p. 1. doi: 10.1109/ICB.2013.6613019.

124.

Kohli

Yadav

Noore

. Face verification with disguise variations via deep disguise recognizer. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). Salt Lake City, UT, USA: IEEE. 2018. p. 17. doi: 10.1109/CVPRW.2018.00010.

125.

Kotwal

Mostaani

Marcel

. Detection of age-induced makeup attacks on face recognition systems using multi-layer deep features. IEEE Trans. Biom. Behav. Identity Sci. 2019; 2(1): 15. doi: 10.1109/TBIOM.2019.2946175.

126.

del Campo

Fernández-Isabel

de Diego

Conde

Cabello

. Dynamic facial presentation attack detection for automated border control systems. Computers & Security. 2020; 92: 101744. doi: 10.1016/j.cose.2020.101744.

127.

Peng

Qin

Long

. Face morphing attack detection and attacker identification based on a watchlist. Signal Process.: Image Communic. 2022; 107: 116748. doi: 10.1016/j.image.2022.116748.

128.

Hermosilla

Ruiz-del-Solar

Verschae

Correa

. A comparative study of thermal face recognition methods in unconstrained environments. Pattern Recognit. 2012; 45(7): 2445. doi: 10.1016/j.patcog.2012.01.001.

129.

Seal

Ganguly

Bhattacharjee

Nasipuri

Basu

. Automated thermal face recognition based on minutiae extraction. Int. J. Comput. Intell. Stud. 2013; 2(2): 133. doi: 10.1504/IJCISTUDIES.2013.055220.

130.

Lagorio

Tistarelli

Cadoni

Fookes

Sridharan

. Liveness detection based on 3D face shape analysis. 2013 Int. Workshop on Biometrics and Forensics (IWBF). Lisbon, Portugal: IEEE. 2013. p. 1. doi: 10.1109/IWBF.2013.6547310.

131.

Kowalski

. A study on presentation attack detection in thermal infrared. Sensors. 2020; 20: 3988. doi: 10.3390/s20143988.

132.

Kim

Yoon

. Masked fake face detection using radiance measurements. JOSA A. 2009; 26(4): 760. doi: 10.1364/JOSAA.26.000760.

133.

Zhang

Lei

. Face liveness detection by learning multi-spectral reflectance distributions. 2011 IEEE Int. Conf. Automatic Face & Gesture Recognit. and Workshops (FG 2011). Santa Barbara, CA, USA: IEEE. 2011. p. 436. doi: 10.1109/FG.2011.5771438.

134.

Steiner

Kolb

Jung

. Reliable face anti-spoofing using multispectral SWIR imaging. 2016 Int. Conf. Biometrics (ICB). Halmstad, Sweden: IEEE. 2016. p. 1. doi: 10.1109/ICB.2016.7550052.

135.

Wilson

Nadeau

Jaworski

Tromberg

Durkin

. Review of short-wave infrared spectroscopy and imaging methods for biological tissue characterization. J. Biomedical Optics. 2015; 20(3): 030901. doi: 10.1117/1.JBO.20.3.030901.

136.

Heusch

George

Geissbuhler

Mostaani

Marcel

. Deep models and shortwave infrared information to detect face presentation attacks. IEEE Trans. Biom. Behav. Identity Sci. 2020; 2(4): 399. doi: 10.1109/TBIOM.2020.3010312.

137.

Liu

Lan

Yuen

. Multi-channel remote photoplethysmography correspondence feature for 3D mask face presentation attack detection. IEEE Trans. Inf. Forensics Secur. 2021; 16: 2683. doi: 10.1109/TIFS.2021.3050060.

138.

Pan

Wang

Kot

. Domain generalization with adversarial feature learning. 2018 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Salt Lake City, UT, USA: IEEE. 2018. p. 5400. doi: 10.1109/CVPR.2018.00566.

139.

Chen

Yang

Wang

Kwong

. Camera invariant feature learning for generalized face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 2021; 16: 2477. doi: 10.1109/TIFS.2021.3055018.

140.

Liu

Ruan

Shu

Yang

. Adversarial learning and decomposition-based domain generalization for face anti-spoofing. Pattern Recognit. Lett. 2022; 155: 171. doi: 10.1016/j.patrec.2021.10.014.

141.

Cao

Wang

Huang

Kot

. Unsupervised domain adaptation for face anti-spoofing. IEEE Trans. Inf. Forensics Secur. 2018; 13(7): 1794. doi: 10.1109/TIFS.2018.2801312.

142.

Zhang

Xie

Luo

Zhang

. Deep transfer across domains for face antispoofing. J. Electronic Imaging. 2019; 28(4): 043001. doi: 10.1117/1.JEI.28.4.043001.

143.

Wang

Han

Shan

Chen

. Improving cross-database face presentation attack detection via adversarial domain adaptation. 2019 Int. Conf. Biometrics (ICB). Crete, Greece: IEEE. 2019. pp. 1–8. doi: 10.1109/ICB45273.2019.8987254.

144.

Wang

Han

Shan

Chen

. Unsupervised adversarial domain adaptation for cross-domain face presentation attack detection. IEEE Trans. Inf. Forensics Secur. 2020; 16: 56.

145.

El-Din

Moustafa

Mahdi

. Adversarial unsupervised domain adaptation guided with deep clustering for face presentation attack detection. CoRR arXiv preprint, arXiv:2102.06864v1. 2021.

146.

Jia

Zhang

Shan

Chen

. Unified unsupervised and semi-supervised domain adaptation network for cross-scenario face anti-spoofing. Pattern Recognit. 2021; 115: 107888. doi: 10.1016/j.patcog.2021.107888.

147.

Fatemifar

Arashloo

Awais

Kittler

. Client-specific anomaly detection for face presentation attack detection. Pattern Recognit. 2021; 112: 107696. doi: 10.1016/j.patcog.2020.107696.

148.

Nikisins

Mohammadi

Anjos

Marcel

. On effectiveness of anomaly detection approaches against unseen presentation attacks in face anti-spoofing. 2018 Int. Conf. Biometrics (ICB). Gold Coast, QLD, Australia: IEEE. 2018. p. 75. doi: 10.1109/ICB2018.2018.00022.

149.

Fatemifar

Arashloo

Awais

Kittler

. Spoofing attack detection by anomaly detection. 2019 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP). Brighton, UK: IEEE. 2019. p. 8464. doi: 10.1109/ICASSP.2019.8682253.

150.

Chingovska

dos Anjos

. On the use of client identity information for face antispoofing. IEEE Trans. Inf. Forensics Secur. 2015; 10: 787. doi: 10.1109/TIFS.2015.2400392.

151.

Fatemifar

Awais

Arashloo

Kittler

. Combining multiple one-class classifiers for anomaly based face spoofing attack detection. 2019 Int. Conf. Biometrics (ICB). Crete, Greece: IEEE. 2019. p. 1. doi: 10.1109/ICB45273.2019.8987326.

152.

Lam

Kot

. Unseen face presentation attack detection with hypersphere loss. 2020 IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE. 2020. p. 2852. doi: 10.1109/ICASSP40776.2020.9054420.

153.

Feng

Hong

Yue

Chen

Wang

Han

Liu

Ding

. Learning generalized spoof cues for face anti-spoofing. CoRR arXiv preprint, arXiv:2005.03922. 2020.

154.

Fatemifar

Asadi

Awais

Akbari

Kittler

. Face spoofing detection ensemble via multistage optimisation and pruning. Pattern Recognit. Lett. 2022; 158: 1. doi: 10.1016/j.patrec.2022.04.006.

155.

Fatemifar

Awais

Akbari

Kittler

. Developing a generic framework for anomaly detection. Pattern Recognit. 2022; 124: 108500. doi: 10.1016/j.patcog.2021.108500.

156.

Pérez-Cabo

Jiménez-Cabello

Costa-Pazo

López-Sastre

. Deep anomaly detection for generalized face anti-spoofing. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). Long Beach, CA, USA: IEEE. 2019. p. 1. doi: 10.1109/CVPRW.2019.00201.

157.

Baweja

Oza

Perera

Patel

. Anomaly detection-based unknown face presentation attack detection. 2020 IEEE Int. Joint Conf. Biometrics (IJCB). Houston, TX, USA: IEEE. 2020. p. 1. doi: 10.1109/IJCB48548.2020.9304935.

158.

Favorskaya

Pakhirka

. Image-based anomaly detection using CNN cues generalisation in face recognition system. Int. J. Reasoning-based Intelligent Systems. 2022; 14(1): 19. doi: 10.1504/IJRIS.2022.10044691.

159.

Wang

Yao

Kwok

. Generalizing from a few examples: A survey on few-shot learning. ACM Comput. Surv. 2021; 53(3): 63. doi: 10.1145/3386252.

160.

Qin

Zhao

Zhu

Wang

Zhou

Shi

Lei

. Learning meta model for zero- and few-shot face antispoofing. The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20). New York, New York, USA: AAAI Press, Palo Alto, California USA. 2020. p. 11916. doi: 10.1609/aaai.v34i07.6866.

161.

Finn

Abbeel

Levine

. Model-agnostic meta-learning for fast adaptation of deep networks. 34th Int. Conf. Machine Learning (ICML’17). Sydney, Australia. PMLR. Vol. 70. 2017. p. 1126.

162.

Liu

Stehouwer

Jourabloo

Atoum

Liu

. Presentation attack detection for face in mobile phones. In: Rattani A, Derakhshani R, Ross A, eds. Selfie Biometrics. Cham: Springer. ACVPR. 2019. p. 171. doi: 10.1007/978-3-030-26972-2_8.

163.

Rehman

Po

Liu

. SLNet: Stereo face liveness detection via dynamic disparity-maps and convolutional neural network. Expert Syst. Appl. 2020; 142: 113002. doi: 10.1016/j.eswa.2019.113002.

164.

Niu

Han

Shan

Chen

. Continuous heart rate measurement from face: A robust rPPG approach with distribution learning. 2017 IEEE International Joint Conference on Biometrics (IJCB). Denver, CO, USA: IEEE. 2017. p. 642. doi: 10.1109/BTAS.2017.8272752.

165.

Jourabloo

Liu

. Face de-spoofing: Anti-spoofing via noise modeling. In: Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer Vision – ECCV 2018; ECCV 2018. Cham: Springer. LNCS, Vol. 11217. 2018. p. 297. doi: 10.1007/978-3-030-01261-8_18.

166.

Jian

Zheng

Wang

. Identity-constrained noise modeling with metric learning for face anti-spoofing. Neurocomputing. 2021; 434: 149. doi: 10.1016/j.neucom.2020.12.095.

167.

Wang

Zhao

Qin

Zhou

Lei

. Exploiting temporal and depth information for multi-frame face anti-spoofing. CoRR arXiv preprint, arXiv:1811.05118v3. 2019.

168.

George

Marcel

. Deep pixel-wise binary supervision for face presentation attack detection. 2019 Int. Conf. Biometrics (ICB 2019). Crete, Greece: IEEE. 2019. p. 1. doi: 10.1109/ICB45273.2019.8987370.

169.

Kim

. BASN: Enriching feature representation using bipartite auxiliary supervisions for face anti-spoofing. 2019 IEEE/CVF Int. Conf. Computer Vision Workshop (ICCVW). Seoul, Korea (South): IEEE. 2020. p. 1. doi: 10.1109/ICCVW.2019.00062.

170.

Nikisins

George

Marcel

. Domain adaptation in multi-channel autoencoder based features for robust face anti-spoofing. 2019 Int. Conf. Biometrics (ICB). Crete, Greece: IEEE. 2019. p. 1. doi: 10.1109/ICB45273.2019.8987247.

171.

Jiang

Liu

Zhou

. Multilevel fusing paired visible light and near-infrared spectral images for face anti-spoofing. Pattern Recognit. Lett. 2019; 128: 30. doi: 10.1016/j.patrec.2019.08.008.

172.

Kotwal

Bhattacharjee

Marcel

. Multispectral deep embeddings as a countermeasure to custom silicone mask presentation attacks. IEEE Trans. Biom. Behav. Identity Sci. 2019; 1(4): 238. doi: 10.1109/TBIOM.2019.2939421.

173.

Wang

Lan

Han

Shan

Chen

. Multi-modal face presentation attack detection via spatial and channel attentions. 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). Long Beach, CA, USA: IEEE. 2019. p. 1584. doi: 10.1109/CVPRW.2019.00200.

174.

Jiang

Liu

Shao

Zhou

. Face anti-spoofing with generated near-infrared images. Multimedia Tools Appl. 2020; 79: 21299. doi: 10.1007/s11042-020-08952-0.

175.

Fan

Shi

Wang

. Research on liveness detection algorithms based on deep learning. 2019 IEEE 10th Int. Conf. Software Engineering and Service Science (ICSESS). Beijing, China: IEEE. 2019. p. 1. doi: 10.1109/ICSESS47205.2019.9040795.

176.

George

Mostaani

Geissenbuhler

Nikisins

Anjos

Marcel

. Biometric face presentation attack detection with multi-channel convolutional neural network. IEEE Trans. Inf. Forensics Secur. 2020; 15: 42. doi: 10.1109/TIFS.2019.2916652.

177.

Chen

Lei

Chen

Robertson

. Attention-based two stream convolutional networks for face spoofing detection. IEEE Trans. Inf. Forensics Secur. 2019; 15: 578. doi: 10.1109/TIFS.2019.2922241.

178.

Castelblanco

Rivera

Solano

Tengana

López

Ochoa

. Dynamic face authentication systems: Deep learning verification for camera close-up and head rotation paradigms. Computers & Security. 2022; 115: 102629. doi: 10.1016/j.cose.2022.102629.

179.

Zhang

Yan

Liu

Lei

. A face antispoofing database with diverse attacks. 2012 5th IAPR Int. Conf. Biometrics (ICB). New Delhi, India: IEEE. 2012. p. 26.

180.

Chingovska

Anjos

Marcel

. On the effectiveness of local binary patterns in face anti-spoofing. 2012 BIOSIG – Int. Conf. Biometrics Special Interest Group (BIOSIG). Darmstadt, Germany: IEEE. 2012. p. 1.

181.

Raghavendra

Raja

Busch

. Presentation attack detection for face recognition using light field camera. IEEE Trans. Image Process. 2015; 24(3): 1060. doi: 10.1109/TIP.2015.2395951.

182.

Wen

Han

Jain

. Face spoof detection with image distortion analysis. IEEE Trans. Inf. Forensics Secur. 2015; 10(4): 746. doi: 10.1109/TIFS.2015.2400395.

183.

Pinto

Schwartz

Pedrini

de Rezende Rocha

. Using visual rhythms for detecting video-based facial spoof attacks. IEEE Trans. Inf. Forensics Secur. 2015; 10(4): 1025. doi: 10.1109/TIFS.2015.2395139.

184.

Chingovska

Erdogmus

Anjos

Marcel

. Face recognition systems under spoofing attacks. In: Bourlai T, ed. Face Recognition across the Imaging Spectrum. Cham: Springer. 2016. p. 165. doi: 10.1007/978-3-319-28501-6_8.

185.

Costa-Pazo

Bhattacharjee

Vazquez-Fernandez

Marcel

. The replay-mobile face presentation-attack database. 2016 Int. Conf. Biometrics Special Interest Group (BIOSIG). Darmstadt, Germany: IEEE. 2016. p. 1. doi: 10.1109/BIOSIG.2016.7736936.

186.

Raghavendra

Raja

Venkatesh

Cheikh

Busch

. On the vulnerability of extended multispectral face recognition systems towards presentation attacks. 2017 IEEE Int. Conf. Identity, Security and Behavior Analysis (ISBA). New Delhi, India: IEEE. 2017. p. 1. doi: 10.1109/ISBA.2017.7947698.

187.

Boulkenafet

Komulainen

Feng

Hadid

. Oulu-NPU: A mobile face presentation attack database with real-world variations. 2017 12th IEEE Int. Conf. Automatic Face & Gesture Recognition (FG 2017). Washington, DC, USA: IEEE. 2017. p. 612. doi: 10.1109/FG.2017.77.

188.

Liu

Tan

Wan

Escalera

Guo

. CASIA-SURF CeFA: A benchmark for multi-modal cross-ethnicity face anti-spoofing. 2021 IEEE Winter Conf. Applications of Computer Vision (WACV). Waikoloa, HI, USA: IEEE. 2021. p. 1179. doi: 10.1109/WACV48630.2021.00122.

189.

Timoshenko

Simonchik

Shutov

Zhelezneva

Grishkin

. Large crowdcollected facial anti-spoofing dataset. 2019 Computer Science and Information Technologies (CSIT). Yerevan, Armenia: IEEE. 2019. p. 123. doi: 10.1109/CSITechnol.2019.8895208.

190.

Bok

Suh

Lee

. Verifying the effectiveness of new face spoofing DB with capture angle and distance. Electronics. 2020; 9(4): 661. doi: 10.3390/electronics9040661.

191.

Zhang

Yin

Yan

Shao

Liu

. CelebA-spoof: Large-scale face anti-spoofing dataset with rich annotations. In: Vedaldi A, Bischof H, Brox T, Frahm JM, eds. Computer Vision – ECCV 2020; ECCV 2020. Cham: Springer. LNCS, Vol. 12357. 2020. p. 70. doi: 10.1007/978-3-030-58610-2_5.

192.

Dantcheva

Chen

Ross

. Can facial cosmetics affect the matching accuracy of face recognition systems? 2012 IEEE Fifth Int. Conf. Biometrics: Theory, Applications and Systems (BTAS). Arlington, VA, USA: IEEE. 2012. p. 391. doi: 10.1109/BTAS.2012.6374605.

193.

Chen

Dantcheva

Ross

. Automatic facial makeup detection with application in face recognition. 2013 Int. Conf. Biometrics (ICB). Madrid, Spain: IEEE. 2013. p. 1. doi: 10.1109/ICB.2013.6612994.

194.

Erdogmus

Marcel

. Spoofing in 2D face recognition with 3D masks and anti-spoofing with Kinect. 2013 IEEE Sixth Int. Conf. Biometrics: Theory, Applications and Systems (BTAS). Arlington, VA, USA: IEEE. 2013. p. 1. doi: 10.1109/BTAS.2013.6712688.

195.

Galbally

Satta

. Three-dimensional and two-and-a-half-dimensional face recognition spoofing using three-dimensional printed models. IET Biom. 2016; 5(2): 83. doi: 10.1049/iet-bmt.2014.0075.

196.

Liu

Yang

Yuen

Zhao

. A 3D mask face anti-spoofing database with real world variations. 2016 IEEE Conf. Computer Vision and Pattern Recognit. Workshops (CVPRW). Las Vegas, NV, USA: IEEE. 2016. p. 100. doi: 10.1109/CVPRW.2016.193.

197.

Bhattacharjee

Marcel

. What you can’t see can help you – extended-range imaging for 3D-mask presentation attack detection. 2017 Int. Conf. Biometrics Special Interest Group (BIOSIG). Darmstadt, Germany: IEEE. 2017. p. 1. doi: 10.23919/BIOSIG.2017.8053524.

198.

Chen

Dantcheva

Swearingen

Ross

. Spoofing faces using makeup: An investigative study. 2017 IEEE Int. Conf. Identity, Security and Behavior Analysis (ISBA). New Delhi, India: IEEE. 2017. p. 1. doi: 10.1109/ISBA.2017.7947686.

199.

Bhattacharjee

Mohammadi

Marcel

. Spoofing deep face recognition with custom silicone masks. 2018 IEEE 9th Int. Conf. Biometrics Theory, Applications and Systems (BTAS). Redondo Beach, CA, USA: IEEE. 2018. p. 1. doi: 10.1109/BTAS.2018.8698550.

200.

Singh

Vatsa

Ratha

Chellappa

. Recognizing disguised faces in the wild. IEEE Trans. Biom. Behav. Identity Sci. 2019; 1(2): 97. doi: 10.1109/TBIOM.2019.2903860.

201.

Xiao

Tang

Guo

Yang

Zhu

Lei

. 3DMA: A multimodality 3D mask face anti-spoofing database. 2019 16th IEEE Int. Conf. Advanced Video and Signal Based Surveillance (AVSS). Taipei, Taiwan: IEEE. 2019. p. 1. doi: 10.1109/AVSS.2019.8909845.

202.

Jia

. Spoofing and anti-spoofing with wax figure faces. CoRR arXiv preprint, arXiv:1910.05457v1. 2019.

203.

Wan

Qin

Zhao

. NAS-FAS: Static-dynamic central difference network search for face anti-spoofing. IEEE Trans. Pattern Anal. Machine Intell. 2021; 43(9): 3005. doi: 10.1109/TPAMI.2020.3036338.

204.

Liu

Zhao

Wan

Liu

Tan

Escalera

Xing

Liang

Guo

Lei

Zhang

. Contrastive context-aware learning for 3D high-fidelity mask face presentation attack detection. IEEE Trans. Inf. Forensics Secur. 2022; 17: 2497. doi: 10.1109/TIFS.2022.3188149.

205.

Cheema

Moon

. Sejong face database: A multi-modal disguise face database. Computer Vision and Image Understanding. 2021; 208–209: 103218. doi: 10.1016/j.cviu.2021.103218.

206.

Fang

Damer

Kirchbuchner

Kuijper

. Real masks and spoof faces: On the masked face presentation attack detection. Pattern Recognit. 2022; 123: 108398. doi: 10.1016/j.patcog.2021.108398.

207.

Anjos

Marcel

. Counter-measures to photo attacks in face recognition: A public database and a baseline. 2011 Int. Joint Conf. Biometrics (IJCB). Washington, DC, USA: IEEE. 2011. p. 1. doi: 10.1109/IJCB.2011.6117503.

208.

Biometrics. Information technology-biometric presentation attack detection – Part 3: testing and reporting, ISO/IEC JTC1 SC37. 2017. Available from: https://www.iso.org/standard/67381.html.

209.

Costa-Pazo

Jiménez-Cabello

Vázquez-Fernández

Alba-Castro

López-Sastre

. Generalized presentation attack detection: A face anti-spoofing evaluation proposal. 2019 Int. Conf. Biometrics (ICB). Crete, Greece: IEEE. 2019. p. 1. doi: 10.1109/ICB45273.2019.8987290.

210.

Dang

Liu

Stehouwer

Liu

Jain

. On the detection of digital face manipulation. IEEE Computer Vision and Pattern Recognit. (CVPR 2020). Seattle, WA, USA: IEEE. 2020. p. 1. doi: 10.1109/cvpr42600.2020.00582.

211.

Deb

Liu

Jain

. Unified detection of digital and physical face attacks. CoRR arXiv preprint, arXiv:2104.02156v1. 2021.