Abstract
Vibraimage is a digital system that quantifies a subject’s mental and emotional state by analysing video footage of the movements of their head. Vibraimage is used by police, nuclear power station operators, airport security and psychiatrists in Russia, China, Japan and South Korea, and has been deployed at two Olympic Games, a FIFA World Cup and a G7 Summit. Yet there is no reliable empirical evidence for its efficacy; indeed, many claims made about its effects seem unprovable. What exactly does vibraimage measure and how has it acquired the power to penetrate the highest profile and most sensitive security infrastructure across Russia and Asia?
I first trace the development of the emotion recognition industry, before examining attempts by vibraimage’s developers and affiliates scientifically to legitimate the technology, concluding that the disciplining power and corporate value of vibraimage are generated through its very opacity, in contrast to increasing demands across the social sciences for transparency. I propose the term ‘suspect artificial intelligence (AI)’ to describe the growing number of systems like vibraimage that algorithmically classify suspects/non-suspects, yet are themselves deeply suspect. Popularising this term may help resist such technologies’ reductivist approaches to ‘reading’—and exerting authority over—emotion, intentionality and agency.
Introduction
As I sat in the meeting room of a nondescript office building in Tokyo, the managing director of a company called ELSYS Japan discussed my emotional and psychological state, referring to a series of charts and tables displayed on a large screen at the front of the room:
Aggression … 20-50 is the normal range, but you scored 52.4 … this is a bit too high. Probably you yourself didn’t know this, but you’re a very aggressive person, potentially… Next is stress. Your stress is 29.2, within the range of 20-40, with a statistical deviation of 14—that’s OK… I think you have very good stress… Just tension—your [average] value is within the range, but because your statistical deviation is high—over 20—so you’re a little tense. Mental balance is 64 from a range of 50-100, so it fits correctly in the range… Charm … 74.6 is pretty good. Now, neuroticism is 35.3, this is also in the range, but the statistical deviation is high. But some people have a high score the first time they are measured. There are people who have high scores for neuroticism as well as for tension, yes. People who possess a delicate heart.
1
(Interview, 17 April 2019)
The director’s seemingly authoritative statements were based on an assessment of various measurements produced by ‘vibraimage’, a patented 2 system developed to quantify a subject’s mental and emotional state through an automated analysis of video footage of the physical movements of their face and head. This system, distributed in Japan by ELSYS Japan under the brands ‘Mental Checker’ and ‘Defender-X’, provides numerical values for levels of aggression, tension, balance, energy, inhibition, stress, suspiciousness, 3 charm, self-regulation, neuroticism, extroversion and stability, categorising these automatically into positive and negative ‘emotions’. Mental Checker generates an impressive array of statistical data arranged across tables, a pie chart, a histogram and a line chart, producing an image of mathematical precision and solid scientific legitimacy (see Figure 1). The report also provides a visualisation of what ELSYS Japan terms an ‘aura’—a horizontal colour-coded bar chart, indicating the frequency of micro-vibrations of a subject’s head, superimposed against a still image of their face.

Vibraimage technology has already entered the global security marketplace. It was deployed at the 2014 Sochi Olympics (Herszenhorn, 2014), 2018 PyeongChang Winter Olympics, 2018 FIFA World Cup in Russia and at major Russian airports to detect suspect individuals among crowds (JETRO, 2019). It has been used at the Russian State Atomic Energy Corporation in experiments to monitor the professionalism of workers handling and disposing of spent nuclear fuel and radioactive waste (Bobrov et al., 2019; Shchelkanova et al., 2019), and to diagnose their psychosomatic illnesses (Novikova et al., 2019). In Japan, Mental Checker and Defender-X have been used by one of the largest technology and electronics companies, NEC, 4 to vet staff at nuclear power stations and by a leading security services firm, ALSOK, to detect and potentially deny entry to or detain suspicious individuals at major events, including the G7 Summit in 2016, as well as sporting events and theme parks (Interview with ELSYS Japan, 17 April 2019). Managers at ELSYS Japan expected that the technology would be used at the 2020 Tokyo Olympics (Nonaka, 2018, p. 148; Interview with ELSYS Japan, 17 April 2019), an event that spurred significantly increased spending on domestic security services and infrastructure, with estimated market growth of 18% between 2016 and 2019 (Teraoka, 2018). 5 ELSYS Japan’s customers also include Fujitsu and Toshiba, which have considered ‘incorporat[ing] [vibraimage]… into their own recognition technologies to differentiate their original products’ (Nonaka, 2018, p. 147), and managers told me that Mental Checker has been used by an unspecified number of Japanese psychiatrists to confirm diagnoses of depression.
In South Korea, the Korean National Police Agency, Seoul Metropolitan Police Agency and several universities have collaborated on research aiming to establish the use of vibraimage in a video-based ‘contactless’ lie-detection system as an alternative to polygraph testing (Lee & Choi, 2018; Lee et al., 2018), while, in China, it has been deployed in Inner Mongolia, Zhejiang and elsewhere to identify suspects for questioning and detention, and has been officially certified for use by Chinese police (Choi et al., 2018a, 2018b). 6 Other corporate applications of vibraimage are also proposed: an ELSYS Japan brochure suggests using Mental Checker to discover how employees really feel about their company; measure their levels of stress, fatigue and ‘potential ability’; counter employees’ accusations of bullying and abuses of power in the workplace; and even ‘to know the risk of hiring persons who might commit a crime’ (ELSYS Japan Brochure, undated). The brochure provides a screenshot of a suggested employee report, with grades (A+, B−, C, etc.) for qualities that include stability, fulfilment and happiness, social skills, teamwork, communication, ability to take action, aggressiveness, stress tolerance and ability to ‘recognise reality’. 7
Vibraimage forms one part of the rapid growth in algorithmic security, surveillance, predictive policing and smart city infrastructure across urban East Asia, enabling the ‘active sorting, identification, prioritization and tracking of bodies, behaviours and characteristics of subject populations on a continuous, real-time basis’ (Graham & Wood, 2003, p. 228). Amid an international boom in both surveillance technologies and artificial intelligence (AI) systems designed to extract maximal information from digital photographic and video data relating to the body, companies are developing algorithms that move beyond facial recognition intended to identify individuals and increasingly aim to analyse their behaviour and emotional states (AI Now Institute, 2018, pp. 50–52). The digital emotion recognition industry was worth up to US$12 billion in 2018, and it continues to grow rapidly (AI Now Institute, 2018).
As the concepts of algorithmic regulation and governance (Goldstein et al., 2013; Introna, 2016) are increasingly becoming a reality, transparency has become a key theme in critiques of black-boxed algorithms and AI, including those used in emotion recognition. This is particularly the case with machine learning, in which algorithms recursively adjust themselves and can quickly become inexplicable even to data science experts. As Maclure puts it, ‘we are delegating tasks and decisions that directly affect the rights, opportunities and wellbeing of humans to opaque systems which cannot explain and justify their outcomes’ (Maclure, 2019, p. 3). Transparency is linked to and overlaps with values of comprehensibility, explicability, accountability and social justice, and it is frequently presented as a vital component of ethical or ‘good’ AI (Floridi et al., 2018; Hayes et al., 2020; Leslie, 2019).
Burrell describes three types of algorithmic opacity:
(1) opacity as intentional corporate or institutional self-protection and concealment and, along with it, the possibility for knowing deception; (2) opacity stemming from the current state of affairs where writing (and reading) code is a specialist skill and; (3) an opacity that stems from the mismatch between mathematical optimization in high-dimensionality characteristic of machine learning and the demands of human-scale reasoning and styles of semantic interpretation. (Burrell, 2016, pp. 1–2)
Danaher (2016), likewise, distinguishes opacity and hiddenness, where opacity refers to the incomprehensible or inaccessible ways systems work, and hiddenness refers to the covert manner in which data are collected and used. As I will show in this article, both of the latter definitions apply to vibraimage. Data used by the system may be collected and analysed in a covert manner (e.g., via CCTV at a public event), and its exact method of processing these data is also opaque—it is unclear how the system works or what precisely it quantifies.
This article uses the case of vibraimage to examine issues around opacity and the work it does for companies and governments in the provision of security services, by attempting to shed light on the algorithms of vibraimage and its imagined and actual uses, as far as possible based on publicly available data. What exactly does vibraimage measure and how does the data the system produces, processed through an algorithmic black box, deliver reports that have acquired the power to penetrate corporate and public security systems involved in the highest profile and most sensitive security tasks in Russia, Japan, China and elsewhere? The first section of the article examines emotion detection techniques and their digitalisation. The second section focuses on vibraimage and how its proponents, many of whom have commercial relationships with companies distributing it, have engaged in processes of scientific legitimation of the technology while making claims for its actual and potential uses. The final section considers how the disciplining power and corporate value of vibraimage are generated through its very opacity, in stark contrast to increasingly urgent demands across the social sciences and society, more broadly, for transparency as a prerequisite for ‘good AI’. I propose the term ‘suspect AI’ reflexively to describe the increasing number of algorithmic systems, such as vibraimage, in operation globally across law enforcement and security services, which automatically classify subjects as suspects or non-suspects. Popularising this term may be one way to resist such reductivist approaches to reading and exerting authority over human emotion, intentionality, behaviour and agency.
Emotion Recognition Based on Facial Expressions
Psychologist Paul Ekman pioneered research exploring the relationship between emotions and facial expressions from the 1960s, building on Darwin’s (2012[1872]) work on evolutionary connections between the two among animals, including humans. Ekman conducted experiments around the world, aiming to demonstrate the universality of a handful of basic emotions (such as anger, contempt, disgust, fear, happiness, sadness and surprise) across all cultures and societies, and of their articulation through similar facial expressions (Ekman, 1992). This work was highly influential because it seemed to provide overwhelming empirical evidence that individuals of all cultures were able to ‘correctly’ categorise the expressions of people of their own and other cultures provided in photos, matching them to the ‘basic emotions’ they supposedly expressed (Ekman & Friesen, 1971).
Ekman further argued that facial expressions could be used to identify incongruities between professed and ‘real’ emotions, enabling facial expression analysis to be used for lie detection (Ekman & Friesen, 1969). This attracted substantial interest from corporations concerned with ensuring the honesty of employees or gaining covert insights in business negotiations, and from governments and security forces concerned with identifying dissimulating and suspect individuals. Ekman and collaborators in this field like David Matsumoto formed companies, running workshops and holding consultations with corporations and public bodies about how to read subjects’ facial micro-expressions and behavioural cues to evaluate personality, truthfulness and potential danger. In 2001, the American Psychological Association named Ekman one of the most influential psychologists of the twentieth century (APA, 2002).
The identification of emotions through facial expressions underwent digitalisation via machine learning techniques pioneered from the mid-1990s by Rosalind Picard and Rana el Kaliouby at Massachusetts Institute of Technology (MIT). They commercialised this new field of ‘affective computing’ via their venture capital–backed company Affectiva, founded in 2009, which provides emotional analysis software to businesses based on algorithms trained on large databases of facial expressions (Johnson, 2019). According to Affectiva, this enables a test subject’s emotional responses to, for example, TV commercials, to be tracked in real time. With the recent boom in facial recognition technology, emotion recognition represents a rapidly expanding area of AI development, used across industries, including recruitment and marketing research (Devlin, 2020). A growing number of companies offer emotion recognition services based on analysis of facial expressions, including Microsoft (Emotion application programming interface [API]), Amazon (Rekognition), Apple (Emotient, which Ekman advised on) and Google (Cloud Vision API).
Such systems are increasingly being used in border protection and law enforcement to identify dissimulating and otherwise suspect individuals, regardless of the lack of substantial evidence of efficacy. From 2007, the Transportation Security Administration (TSA) spent US$900 million on a ‘behaviour-detection programme’ entitled Screening Passengers by Observation Technique (SPOT), until it was ruled ineffective by the Department of Homeland Security and the Government Accountability Office (GAO, 2013). Ekman consulted on SPOT, and the system incorporated his techniques; his company also provided consulting services to US courts (Fischer, 2013). Another system, Automated Virtual Agent for Truth Assessments in Real-Time (AVATAR), was developed for lie detection targeting migrants on the USA–Mexico border (Daniels, 2018), while the EU trialled the iBorderCtrl system, supplied by the consortium European Dynamics and funded by Horizon 2020, using the interpretation of micro-expressions to detect deceit among migrants in Hungary, Greece, and Latvia (Boffey, 2018; see also AI Now Institute, 2018, pp. 50–52).
Recently, this work on facial expression analysis for emotion recognition has come under increasing scrutiny despite its ongoing popularity among many psychologists. The most basic critique is that one does not necessarily smile when one is happy—common sense suggests that facial expressions do not always, or even often, map to inner feelings, that emotions are often fleeting or momentary, and that facial expressions and their meaning are highly dependent on sociocultural context. Barrett et al. (2019) summarise these and other critiques, arguing that approaches positing a limited number of prototypical basic emotions that can be ‘read’ through universal facial expressions fail to grasp what emotions are and what facial expressions convey.
In anthropology, the ‘affective turn’ has drawn attention to the distinction between affect and emotion—the former a precognitive sensory response or potential to affect and be affected, and the latter a more culturally mediated expression of feeling. White describes this as the difference between ‘how bodies feel and how subjects make sense of how they feel’ (White, 2017, p. 177). These nuances are overlooked in the field of emotion recognition, which reduces emotion to a simplistic and digitally scalable model. Barrett argues that emotion is:
a contingent act of perception that makes sense of the information coming in from the world around you, how your body is feeling in the moment, and everything you’ve ever been taught to understand as emotion. Culture to culture, person to person even, it’s never quite the same. (Fischer, 2013)
We might, therefore, define the process of interpreting one’s own emotional state as making sense of an inner noise of biological signals and memories, in contextually contingent and socioculturally mediated ways, and placing them into—and in the process co-constructing—socioculturally mediated categories. It may also sometimes involve not definitively categorising or making sense of these affective feelings. As this article will show, it is the very ambiguity or malleability of this process that may help make vibraimage a convincing technology of emotion recognition and provide authority to its analysis.
Given these growing critiques of Ekmanian theories of universal basic emotions expressed through facial expressions, researchers at the organisation AI Now have concluded that, by extension, the digital emotion detection industry is ‘built on markedly shaky foundations…. There remains little to no evidence that these new affect-recognition products have any scientific validity’ (AI Now Institute, 2018, p. 50). Baesler, similarly, argues that the use of emotion detection software by the TSA was ‘unconfirmed by peer-reviewed research and untested in the field’ (Baesler, 2015, pp. 60–61), while holding significant potential for harm through misuse. In common with broader critiques of AI from critical algorithm studies (e.g., Eubanks, 2018; Lum & Isaac, 2016), machine learning methods involved in emotion recognition systems have been criticised for racial bias, based on their training data sets (Rhue, 2018). Indeed, Ekman’s work not only constructs ethnocentric emotional categories but also racial subject categories, for example in his creation, with Matsumoto, of the Japanese and Caucasian Facial Expressions of Emotion stimulus set of photos showing emotional expressions of archetypal ‘Japanese’ and ‘Caucasian’ subjects (Biehl et al., 1997;
‘Welcome to the Vibraimage World!’ 8 : Attempts to Construct Scientific Legitimacy
While most digitalised emotion recognition systems are premised on facial expression analysis to determine the combination and intensity of the handful of ‘basic’ emotions that a subject is displaying, alternative technologies for evaluating emotional or mental states from other facial or bodily physiological data have also been developed. These include analyses of gait, voice or eye movements, and combinations of physiological data, like polygraph lie detector tests. An alternative approach—‘vibraimage’—has also emerged, developed by Russian biometrist Viktor Minkin since around 2000. 9 This technology forms the basis for Mental Checker and Defender-X, the software products offered by ELSYS Japan, the Japanese affiliate of ELSYS Corp, a Russian company founded by Minkin. 10
Vibraimage involves recording a short video of a subject to measure and analyse ‘vibrations’ of the face and head: imperceptible and involuntary micro-movements caused by muscles and the circulatory system. These movements are partly related to the vestibular system, which involves parts of the inner ear responsible for maintaining balance and spatial orientation (equilibrioception). Minkin observes that the vestibular system is linked to certain psychological disorders and argues that its function is intimately connected to emotional and mental states, which he describes as the ‘vestibulo-emotional reflex’ (Minkin & Nikolaenko, 2008). Thus, according to Minkin, data about involuntary head movements can be measured and analysed to generate information about these mental and emotional states, as well as inferring other information, like personality type (Minkin & Nikolaenko, 2017a). 11 Physical balance and stability are directly equated to mental and emotional balance and stability. Minkin, like Ekman, references Darwin’s 1872 work on the evolutionary link between facial expression and emotion. While Ekman developed this work by focusing on facial expressions, Minkin examines biological and mathematical links between muscular activity and brain activity, citing nineteenth-century Russian physiologist Ivan Sechenov and others who have contributed to the field of psychophysiology (a branch of psychology exploring links with physiology), including Sigmund Freud and Ivan Pavlov, as well as Norbert Wiener’s theory of cybernetics (Minkin, 2017, pp. 6, 18). 
Minkin refers to his theory as ‘the thermodynamic model of emotions’ because he draws a direct link between specific emotional–mental states, muscular activity that can be measured through micro-vibrations of the head, and the energy this muscular activity expends: ‘physiological and psychophysiological processes proceeding in a human body are associated with the exchange of energy and information within or between human physiological systems’ (Minkin, 2017, p. 50). 12 According to this theory, involuntary movement of the face and head is emotion, intention and personality made visible.
Vibraimage technology also appears to have roots in cold weather experiments conducted in the Soviet Union. A manager at ELSYS Japan stated that the algorithms that made the connections between head vibration data and particular emotional–mental states had been trained using a proprietary big data set:
It’s actually from the Russian side [i.e. ELSYS Russia]… They have like data from actual—they put some people in cold weather and examine what kind of vibrations they will have under such circumstances. So some people are really frightened and they have their vibrations, and some have the actual examinations of like 100,000 people. So those data—even America cannot do such examinations because it’s a human rights stuff. But in the Soviet Union, they could do that, so… [JW: So it’s Soviet Union-era data?] Yes, that’s why other companies cannot make such a system. Except in North Korea!
13
(Interview, 17 April 2019)
Minkin does not reference this data set from the Soviet Union in his publications, although it is mentioned in at least one Japanese newspaper report about ELSYS Japan (Saito, 2016), and Minkin worked in a state biometrics laboratory during the Soviet era (JETRO, 2019), which might feasibly have given him access to such data. In a subsequent email, the manager quoted earlier stated that: ‘We got the information directly from Mr. Viktor Minkin. The experiments have been done with people aged between 2 months to 90 years old with various nationalities. However, the detailed information has not been disclosed, so it’s not referred in any of his work [sic]’ (email, 26 February 2020). If these data exist and were used to train the vibraimage algorithms, this would raise serious questions: should data apparently obtained in a highly unethical manner be utilised in algorithmic systems and, if so, should this information be disclosed to potential end users? It would also raise technical questions about how exactly this data set was used to derive algorithmic connections between precise types and intensities of emotional–mental states and head movements—a significant piece of the puzzle of how vibraimage works that is conspicuously absent from publications aimed at establishing its scientific veracity. If the data do not exist, it equally raises questions about why a narrative of human experimentation is viewed by Minkin as a valid and effective means by which to legitimise and promote the technology.
On 20 January 2020, a search of Google Scholar for the term ‘vibraimage’ yielded 287 results. Of these, a large proportion were written or co-authored by authors with strong commercial interests in the success of the technology, such as Minkin himself and employees of ELSYS Russia or international affiliates that hold distribution rights to vibraimage, thus introducing significant potential for bias. 14 A total of 41 were published in the proceedings of two conferences on vibraimage technology organised and hosted by ELSYS Russia. The first, entitled ‘Modern Psychophysiology: The Vibraimage Technology’, was held in St Petersburg in 2018, with apparent support from the European Academy of Natural Sciences and Russian Biometric Association. Further conferences were held in 2019 and 2020. Very few articles on vibraimage appear to have been published in academic journals with rigorous peer review processes; several appear in journals like the Journal of Behavioral and Brain Science and Intelligent Control and Automation, both published by Scientific Research Publishing, a Chinese company included in Jeffrey Beall’s list of predatory or questionable academic publishers, with potentially poor journal standards. 15
Many papers that feature vibraimage technology proceed from the a priori assumption that its reliability and effectiveness (and the existence of a ‘vestibulo-emotional reflex’ on which it is premised) have already been proven, and involve conducting vibraimage tests on subjects and interpreting the results—rather than questioning or attempting to verify the efficacy of the underlying technology itself. In one article, Minkin and Yana Nikolaenko, Chief Psychologist at ELSYS Russia, state that they aim to ‘introduce…a new term, vestibular emotional reflex or vestibular energy reflex (VER)’ (Minkin & Nikolaenko, 2008, p. 196) but proceed to lay out technical details of vibraimage rather than providing any evidence that such a reflex exists or about the nature of its connection to head movements. Elsewhere, Minkin and Nikolaenko use vibraimage to group ‘criminal’ and ‘non-criminal’ research subjects by personality type (Minkin & Nikolaenko, 2017b). The data themselves appear to show a random distribution, but the authors interpret them to fit into a schema of personality types, although there seems to be no way to verify the accuracy with which the method assigns subjects to personality categories, particularly since their analysis is based on uncovering ‘unconscious’ responses inaccessible to other psychometric approaches. Nevertheless, they state that this methodology can be used to predict which personality types are prone to commit crimes:
If we assume that the revealed picture of differences between conscious and unconscious responses accurately reflects the hidden information, then this method can be the basis for identifying individuals who intend to commit criminal acts or who are predisposed to commit such acts. (Minkin & Nikolaenko, 2017b, p. 459)
Similarly, a paper by Nikolaenko (2018) aims to measure the ‘level of delinquency’ among adolescents by identifying those with personalities likely to commit crimes. While much of the predictive policing industry in the USA, dominated by companies PredPol, IBM, Palantir and HunchLab, applies machine learning to big data sets ‘to identify likely targets for police intervention and prevent crime or solve past crimes by making statistical predictions’ (Perry et al., 2013, p. xiii; cf. Lum & Isaac, 2016), 16 vibraimage promises to identify suspect criminal types through a twenty-first-century version of phrenology. Although in their article mentioned earlier, Minkin and Nikolaenko provide the caveat that, ‘[t]he question of estimation accuracy of psychophysiological parameters of an individual, certainly, demands a larger set of statistics, and it was not investigated in this paper’ (Minkin & Nikolaenko, 2017b, p. 459), it is unclear how any degree of accuracy in assigning personality types as presented in such papers could be achieved, since the results appear neither falsifiable nor reproducible. As an open letter by the Coalition for Critical Technology argued in 2020, attempts to identify criminality through the analysis of physical appearance or characteristics inevitably lead to inaccurate and biased results not least because the category of criminality they measure against is socially constructed and shaped by inherent racial, class and other biases. 17
Very few peer-reviewed academic articles outside ELSYS Russia’s vibraimage research ecosystem seem to have been published. However, one such article by Japanese researchers with no apparent commercial connection to ELSYS attempted to establish the efficacy of vibraimage in measuring actual or latent mental states in order to identify suspect individuals, by comparing various established paper-based psychological methodologies with vibraimage. The authors found almost no statistically significant correlations between the results of existing psychological tests and those produced by vibraimage, leading them to the carefully worded conclusion that:
present psychological measurement research cannot identify what is being measured by the indicators that express mental state based on vibraimage technology… Since we do not know what the parameters are measuring, we cannot authoritatively state that it is not effective as a system to detect suspicious people. What we can say as a result of this research is that we do not understand what it is measuring. We cannot say that the possibility of detecting a suspect is zero. (Ōkubo et al., 2018, p. 25; author’s translation)
This opacity—not knowing what exactly vibraimage is measuring and what these measurements mean—is significant, and I will discuss it further later in this article. The only other paper directly addressing the question of vibraimage’s level of accuracy was written by Minkin himself. Observing that alternative ‘standardized measures (standards) for measuring the psychophysiological state (PPS) do not currently exist’ (Minkin, 2019, p. 212), Minkin compares the accuracy of measurements provided by different versions of the same vibraimage software. Unsurprisingly, he concludes that vibraimage has a very low measurement error rate.
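The difference between these two kinds of test, agreement between versions of one instrument and agreement with an independent criterion, can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a reconstruction of either study: the function names `pearson` and `simulate`, the sample size, the noise levels and the idea of a single underlying ‘signal’ are all hypothetical, and the criterion is made unrelated to the signal by construction.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate(n_subjects=1000, seed=0):
    """Hypothetical setup (an assumption, not taken from any cited study):
    two software versions read the same signal, which is unrelated by
    construction to an independent criterion."""
    rng = random.Random(seed)
    # Whatever the instrument actually picks up (an unknown quantity).
    signal = [rng.gauss(0, 1) for _ in range(n_subjects)]
    # Two versions of the same software: same signal, small version-specific noise.
    version_a = [s + rng.gauss(0, 0.1) for s in signal]
    version_b = [s + rng.gauss(0, 0.1) for s in signal]
    # Independent criterion (e.g. an established test score), drawn separately.
    criterion = [rng.gauss(0, 1) for _ in range(n_subjects)]
    return version_a, version_b, criterion


version_a, version_b, criterion = simulate()
print("agreement between versions:", round(pearson(version_a, version_b), 3))
print("agreement with criterion:  ", round(pearson(version_a, criterion), 3))
```

Under these assumptions, the first figure should come out close to 1 and the second close to 0. Consistency between versions of the same software shows only that they respond to the same input. It says nothing about whether that input corresponds to any mental or emotional state.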
Minkin credits the inspiration for his ‘thermodynamic model of emotion’ to Libb Thims and Georgi Gladyshev (Minkin, 2017, p. 17) and proposes this model in a paper co-authored with them. This article, cited by many contributors to the 2018 and 2019 vibraimage conferences as proof of the technology’s efficacy, argues that: ‘head balance for person without consciously movements could be considered as isolated thermodynamic system and any internal energy as emotion would change the balance of this internal system and realized by movements or vibrations [sic]’ (Minkin et al., undated).
Gladyshev is a Russian scientist who has theorised the role of ‘hierarchical thermodynamics’ in living systems—an unorthodox and disputed area of physics. Thims is a self-described ‘American electrochemical engineer, thermodynamicist, and physicochemical free thinker’ 18 who appears to operate a series of websites, focusing on his interests in ‘human thermodynamics’, atheism and historical figures with high intelligence quotient (IQ) scores. Thims argues in convoluted fashion that humans are molecules, and that social interactions are chemical reactions; in an extreme case of reductivism, this claim is presented not as a metaphor but as a literal scientific fact. Thims ran a self-published journal entitled ‘Journal of Human Thermodynamics’ 19 between 2005 and 2016, to which Minkin and Gladyshev contributed articles, and Gladyshev acted as a peer reviewer, 20 although the Wikipedia-like peer review process described on the website does not meet the standards of most academic journals, 21 and Thims’ own articles include titles like ‘Thermodynamic Proof that Good Always Triumphs over Evil’ (Thims, 2011). One of Thims’ websites includes several pages documenting and refuting the critiques of various detractors, most of whom have described his work as pseudoscience 22 ; this catalogue of critics includes Gladyshev himself, who writes to Thims: ‘I believe that you have created an incredible mess. I’m beginning to understand that you do not know “what science is?”’ 23
My purpose is not to impugn Thims’ airing of his philosophy of human thermodynamics via personal websites, but rather to elucidate the intellectual origins and theories underpinning vibraimage. Minkin presents a similar type of reductivist philosophy to Thims and seems inspired by Thims’ views in seeing human behaviour as interpretable through the measurement of psychophysiological data, while specifically disregarding all other contextual information. For example, in one of Minkin and Nikolaenko’s (2017b) experiments to measure intelligence type, subjects divided into three groups according to levels of ‘deviant behaviour’ (labelled as ‘drinkers’, ‘criminals’ and ‘lawyers’) take one kind of written test, while vibraimage measures their ‘unconscious’ response to the same questions, with the differences between the ‘conscious’ and ‘unconscious’ answers proving the value of vibraimage in uncovering the underlying truth about these subjects. In this way, the authoritativeness of vibraimage is built on actively undermining the agency of the subject in consciously determining their own mental–emotional state.
By measuring vibrations of the head, Minkin argues, all kinds of information about a person can be inferred: not only the characteristics of mental–emotional states listed earlier but also character traits, including truthfulness, personality and intelligence types, ‘deviant behaviour’ and delinquency, and propensity to commit crimes in the future. Perhaps most troubling is Minkin and Nikolaenko’s proposed application of vibraimage in 1984-style techno-fascist tests of loyalty to a company or state:
Programs of the loyalty analysis to the principles and values of any state, society or production company can be based on the suggested method. Passing such a test of loyalty … may be in the future an integral part of obtaining a visa, along with scanning fingerprints. Not so long ago it seemed that biometric identification of a person is a pure fiction, but the next step in improving security will inevitably be the biometric identification of a psychophysiological state. Loyalty to the principles and values of the state is the same characteristic of a person as fingerprints, and its biometric identification is a very specific technical problem that has a unique solution, for example, using the vibraimage technology. (Minkin & Nikolaenko, 2017a, p. 129)
Since Minkin and others propose that vibraimage can be used not only to determine current mental–emotional states and personality types but also to predict potential future behaviour, these claims are, of course, difficult to verify.
Conclusion: Opacity and Suspect Artificial Intelligence
Vibraimage is a system that appears to measure and quantify micro-movements of a subject’s head from a video source and convert these numbers into various highly precise values that describe and categorise the mental–emotional states of subjects. Its algorithms provide parameters to sort subjects, for example, into non-suspects and suspects: the latter being people who have not necessarily (yet) committed any crime, offence or otherwise negatively perceived action, but have failed a test of head movements and may therefore be subjected to detention, questioning, having their job, promotion or visa application rejected, or other pre-emptive disciplinary actions. Knowing that one is under the eye of a surveillance system that claims to quantify one’s personality and predict one’s behaviour may make vibraimage’s analysis a self-fulfilling prophecy, whether through the test procedure (making the subject feel stressed, anxious or aggressive) or the categorisation of the subject as suspect, and thus to be treated with suspicion or as a potential criminal. As Graham and Wood note, because automated systems aim for exclusionary goals, ‘[a]lgorithmic systems thus have a strong potential to fix identities as deviant and criminal—what Norris calls the technological mediation of suspicion’ (Graham & Wood, 2003, p. 234, citing Norris, 2002).
Yet as I have shown, the exact meaning and significance of what vibraimage measures is unclear: there is no reliable data pertaining to the accuracy of any of its parameters, and the process of quantification and categorisation of measurements into different values is opaque—a black-box process typical of many commercial algorithmic systems. Even if the way the algorithms worked were ‘explicable’, and even if the epistemological claims of Minkin and his collaborators concerning a ‘vestibulo-emotional reflex’ were credible, there is no coherent explanation of why certain intensities of head movements equate to a particular precise combination of emotions, behaviour, intent or character. Vibraimage is a system about whose workings and efficacy complete transparency and knowledge seem to be not only unforthcoming but practically impossible to attain.
Nevertheless, the vibraimage developer, distributor and user community is no longer a peripheral AI subculture—it is gaining momentum as an accepted form of security technology aligned with facial recognition systems, and it is already used by major multinational corporations, nuclear facilities and police forces across Russia and parts of Asia. Facial recognition is an everyday urban technology in China, and increasingly also in South Korea and Japan, where there are plans to greatly expand its use across society, at cashierless shops and ATMs, public and private events, transport terminals, job interviews and in law enforcement (Gershgorn, 2020). Emotion detection is increasingly included by default in facial recognition systems, as with Amazon’s Rekognition and Google Cloud Vision API. Vibraimage’s algorithms may be—perhaps already have been—incorporated into facial recognition systems, like those of ELSYS Japan’s client—NEC—one of the world’s largest suppliers of facial recognition technology, to be combined with other algorithms and subsumed into yet more complex, opaque, algorithmic surveillance infrastructure, as suggested by the ELSYS Japan manager quoted earlier (Nonaka, 2018, p. 147).
As people come under increasing digital surveillance in public and commercial spaces, the apparent possibilities of harvesting additional biometric data relating to mental–emotional states are clearly becoming an ever more tempting and important market for governments and corporations—particularly to identify potential non-compliance (‘suspiciousness’) among citizens, consumers and migrants entangled in the kinds of increasingly complex and expensive surveillance assemblages listed earlier that are becoming inescapable parts of daily life, movement and work. These are areas in which a lack of state regulation of digital emotion recognition technology—especially relating to its application in geographical and socio-economic peripheries like border enforcement, or more covert industries such as security services, law enforcement or nuclear energy—may facilitate the introduction and propagation of systems like vibraimage. Vibraimage bestows even more power than a mainstream digital surveillance system because its results are precise yet somewhat ambiguous, are open to interpretation and cannot be validated. The first time I was analysed by Mental Checker (prior to the analysis at ELSYS Japan’s office), I could not help thinking that it might be accurate, partly because of the testing context (at a major Tokyo technology expo, by an authoritative-sounding NEC manager, using seemingly cutting-edge software), but also because there is no way to subjectively quantify the degree, scale or range of one’s own emotions. I found myself momentarily calibrating my understanding of my personality to the numerical values explained by the manager.
Vibraimage wrests the power to interpret one’s own feelings and intentions from the individual and hands it to the technology’s operator, and holds out the threat of a capture of human interiority and even a remaking of humanness by technocracy, enabling worrying new degrees of control. If emotion is a process of making sense of how one feels—of an inner noise of biological signals and memories—in socioculturally mediated ways, and placing these feelings into categories, vibraimage similarly involves the interpretation and categorisation of a kind of ‘noise’: raw data on head movement. ELSYS Japan’s brochure states that: ‘Objective and stable data about the person can be obtained by Mental-Checker because it can visualise (quantify) the unconscious parts of the person neither by the opinions of the surrounding people nor by self-assessment [sic]’ (ELSYS Japan brochure, undated). As suggested by the director’s quote at the start of this article (‘Probably you yourself didn’t know this, but you’re a very aggressive person, potentially’), proponents of vibraimage claim it possesses the algorithmic authority to empower the operator to know subjects better than subjects know themselves, by directly accessing and revealing their unconscious. Making vibraimage’s interpretation and knowledge of a subject’s mental–emotional state more convincing and authoritative than that of the individual is accomplished through processes of quasi-scientific legitimation (articles, conferences), design (sleek data visualisations) and performance (testing procedure, appeals to the precision of AI, authoritative operator), combined with the corporate power of public and private security infrastructure and a lack of knowledge, awareness or will to intervene in a booming market among politicians and regulators.
In their self-published book, Vibraimage, Minkin and Nikolaenko refer to Pavlov’s famous experiments on dogs, in which he conditioned a physiological response (salivation) to an external stimulus (the sound of a metronome or bell). They invert Pavlov’s findings, suggesting that physiological responses can be quantified, analysed and traced back to reveal the motive mental–emotional state that precipitated them: ‘Now, practically, every person can act as a researched dog, and as Academician Pavlov since all that is needed for psychophysiological experiments is a computer and a web camera’ (Minkin and Nikolaenko, 2017a, p. 129). They seem to suggest that vibraimage will democratise psychological insight: helping us understand what makes us tick—or salivate. But we might understand this analogy and power dynamic in a different way, whereby the operator of vibraimage—corporation or government—is Pavlov and the subject—employee, citizen or migrant—is the dog. Through this technological process, the subject is, at a profound level, made ‘legible’ (Scott, 1998) to governments and corporations. By this, I do not mean that the subject is rendered transparently readable, but, rather, through the power provided by the ambiguity and opacity of the system’s algorithmic knowledge production, that the operator can determine—in both senses simultaneously—the emotions, character, current intentions and future behaviour of the subject.
In the case of vibraimage, the very opacity of the technology is its main characteristic and value. Ambiguity and uncertainty are leveraged, particularly in the performative interpretation of precise measures of imprecise subjective states: the black-boxedness of the technology corresponds to that of the human emotions it claims to measure. Opacity and ambiguity constitute a currency that provides part of the system’s authority and enables control to be exerted by the operator, introducing highly unequal power relations and the potential for unjust decision-making and outcomes for those subjected to this system.
Ananny and Crawford have critiqued the idea that transparency alone is a sufficient condition for holding algorithmic assemblages accountable—‘that knowing is possible by seeing’ (Ananny & Crawford, 2018, p. 977). They argue that transparency as an ideal has numerous limitations and may even be harmful under certain conditions, instead calling for a model of algorithmic accountability that accepts these limitations and focuses on the sociotechnical assemblage that includes the algorithm as one element among many:
if a system must be seen to be understood and held accountable, the kind of ‘seeing’ that an actor-network theory of truth requires does not entail looking inside anything—but across a system. Not only is transparency a limited way of knowing systems, but it cannot be used to explain—much less govern—a distributed set of human and non-human actors whose significance lies not internally but relationally. (Ananny & Crawford, 2018, pp. 983–984)
Does it matter that we cannot see exactly how vibraimage’s algorithms work, in order to hold accountable the broader surveillance assemblages of which they are a part? Even if we had full visibility of them, how would we know whether they actually ‘work’ in interpreting subjective mental and emotional states? Clearly, as Ananny and Crawford argue, transparency is not always the remedy for opacity, although scrutinising what we can of algorithms and quasi-scientific claims made about their effects is an essential part of critically analysing them, as is examining their position within broader power structures—seeing across the system of which they are one element. This is particularly the case in countries like Russia and China—or even Japan and South Korea—with relatively little public accountability or critical media (or even academic) scrutiny of government technology policy or corporate technology strategy, although, in Japan, such scrutiny has grown, following the 2011 Fukushima disaster. Recognising the role of powerful states and corporations with little regulation in facilitating algorithmic assemblages like vibraimage surveillance systems is essential in understanding not only the limitations of transparency as an ideal but also the limitations of the current push, primarily in Euro-American academic discourse, for an ideal ethics of AI that may have little practical relevance in other sociocultural contexts.
The development and processes of scientific legitimation of vibraimage presented here constitute a case study of what I term ‘suspect AI’. Vibraimage and several other AI emotion detection systems purport to identify suspects—persons suspected of an offence or some kind of ill intent, without proof or clear evidence. The adjective ‘suspect’ can be defined as ‘not to be relied on or trusted; possibly dangerous or false’, 24 and I argue that this definition perfectly describes vibraimage itself—an AI system that is opaque, secretive, unproven and dangerous—having the potential for significant harm. In situations where full transparency and knowledge about the workings and effects of an algorithmic assemblage are practically and/or politically impossible to attain, popularising the term suspect AI, and adopting an analytical position of critical suspicion, may provide a form of activism and resistance, by interrogating such systems’ constructed, yet unproven, legitimacy as they attempt to exert ever-greater authority over subjects.
Footnotes
Acknowledgements
The author gratefully acknowledges the assistance of employees at ELSYS Japan, who generously answered questions about vibraimage. I would like to thank Hallam Stevens, Russell Henshaw, Arthur Thompson, and the anonymous reviewers for their invaluable feedback.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author disclosed receipt of the following financial support for the research, authorship and/or publication of this article: This work was supported by Michelin and the Fondation France-Japon de l’EHESS.
