Abstract
Computational photography is currently altering the representational and social functions of photographic imaging. A range of heavily automated computational processing techniques produce images that remediate digital photography to circumvent physical limitations associated with the size of smartphones, emulating the aesthetics associated with larger format digital cameras and professional photographic workflows and practices. These processes include automated compositing where images seen by users are constituted of up to 15 individual frames, the simulation of a shallow depth-of-field, automated facial retouching and even providing automated assistance to suggest alternative frames within the image stream to serve as the base image. This article explores these emerging techniques and accompanying claims that such processes are radically transforming photographic practice. While the extent and modes of automation and algorithmic processing depart from prior practices, contextualising them within the histories of photographic compositing and the algorithmic malleability of digital photography suggests the intensification of existing trends rather than an epistemic break. Furthermore, exploring the representational politics of automated facial retouching and the datafication of images situates these changes within the broader social context of dataveillance and platform capitalism.
Introduction
The production of photographic images is undergoing significant change. Primarily driven by digital cameras embedded within smartphones – powerful portable computers with wireless Internet connectivity – forms of computational photography are emerging which employ numerous heavily automated processes, often utilising machine learning, to produce images that would otherwise have required a physically larger camera sensor, or that simulate techniques associated with professional photographic workflows. This is exemplified by automated multi-image compositing that enhances the dynamic range of images and the use of pattern recognition algorithms and depth maps to simulate a shallow depth-of-field. Computational photography refers to the in-camera-application processing of digital images, typically before the user even sees ‘an image’, rather than the manipulation of already-taken images using a separate computer or an after-the-fact editing application installed on the smartphone. While computational photography has been enthusiastically addressed within academic computer science contexts (Lukac, 2017; Wan et al., 2012), it has thus far been relatively underexplored within media and cultural studies.
Computational photography builds upon the capacities of the smartphone-as-camera, whereby the digital camera-device is a component within a socio-technical assemblage (Shanks and Svabo, 2014) of server farms, fibre-optic cables, cellular and Wi-Fi access points, Global Positioning System (GPS) satellites and other infrastructure. In little over a decade, this assemblage has transformed photographic practices. Smartphone photography affords the production of vast numbers of images: of the approximately 1.8 trillion images taken in 2017, 85% were taken with smartphones (Meeker, 2018). Furthermore, smartphones are employed to capture, share and view images (Palmer, 2014). Smartphone photography has also altered the social function of photography: ‘Photography has gone from being a medium for the collection of important memories to an interface for visual communication’ (Gómez Cruz, 2016: 229). While this frequently takes the form of images as conversations between humans on social media platforms, within this photographic ecology, corporate dataveillance based on locational data is increasingly inescapable (Hjorth and Pink, 2014).
This article begins by outlining the central techniques, technologies and practices associated with computational photography, using Google’s Pixel 3 smartphone as a device that exemplifies contemporaneous computational photography. Whereas numerous smartphones employ multiple distinct optical pathways and sensors to circumvent physical limitations associated with smartphone photography, the Pixel 3 is unusual in employing a single optical system for the primary imaging system (the one located on the back of the smartphone), alongside advanced computational techniques to produce images. 1 Despite this optical handicap, consumer technology and photography review sites typically contend that the Pixel 3 produces images of equal or superior quality to competitors such as the Apple iPhone XS, Samsung Galaxy S10 and Huawei P20, all of which use multiple cameras (with lenses of differing focal lengths) alongside computational photography techniques, precisely because of Google’s advanced implementation of computational photography.
This elaboration of technical processes is followed by a discussion of whether computational photography should be understood as a revolutionary rupture from the history of photographic practices, as is claimed in marketing materials, or is better grasped as continuing developments that have occurred throughout the history of photographic production. Such analysis echoes debates which took place at the genesis of digital photography, where perspectives were advanced arguing both that digital photography marked a fundamental rupture to the ontology of image production (Levinson, 1999; Nichols, 1988) and that these changes should be contextualised by understanding photography as a set of embodied socio-technical practices that were never as stable as an analogue/digital binary suggests (Lister, 1995; Tagg, 1988).
Computational photography problematises the oft-imagined objectivity of images, whereby ‘The photograph is literally an emanation of the referent. From a real body, which was there, proceed radiations which ultimately touch me, who am here’ (Barthes, 1981: 80). Barthes’ relationship between referent and image involves images produced from a single frame, a unique, mechanically preserved moment. While the algorithmic malleability of digital photography challenged this perspective, computational photography’s reliance upon compositing multiple frames further undermines attempts to connect photography to indexicality. However, I aim to temper claims surrounding the radical newness of computational photography by contextualising these practices within the history of photographic compositing and illustrating how digital photography has always been algorithmically manipulated.
The final section connects computational photography and a datafied gaze to particular social and representational problematics surrounding paradigms of platform capitalism, metric culture and neoliberalism (Andrejevic and Burdon, 2015; Beer, 2016). Here, I argue that computational photography must be understood as part of broader technocultural developments surrounding computational capitalism, quantified and branded selves and the ‘datafication’ of everyday life (Lupton, 2019; Van Dijck, 2014). Situating computational photography within this context therefore contributes to the growing contemporary literature surrounding platform seeing (MacKenzie and Munster, 2019), operational images (Paglen, 2014) and automated media (Andrejevic, 2020). However, before addressing the cultural politics of computational photography, we must return to the initial question of precisely what this new assemblage of photographic technologies and techniques involves.
What is computational photography?
Defining emergent forms of computational photography is immediately complicated by the fact that digital photography has always been subject to computational processing. Indeed, many of the techniques automated by contemporary computational photographic practices are remediations of pre-existing digital photographic processes, whereby relatively complex, professional workflows are automatically undertaken by software on smartphones. While this calls into question whether we understand computational photography as a fundamental rupture with previous modes of digital photography, or whether it simply continues trends towards the automation of image production, outlining key practices associated with computational photography is necessary to illustrate the changes taking place. Techniques associated with computational photography have been implemented gradually, with versions of automatically enabled high dynamic range (HDR) composite photography present on iPhones since the launch of iOS 4.1 in 2010 (The Verge, 2013). Other features, notably those which leverage machine learning, have primarily been implemented since 2016. Consequently, whereas digital photography has a relatively straightforward genealogy, as the production of digital sensors and cameras can be easily demarcated from their analogue counterparts, computational photography has a rather less definite genesis.
Arguably, the greatest change associated with contemporary computational photography is the movement away from images composed from a single temporal frame. Conventionally, film or digital images are produced by the photographer actuating a mechanical or electronic shutter, the active decision to capture what Henri Cartier-Bresson (1952) described as ‘the decisive moment’. Computational photography is instead predicated upon compositing multiple source images. Emblematic of such frame blending practices is the HDR+ photography employed in the Google Pixel 3. When the Pixel 3’s camera application is running, the phone constantly sends frames to RAM. Unlike traditional cameras, where the photographer pressing the shutter-release triggers the recording process, hitting the record button signals the mid-point within a 15-frame image stream the camera sends for computational processing. The significance is not just that the camera composites these images; the practice of taking a photograph is transformed from a human operator capturing a chosen moment to delineating a point in a sequence of images from which the camera works backwards and forwards in time. Using data from those 15 frames, the Pixel 3 generates a composite image with a higher dynamic range, greater resolution and lower levels of noise than would be possible with a single exposure.
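The idea of a shutter press marking the mid-point of a continuously buffered stream can be illustrated with a minimal sketch. It assumes a 15-frame burst made up of 7 frames before the press, the pressed frame and 7 frames after it; the function name and the representation of frames as arbitrary Python objects are hypothetical and do not describe Google’s implementation.

```python
from collections import deque
from itertools import islice

BURST_LENGTH = 15          # frames per burst, as described in the text
HALF = BURST_LENGTH // 2   # assumed split: 7 frames either side of the press


def burst_around_press(frames, press_index):
    """Return the frames centred on the shutter press.

    `frames` is any iterable of frames in capture order; `press_index` is the
    position of the frame that was current when the button was pressed.
    """
    history = deque(maxlen=HALF)  # rolling buffer of the most recent frames
    stream = iter(frames)
    for index, frame in enumerate(stream):
        if index == press_index:
            # Keep recording after the press to fill the second half of the burst
            following = list(islice(stream, HALF))
            return list(history) + [frame] + following
        history.append(frame)  # older frames fall out of the buffer automatically
    raise ValueError("shutter press falls outside the frame stream")
```

In this sketch the press selects a position in the stream rather than triggering capture: half of the frames handed on for processing were recorded before the button was touched.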
One of the defining features of smartphone cameras is their compact size, which entails optical limitations. The sensor in the Pixel 3 measures just 5.76 × 4.29 mm. As illustrated in Figure 1, this is significantly smaller than the 1-inch (13.2 × 8 mm) sensors found in many compact digital cameras, let alone the APS-C (23.6 × 15.6 mm) or full frame (36 × 24 mm) sensors found in DSLRs. All else being equal, larger sensors are preferable: a larger surface area entails that more light falls upon the sensor during any given exposure, and the greater surface area of each pixel allows more photons to be gathered within each well before it becomes saturated. Consequently, larger sensors typically produce a higher dynamic range and less noise (especially in low light).

Figure 1. Camera sensor sizes.
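The scale of the difference between these sensors is easier to grasp as surface area than as linear dimensions. The short calculation below uses only the dimensions quoted above; the dictionary and function names are illustrative.

```python
# Sensor dimensions in millimetres, as quoted in the text above
SENSORS_MM = {
    "Pixel 3": (5.76, 4.29),
    "1-inch": (13.2, 8.0),
    "APS-C": (23.6, 15.6),
    "Full frame": (36.0, 24.0),
}


def sensor_area(name):
    """Surface area of the named sensor in square millimetres."""
    width, height = SENSORS_MM[name]
    return width * height


def area_ratio(name, baseline="Pixel 3"):
    """How many times larger the named sensor is than the baseline sensor."""
    return sensor_area(name) / sensor_area(baseline)


for name in SENSORS_MM:
    print(f"{name}: {sensor_area(name):.1f} mm^2, "
          f"{area_ratio(name):.1f}x the Pixel 3 sensor")
```

On these figures, the 1-inch sensor has roughly 4 times the light-gathering area of the Pixel 3 sensor, APS-C roughly 15 times and full frame roughly 35 times.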
By compositing numerous images, computational photography allows the simulation of a single frame with a higher dynamic range than its constituent parts. Overexposing some frames while underexposing others allows the highlights to be taken from underexposed images and the shadows to be taken from overexposed images, effectively allowing areas of the frame which would otherwise be lost in deep shadow or overexposed to the point of retaining no detail to be correctly exposed. The issue with compositing frames is that when either the subject or the camera moves between frames, artefacts that do not correspond to human vision of the world are produced. Computational photography addresses this by employing algorithms that automatically align images to eliminate camera movement and that remove ghosting 2 artefacts by choosing not to composite areas of the frame where there is subject movement. These automated decision-making processes require significant computational processing power, explaining why this previously required professional postproduction software (such as Adobe Photoshop) running on a discrete computer. Effectively, the ongoing miniaturisation of microelectronics which has seen mobile phones move from telecommunications devices to powerful portable computers affords the processing power required for computational photography to function.
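The logic of bracketed compositing with de-ghosting can be sketched in a few lines. This is a deliberately simplified illustration rather than the HDR+ pipeline: it assumes the frames are already aligned, represents each frame as a flat list of values between 0 and 1, and uses a hypothetical weighting function and ghosting threshold.

```python
def hat_weight(value):
    """Trust mid-tones most; clipped shadows (0) and highlights (1) get no weight."""
    return 1.0 - abs(2.0 * value - 1.0)


def merge_bracketed(frames, exposures, reference=0, ghost_tolerance=0.1):
    """Merge aligned frames into one list of relative scene brightness values.

    `frames` are flat lists of 0..1 pixel values; `exposures` are the relative
    exposure multipliers used for each frame (assumed to be known).
    """
    merged = []
    for i in range(len(frames[reference])):
        ref_value = frames[reference][i]
        ref_radiance = ref_value / exposures[reference]
        ref_usable = hat_weight(ref_value) > 0.0
        total, weight_sum = 0.0, 0.0
        for frame, exposure in zip(frames, exposures):
            radiance = frame[i] / exposure  # undo the exposure difference
            # De-ghosting: where the reference pixel is usable, skip frames that
            # disagree with it, since the subject probably moved there
            if ref_usable and abs(radiance - ref_radiance) > ghost_tolerance:
                continue
            weight = hat_weight(frame[i])
            total += weight * radiance
            weight_sum += weight
        merged.append(total / weight_sum if weight_sum else ref_radiance)
    return merged
```

Highlights that are clipped in the reference frame are recovered from the underexposed frame, while pixels where a frame contradicts a well-exposed reference are left uncomposited, which mirrors the choice "not to composite areas of the frame where there is subject movement".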
A related feature which leverages the affordances of the image stream is referred to as ‘Top Shot’ on the Pixel phones. While the camera application by default shows a composited image based upon the frame captured when the phone’s screen is touched, this is not the only option, and a machine learning algorithm suggests other frames from the image stream which may have captured preferable moments according to a normative aesthetics. This includes detecting when a subject’s eyes are open rather than blinking, when a subject is facing towards the camera, or when faces are not obscured by foreground objects. The suggestions made by Top Shot denote how computational photography supplants the decisive moment, instead employing a process analogous to extracting a still image from a video clip. Furthermore, the use of machine learning algorithms to guide users towards the ‘best’ images within the stream demarcates partial automation of this decision-making process.
While compositing addresses one major shortcoming associated with smartphone photography, recent advances in computational photography additionally employ machine learning algorithms and depth maps to address a second consequence of using small sensors: images have a very deep depth-of-field. The depth-of-field refers to the range of distances from the image sensor that are captured in sharp focus. Isolating a subject in sharp focus while blurring the foreground and background enables photographers to lead the viewer’s eye towards key compositional points. Larger sensors afford shallower depth-of-fields. 3 Consequently, in order to mimic the aesthetic associated with professional imaging devices, smartphone manufacturers developed portrait modes that map depth and selectively blur areas of the image. Figure 2 illustrates a before and after view of portrait mode; although the ‘before’ image is saved, it is not seen by default.

Figure 2. Portrait mode off/on.
On smartphones that employ multiple cameras, portrait modes employ the different optics (or in some cases a dedicated depth-mapping sensor) to produce a depth map in a manner analogous to how binocular human sight functions. Conversely, the single camera in the Pixel 3 computes depth employing phase detection autofocus (sometimes called dual pixel autofocus), whereby each pixel contains two photodiodes, enabling stereoscopic information to be read from the differences between two viewpoints located less than 1 mm apart (Garg, 2018). This information is then fed through a convolutional neural network that masks and selectively blurs the image.
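Once a depth map exists, the selective blur itself is conceptually simple. The sketch below swaps the convolutional neural network described above for a naive rule, blurring each pixel of a one-dimensional image row in proportion to its distance from a chosen focal plane; the function name, the box blur and the `blur_per_unit` parameter are all assumptions made for illustration.

```python
def synthetic_shallow_dof(row, depth, focus_depth, blur_per_unit=2.0):
    """Blur each pixel of a 1-D image row according to its depth.

    `row` holds pixel values and `depth` the matching depth-map values.
    Pixels at `focus_depth` stay sharp; others are averaged over a window
    that widens with their distance from the focal plane.
    """
    output = []
    for i, d in enumerate(depth):
        radius = int(round(abs(d - focus_depth) * blur_per_unit))
        lo, hi = max(0, i - radius), min(len(row), i + radius + 1)
        window = row[lo:hi]
        output.append(sum(window) / len(window))
    return output


row = [0, 10, 0, 10, 0, 10]
depth = [1, 1, 1, 3, 3, 3]   # first three pixels are the near subject
print(synthetic_shallow_dof(row, depth, focus_depth=1))
```

Even this toy version reproduces a weakness of early portrait modes noted in the text: the blurred background window also gathers pixels from the in-focus subject, so edges bleed unless the subject is masked more carefully.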
The use of multiple cameras, dedicated sensors or dual pixel autofocus to produce depth maps denotes that computational photography does not just employ software to impact images. Computational photography requires an assemblage of software and hardware, not only for producing depth maps, but to undertake the computational processing required for HDR images and other near-real time image processing. Microprocessors such as the Apple A12 processor feature an on-board graphics processing unit (GPU), as graphics processors perform machine learning and image processing tasks faster than central processing units (CPUs; MacKenzie and Munster, 2019). The Google Pixel goes a step further: in addition to the CPU’s onboard GPU, the phone contains a separate image signal processing chip, the Pixel Visual Core, which is specifically designed for augmenting computational photography. While computational photography, and machine learning more broadly, are commonly argued to be innovations focussed on software (Velasco, 2019), exploring their materiality reveals an assemblage composed of hardware and software working in tandem.
Since their introduction in 2017, computational methods for simulating shallow depth-of-field aesthetics have improved significantly. Initial variants of portrait mode typically struggled with nonhuman subjects: depth-mapping frequently produced unrealistic results where objects within the same focal plane appeared both in and out of focus, and complex shapes such as hair were crudely masked. Over a short period of time, however, results have become markedly more convincing. While dedicated photographers will still routinely identify differences between large sensor images and simulations produced on smartphones, this is unlikely to be true for a large proportion of end-users, especially when images are predominantly viewed on small-screen devices.
The Pixel 3 also employs machine learning for a computational re-lighting mode that Google refers to as ‘synthetic fill flash’. Professional photographers often employ lights or light modifiers to increase the illumination of a human subject’s face, as this characteristically comprises the image’s primary point of interest and is designed to stand out. Smartphones do not typically allow remote control of external flashes and are not associated with carefully orchestrated scenes requiring large lighting modifiers held by assistants. The synthetic fill flash mode on the Pixel 3 simulates the process of reflecting light onto the faces of human subjects by using a machine learning algorithm that detects human faces, and then brightens exposure in those areas.
Examining the main techniques and features associated with computational photography explicates that these heavily automated techniques, many of which employ machine learning, are employed for several purposes. HDR compositing and portrait mode both simulate the aesthetics of larger cameras, using computationally complex processes to circumvent the physical limitations of compact mobile devices. The synthetic fill flash emulates processes and tools for modifying light which professional photographers employ on-location. Finally, Top Shot and HDR compositing leverage the affordances of an image stream, which appears to be the most significant departure from previous forms of digital photography.
One or many frames?
Computational photography suggests a departure from conventional understandings of both analogue and digital photography if photography’s defining qualities are mechanically freezing moments of time or the photographer capturing ‘the decisive moment’. The utilisation of composite images derived from multiple frames, some of which precede the moment of ‘capture’, indicates a different temporal relationship between the ontology of these images and those produced by a photographer actuating a shutter. However, the history of photographic imagery has long been entangled with processes and practices of compositing, the production of a single photograph from multiple constituent images.
Digital compositing has long been the norm for particular photographic genres. For example, deep sky astrophotography typically requires specialised cameras with cooled charge-coupled device (CCD) sensors and tripods equipped with tracking mounts to take numerous frames that are composited in specialised software such as DeepSkyStacker, which increases the signal-to-noise ratio, subtracts thermal signal (noise generated by heat from the sensor) and reveals more detail within the final composite image. Images of distant nebulae and other astronomical phenomena are rarely derived from a single image; typically these composites piece together 30 or more individual frames. While deep sky astrophotography forms a niche example, a significant amount of contemporary digital landscape photography similarly employs digital compositing to create HDR images. While some DSLRs feature in-built HDR modes, more commonly, photographers take 3–7 images with exposure bracketed at regular intervals, allowing the resulting composite image to depict scenes which exceed the dynamic range of the camera sensor, such as correctly exposing both the setting sun and foreground foliage in deep shadow. Professional post-processing software such as Adobe’s Photoshop and Lightroom applications allow for relatively straightforward compositing of these images, with automatic alignment and de-ghosting.
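Why stacking frames increases the signal-to-noise ratio can be shown with a small simulation. It assumes a static, perfectly aligned scene and independent Gaussian noise in every frame, which is an idealisation; the frame count of 30 follows the figure given above, while the other parameters are arbitrary. Under those assumptions, averaging N frames should reduce noise by a factor of roughly the square root of N.

```python
import random
import statistics


def stack_frames(frames):
    """Average aligned frames pixel by pixel."""
    return [statistics.fmean(pixels) for pixels in zip(*frames)]


def simulate(n_frames=30, n_pixels=2000, signal=100.0, noise_sigma=10.0, seed=0):
    """Return (noise in one frame, noise in the stacked image) for a flat scene."""
    rng = random.Random(seed)
    frames = [
        [signal + rng.gauss(0.0, noise_sigma) for _ in range(n_pixels)]
        for _ in range(n_frames)
    ]
    single_noise = statistics.pstdev(frames[0])
    stacked_noise = statistics.pstdev(stack_frames(frames))
    return single_noise, stacked_noise


single, stacked = simulate()
print(f"noise reduced by a factor of about {single / stacked:.1f}")
```

With 30 frames the expected improvement is about 5.5 times, since the square root of 30 is approximately 5.48. The same principle underlies the lower noise levels of the 15-frame smartphone composites discussed earlier.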
Photographic compositing long precedes digital imagery though and can be traced back to the earliest days of photography. One famous early example is Oscar Gustave Rejlander’s The Two Ways of Life which was produced in 1857. Early photographic images required extremely long exposure times, so it was considered technically impossible to photograph large groups of people in awkward poses, as any movement during the exposure would ruin the resulting image. Consequently, Rejlander captured 32 separate images, which were then skilfully printed from negatives directly onto photographic printing paper (Gernsheim, 1991: 77). The complexity of this process entailed that printing reportedly took Rejlander and his wife over 6 weeks. Far from a frozen slice of time, composite images demarcate that photography has long been an assemblage of technologies and techniques that signify, represent and mediate the world through the reduction of four-dimensional reality to two-dimensional images (Flusser, 2000: 9).
Reinterpreted through the lens of photographic compositing, the images produced by computational photography appear less like a fundamental schism from the history of photography, than the intensification of trends towards automation and the reduction of human labour time in image production. Whereas Rejlander’s composite images took weeks to produce and using Photoshop or PixInsight requires a postproduction process measured in hours or minutes, the automatic HDR+ processing on a Pixel 3 occurs within seconds. Furthermore, alongside the reduction in processing time there is a corresponding reduction in the level of expertise required of the photographer. Rejlander’s compositing required extremely precise printing because the process was destructive; there was no ‘undo’ command in analogue compositing, any mistake necessitated restarting the printing process. Photoshop, by contrast, allows a user-defined number of undo states, meaning that photographers can experiment, alter and manipulate compositions until they are satisfied with the results. Although such activity has been accused of de-skilling photographic practice (Sporton, 2015), it nevertheless requires users to be knowledgeable about affordances and techniques pertaining to Photoshop and digital photography. Contrastingly, in computational photography, compositing occurs without user intervention, and without the photographer requiring any knowledge of the processes occurring within the black-boxed apparatus of the computational camera.
On the one hand, these changes can be understood as democratising the production of technically sophisticated images. Within the contemporary conjuncture of the attention economy, where human attention is understood as a scarce and valuable commodity (Crary, 2013; Crogan and Kinsley, 2012), the temporal demands of learning complex professional image manipulation software such as Photoshop compete with a myriad of other pressures. Automating tasks allows users to focus their valuable attention elsewhere, enabling a broad range of people to produce images which simulate aesthetics associated with professionally produced digital images. On the other hand, the automation of these practices involves what the French philosopher of technology Bernard Stiegler (2010) refers to as proletarianisation, whereby knowledge is displaced from humans into machines designed to commodify, privatise and monetise data, information and experience. Likewise, Vilém Flusser (2000) argues that photographic technology is an apparatus that fundamentally obscures knowledge within black boxes that simulate thought but ‘mechanise thinking in such a way that, in future, human beings will become less competent . . . and have to rely more and more on apparatuses’ (p. 32). The point and press operation of contemporary smartphones adheres to the logic Flusser outlines: metering, exposure variables, focus, compositing, simulated depth of field and numerous other variables are all automated. All the user needs to consider is framing; indeed, Top Shot even automates the ‘final decision’ of precisely when to actuate the shutter.
My point here is not that there are positive and negative dimensions to computational photography, but that the pleasure and agency associated with producing images whose aesthetic qualities were until recently demarcated as professional or high production value is precisely what enables the surveillant and extractive dimensions of computational photography and platform capitalism to function.
Unlike Rejlander’s compositing or employing software such as Photoshop to combine numerous spatially distinct images, the composite images automatically generated by computational photography feature the same subject, but arguably evidence different temporal ontologies to images produced from single exposures. This is debatable, however, as photography has always involved an assemblage of perception that brings together human and nonhuman forms of vision and mediation. Rejlander produced composite images because of the long exposure times necessitated by the material chemistry of early photography. Indeed, the earliest photographic images, such as Joseph Niepce’s (1826) View from the window at Le Gras, required an 8-hour exposure. As Rebekah Modrak and Bill Anthes (2011) elaborate, this duration ‘produced a visual paradox: sunlight and shadow can be seen on two sides of the structures . . . The camera has recorded a view that, for all its apparent veracity, is a scene which the human eye could never see’ (p. 112). This nonhuman mode of vision is equally present in the first photographic image known to contain humans, Louis Daguerre’s 1838 image of the Boulevard du Temple in Paris, whose exposure time was approximately 7 minutes. While human vision would have perceived a busy city street, the image exhibits a near deserted urban cityscape with just two figures present, a man having his shoes shined on the pavement by the heavily blurred figure of a boy. Unlike the vast majority of people whose movements were too fast to register, the relative stasis of these individuals enabled their images to be recorded. As Joanna Zylinska (2017: 21) notes, all photography bears a nonhuman trace, and the earliest images in the history of photography are particularly insightful examples of how nonhuman vision and agency have always been central to photography.
To a certain extent, the historical development of techno-social photographic convention sought to produce images that ‘masqueraded as a transparent and incorporeal intermediary between observer and world’ (Crary, 1992: 136), systematically concealing nonhuman agency and leading to conflation between human and technical modes of vision. This conflation is the outcome of social conventions, whereby technical parameters such as shutter speed are typically delimited in order to produce naturalistic images. Computational photography continues this trend; the composited images adhere to social conventions surrounding naturalism, for example, they remove ghosting artefacts to produce images that mimic human vision.
Digital photography and algorithmic manipulation
Situating computational photography within the context of photographic compositing tempers claims surrounding uniqueness predicated upon the presence of multiple frames. A similar clarification arises from understanding that digital photography has always been underpinned by computational and algorithmic processes, some of which are poorly understood by most photographers. Indeed, this was a key difference between photochemical and digital photography; ‘When the photograph became digital information, it not only became malleable and nonindexical, it became computational and programmable . . . all digital photographs regardless of the final “look” are algorithmically processed’ (Rubinstein and Sluis, 2013: 29). Two illustrative examples of algorithmic processes necessary to produce naturalistic digital images are demosaicing and the application of picture styles.
In digital photography, luminance levels derive from the number of photons landing on each individual pixel; however, colour information is not typically recorded at the pixel level. A colour filter array is overlaid upon the sensor, typically a Bayer filter: a 2 × 2 grid composed of 2 green pixels, 1 blue and 1 red. This patterning was designed by Bryce Bayer of Eastman Kodak in 1976. Weighting colour information towards green approximates ‘human photopic vision where the M and L cones combine to produce a bias in the green spectral region’ (Bull, 2014). As each pixel only records one of the three primary colours, 4 the image requires demosaicing (also referred to as debayering), whereby an algorithm interpolates the colour values of neighbouring pixels in order to remove the pattern generated by the Bayer filter, so the digital image renders colour in a way that emulates human vision. This process can either occur in camera, when sensor data are compressed into an 8-bit JPEG file, or if the raw image file is preserved, demosaicing is performed by software such as Lightroom or Photoshop. In either case, this exemplifies how digital photography is fundamentally algorithmic; users never see digital images that are not algorithmically manipulated as they would not conform to long-standing naturalistic social conventions. Figure 3 is a digital image that is displayed before and after demosaicing. The image also appears significantly underexposed because it has not had a contrast curve applied. Due to the high spatial resolution of the image (>20 megapixels), Figure 4 is a cropped area of the pre-debayered image, allowing the checkerboard Bayer patterning to be seen.

Figure 3. RAW image after/before demosaicing.

Figure 4. Crop of pre-demosaiced image to illustrate Bayer patterning.
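The interpolation involved in demosaicing can be illustrated with a minimal sketch. It assumes an RGGB arrangement of the 2 × 2 Bayer grid and uses simple neighbourhood averaging; camera manufacturers and raw converters use more sophisticated, typically proprietary, algorithms, and the function names here are hypothetical.

```python
def bayer_colour(y, x):
    """Colour filter over pixel (y, x), assuming an RGGB layout."""
    if y % 2 == 0:
        return "R" if x % 2 == 0 else "G"
    return "G" if x % 2 == 0 else "B"


def demosaic(mosaic):
    """Interpolate an (R, G, B) tuple for every pixel of a Bayer mosaic.

    `mosaic` is a list of rows of single brightness values, at least 2 x 2.
    """
    height, width = len(mosaic), len(mosaic[0])
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            pixel = []
            for channel in "RGB":
                if bayer_colour(y, x) == channel:
                    pixel.append(float(mosaic[y][x]))  # measured directly
                    continue
                # Otherwise average the neighbours that did measure this colour
                samples = [
                    mosaic[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                    if bayer_colour(ny, nx) == channel
                ]
                pixel.append(sum(samples) / len(samples))
            row.append(tuple(pixel))
        image.append(row)
    return image
```

In this sketch two of the three colour values at every pixel are estimates inferred from neighbouring pixels rather than measurements, which is the sense in which even an unedited digital photograph is algorithmically produced.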
Demosaicing algorithms affect images before they are viewed and are not typically user adjustable, so they function invisibly as part of the black-boxed apparatus of the camera-system. Picture styles, by contrast, are more visible elements of digital image signal processing because users are presented with numerous options within the camera’s operation menu and/or postproduction software. Typically, options include ‘standard’, ‘landscape’ and ‘portrait’ profiles, where algorithms control the contrast curve, relative colour, hue and saturation levels, and the amount of sharpening applied to the image. The picture style affects in-camera image previews and is ‘baked in’ to images saved using lossy, compressed formats. However, if the image is saved as a raw file, the picture style can effectively be altered in postproduction. Nevertheless, a picture style must be applied for images to conform to the conventions of naturalism, so software such as Adobe Photoshop requires a picture style to be employed.
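A picture style can be thought of as a small set of parameters applied to every pixel. The sketch below is a rough stand-in: it uses a linear contrast adjustment around mid-grey rather than a true tone curve, omits sharpening, and the style values are invented for illustration rather than taken from any manufacturer’s profiles.

```python
# Hypothetical parameter values; real camera profiles are not published in this form
PICTURE_STYLES = {
    "standard": {"contrast": 1.0, "saturation": 1.0},
    "landscape": {"contrast": 1.2, "saturation": 1.3},
    "portrait": {"contrast": 0.9, "saturation": 0.95},
}


def apply_picture_style(pixel, contrast=1.0, saturation=1.0):
    """Adjust the contrast and saturation of one RGB pixel (values 0..1)."""
    r, g, b = pixel
    luma = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luma weights
    styled = []
    for value in (r, g, b):
        value = luma + (value - luma) * saturation  # move away from or towards grey
        value = 0.5 + (value - 0.5) * contrast      # steepen or flatten around mid-grey
        styled.append(min(1.0, max(0.0, value)))    # clip to the displayable range
    return tuple(styled)


muted_green = (0.3, 0.5, 0.3)
for name, settings in PICTURE_STYLES.items():
    print(name, apply_picture_style(muted_green, **settings))
```

The same sensor data yields visibly different colours under each profile, and there is no neutral option: some set of parameters always has to be chosen before the image can be displayed.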
Debayering and picture styles illustrate that digital photography has always required algorithms and computation: ‘A digital camera is not simply a passive recording device. It doesn’t take pictures; it makes them. The sensor array intercepts a pattern of illumination, just as film used to do, but that’s only the start of the process that creates the image’ (Hayes, 2008: 94).
As Brian Hayes presciently noted, a decade ago algorithmic processing in digital photography was unerringly directed towards approximating analogue photography; however, with enhanced computational processing capacity (such as that found on modern smartphones), digital signal processing can produce different forms of practice. This is precisely what we see in contemporary moves towards computational photography. While somewhat less radical than the revolutionary claims associated with the marketing of computational photography, this nevertheless demarcates important operational differences that produce novel problematics surrounding both representational strategies and the non-representational ecology of computational images. In the following section, I explore these problematics, focussing on computational photography’s relationship to a datafied gaze and the social context of platform capitalism.
The datafied gaze of platform capitalism
The intensification of automation associated with computational photography correlates with broader technocultural shifts. Computational technologies have diffused to the point that most humans carry powerful mobile computers that are constantly connected to a vast planetary network of data centres, fibre-optic cables, satellites, cellular towers and social media platforms. As these devices have proliferated, particular logics associated with them have become increasingly integral to culture and society. Computation is fundamentally predicated upon numerical representation, a mathematical form of quantification (Manovich, 2000). As numerous social theorists have argued, the ability to quantitatively distinguish between actors and outcomes is fundamental to extending the domain of competition and markets (Beer, 2016; Dean, 2009); if outcomes cannot be quantified, competition cannot be effectively measured. Consequently, the expansion of markets and commodification, which is commonly understood as being pivotal to material practices of neoliberalism, requires an immense apparatus of digital technologies (Harvey, 2005).
The importance of digital technologies to contemporary socio-economic systems has seen a proliferation of framings and analyses such as the information society (Castells, 1996), communicative capitalism (Dean, 2009), platform capitalism (Srnicek, 2016), computational capitalism (Stiegler, 2019) and surveillance capitalism (Zuboff, 2019). In differentiated ways, these frameworks emphasise how digital data and the networked computational infrastructures that underpin them are key to grasping socio-economic changes that have roots in the introduction of personal computers in the early 1980s and the web in the 1990s, but which have accelerated and substantively evolved since the proliferation of social media platforms and smartphones in the early 2000s. Two features of the Pixel 3 which resonate with these social frameworks are the implementation of automated facial retouching and Google’s image recognition Lens app.
While digital photography has always required algorithmic processing, computational photography alters the type of algorithm involved in image processing. Rather than acting uniformly, or upon quantitative pixel-level thresholds, as is the case with debayering or affecting the contrast curve, the key change surrounds the implementation of machine learning. As Google’s Yael Knaan (2017) emphasises, ‘Instead of just treating each pixel as a pixel, we try to understand what it is’. While machine learning is currently subject to vast quantities of industry-driven hype and mystification, particularly through its association with artificial intelligence, it can be succinctly defined as ‘the process by way of which algorithms are taught to recognise patterns in the world through automated analysis of very large datasets’ (Greenfield, 2017: 216). Machine learning involves the formation of a statistical model for pattern recognition based on training data. The model is not explicitly programmed; rather, the system ‘learns’ based upon statistical inferences from training data. That data can be explicitly classified or labelled in advance by humans, which is known as supervised machine learning, or the algorithms can find associations in the data without prior labelling, which is referred to as unsupervised learning. Not all machine learning algorithms are alike (MacKenzie, 2017); however, the larger the dataset used to train the model, the more accurate it will typically be, so machine learning tends to predominantly benefit entities who have access to the largest volumes of data. Google provides an emblematic example, as their web search data, locational data, billions of photos taken on Android smartphones, voice data from Google Assistant and other modes of data capture present an immense volume of data.
In line with the logic of data-driven network effects associated with platform capitalism, machine learning tends to centralise power in the hands of a corporate oligopoly (Srnicek, 2016: 45).
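The distinction between supervised and unsupervised learning described above can be illustrated with a deliberately tiny sketch. It assumes each training example is reduced to a single made-up feature value, and it uses a nearest-mean classifier and a two-group clustering loop as stand-ins; the function names and data are hypothetical and bear no relation to the models Google actually deploys.

```python
def train_supervised(samples, labels):
    """Supervised: human-provided labels define the groups; learn one mean per label."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, x):
    """Assign x to the label whose learned mean is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def cluster_unsupervised(samples, iterations=10):
    """Unsupervised: no labels; a two-means loop finds two groups on its own."""
    low, high = min(samples), max(samples)  # initial guesses for group centres
    for _ in range(iterations):
        near_low = [x for x in samples if abs(x - low) <= abs(x - high)]
        near_high = [x for x in samples if abs(x - low) > abs(x - high)]
        low = sum(near_low) / len(near_low)
        if near_high:  # stays empty only if every sample is identical
            high = sum(near_high) / len(near_high)
    return low, high


# Made-up feature values; the labels are the human annotation step.
samples = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
labels = ["background", "background", "background", "face", "face", "face"]

centroids = train_supervised(samples, labels)
guess = predict(centroids, 0.7)
groups = cluster_unsupervised(samples)
```

In both cases nothing in the code states what a ‘face’ is; the grouping is inferred statistically from the examples supplied, which is why the volume and composition of training data matter so much.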
In the case of computational photography, machine learning is employed to recognise human faces. This enables the camera to identify which elements of the frame should be blurred by portrait mode, brightened by the synthetic fill flash and which frames Top Shot recommends. Furthermore, facial recognition affords the implementation of automated facial retouching, what is often termed ‘beauty mode’. Facial retouching involves the camera recognising human faces and applying an algorithm to ‘smooth’ skin, through adding blur and removing chrominance or luminance differences that suggest ‘blemishes’ such as spots, freckles, blackheads or scars. Since the early 1990s, facial retouching has been associated with Adobe Photoshop, which is routinely used to retouch images. However, within Photoshop, human editors take numerous decisions over whether to soften skin, to what extent, which ‘blemishes’ (if any) to remove or reduce in prominence and whether to sharpen, whiten and brighten eyes and teeth.
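As a rough illustration of the smoothing step, the sketch below averages luminance values only inside a region flagged as a face. It assumes the recognition stage has already produced a mask; the function name, the box blur and the tiny test image are hypothetical simplifications rather than the method used by any particular smartphone.

```python
def smooth_region(image, mask, radius=1):
    """Box-blur luminance values only where mask is True.

    image: list of rows of floats (0.0-1.0 luminance).
    mask:  same shape, True where a (hypothetical) face detector fired.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]  # unmasked pixels are copied unchanged
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Average the neighbourhood, clipped at the image border.
            neighbours = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            result[y][x] = sum(neighbours) / len(neighbours)
    return result


# A 3x3 patch of evenly lit skin with one darker pixel standing in for a 'blemish'.
skin = [
    [0.8, 0.8, 0.8],
    [0.8, 0.2, 0.8],
    [0.8, 0.8, 0.8],
]
face_mask = [
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
retouched = smooth_region(skin, face_mask)
```

The darker pixel is pulled towards the brightness of its surroundings while everything outside the mask is left alone, so the decision about what counts as a ‘blemish’ is embedded in the combination of mask and averaging rule rather than taken by an editor.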
With the introduction of computational photography, the in-built camera application on smartphones automates these processes, often in ways that are functionally invisible to end-users. While the Pixel 3 has facial retouching enabled by default, it can be turned off by accessing a menu that offers three options: ‘off’, ‘natural’ and ‘soft’. We should note the discursive framing whereby the default option, which involves facial retouching, is described as ‘natural’. Moreover, the option to disable automated facial retouching is not present on some smartphones. For example, the iPhone XS Max has been criticised for featuring always-on, aggressively smoothed facial retouching (Hilsentenger, 2018). These technical filters are highly influenced by cultural filters (Rettberg, 2014); in this case, machine learning techniques generate an idealised appearance of human skin which departs from the kind of indexical representation that is often assumed to occur with photography.
In a related exploration of beauty apps that employ smartphone cameras to retouch or remodel the user, Ana Elias and Rosalind Gill (2018) argue these practices denote, ‘a particularly powerful example of the intensified surveillance of women’s bodies, whereby the ever more fine-grained, metricized and forensic scrutiny of the female body is increasingly mediated by the mobile phone’ (p. 60). Here, we see important connections drawn between the image processing functionality of contemporary smartphones, the forms of quantification and commodification associated with neoliberalism, and the practices of datafication and surveillance associated with platform capitalism. Indeed, the ability to ‘know’ and govern the self through surveillance, tracking and metricisation positions computational photography within the same biopolitical assemblage as wearable and mobile self-tracking devices (Lupton, 2016; Neff and Nafus, 2016). As Deborah Lupton (2012) demonstrates, this field is defined by technologies which address ‘idealised entrepreneurial consumers who are amenable to the monitoring, surveillance and disciplining of their bodies’ (p. 241).
In the case of facial retouching algorithms, the ideal of human skin as a wrinkle- and blemish-free surface that is uniform in colour and almost entirely lacking in texture is articulated and fetishised by the algorithms built into contemporary smartphones. Such idealised constructions are not, however, unproblematic truths about the human condition; they idealise skin associated with youthfulness and femininity. When retouching an old, bearded man, a photographer is equally likely to emphasise wrinkles and blemishes as signs of wisdom and experience as to obliterate these features using smoothing and healing techniques. This situated decision-making process is automated out of existence by computational photography. Consequently, we begin to see the idealised user assumed and constructed by the technocultural assemblage of computational photography, and how this connects to the multi-billion-dollar beauty industry that sells a specific vision of what it means to be beautiful. Elias and Gill (2018) conclude that beauty and retouching apps should be understood as, ‘a technology of neoliberalism par excellence in purporting to offer neutral, scientifically based evaluations and “assistance” in beauty projects through apps that do not simply judge but “measure” and rate’ (p. 74).
It is crucial for critical perspectives on computational photography to foreground that these allegedly neutral evaluations and algorithmic improvements to human appearances are, of course, socially and culturally constructed assessments that reify certain representations while discriminating against others. In this case, we see how facial retouching algorithms resonate with broader cultural discourses which objectify the appearance of users, especially those of young women, and situate them as entrepreneurial subjects and aesthetic labourers (Elias et al., 2017) whose unmediated appearance is destined to be judged as inferior to the automatically retouched digital representations of themselves and their peers that circulate on social media.
An important shift in the usage of images and self-representations is implicated here. Whereas in the past, photographs were ways of marking occasions and events that would be periodically reviewed by family and close friends, selfies and smartphone photography are designed to circulate on corporate social media platforms, where they are markers of branded, entrepreneurial selves engaged in competition for the attention of peers (Bucher, 2012; Marwick, 2015). While users frequently contest this dominant structural logic of neoliberalism through various practices of resistance, the hegemonic form of governmentality impels users ‘toward networking/sociality, popularity, visibility, and self-display in a conjunctural moment where the “required” biographical project of self-realization cannot be disentangled from the branded self’ (Goodwin et al., 2016: 11). Self-representational images have moved from markers of the private self through time, towards establishing social capital within a highly commodified attention economy.
Furthermore, computational photography involves changes to photography as a system. Today, the camera-cloud-corporation assemblage is a key component of the system of datafication that sees increasing volumes of everyday life recorded as numerically represented and algorithmically manipulatable information that is subsequently employed to make predictions and target interventions designed to influence behaviours. Services like Google Photos automatically upload images to Google servers where machine learning algorithms identify the people, places and objects they contain. These data are used to group images into albums, but are also part of Google’s data platform; they form another dataset which Google employs to know its users and target them with relevant advertising. Indeed, alongside locational data from the phone, having categorised the places, people and phenomena that users are interested in through their everyday photographic practices is a veritable treasure-chest of personal data for an advertising platform such as Google; the platform not only learns where you were, but also who you were with and what you were doing.
Datafication is also central to Google’s Lens application, which is integrated into the camera functionality of the Pixel 3. Whereas automated facial retouching ties computational photography to forms of representational critique, Google Lens is more concerned with a form of dataveillance-based performative or non-representational critique. Lens is a machine learning-based system that attempts to recognise objects, either in already-taken images or seen live through the smartphone camera and then provides real-time search data on that object. In promotional materials, Google engineers demonstrate how Lens allows users to search what they see by tapping on-screen to search for clothing and fashion accessories. Searching through Lens opens web pages featuring those items in Google’s Chrome browser, thereby allowing users to quickly purchase objects they see in the world around them using their phones and Google’s platform.
This corresponds to an acceleration in the speed and ease of consumption, as users can employ Lens to search for items in the world around them, be directed to places to purchase them and do so (presumably using Google Pay, which allows purchases to be made from sites, apps and stores using a Google account) at the time and place where the desired object is encountered. Equally though, Lens allows Google to further compile data, relating both to the specific interests of the device user, through the objects that they employ Lens to search for, and to what objects are found where and when. Effectively, Lens is a tool that enhances Google’s ability to build dynamic maps of specific places using the freely provided labour of users, thereby paralleling previous Google endeavours such as Trekker (Kuehn and Daubs, 2017) and Ingress (Hulsey and Reeves, 2014). As with these related cases, users agree to forms of surveillance which provide Google with valuable data about the world and themselves in order to accrue benefits surrounding pleasurable consumption.
Cameras were conceptualised by 20th-century theorists such as Vertov and McLuhan as extensions of the human eye and human memory, as extensions of the human photographer. By contrast, today smartphone cameras function as tools that enable corporations to survey, analyse and intervene in users’ lives; although you still see through the camera, now the camera is watching you. Consequently, Martin Lister (2016) has argued for inverting the concept of extension, so that ‘rather than the camera extending the photographer, the photographer has become an extension of the camera’ (p. 272), foregrounding the alteration of agency that occurs when the user becomes the object of the datafied gaze of computational photography. However, perhaps this re-envisioning should go beyond cameras and photographers; the computational camera is just one facet of the immense cyborgian assemblage of Google’s platform. This network of infrastructure, devices, code, programmers and users forms a transnational ecosystem predicated upon the dataveillance of human users in order to predict and modify their actions in the ultimate pursuit of economic profit. Within the current conjuncture of planetary ecological crisis, however, it would be remiss to omit mention of the glaring incompatibilities between the profit- and ultimately fossil-fuelled assemblage of platform capitalism and an equitable future for human and nonhuman life.
Conclusion
Computational photography remediates earlier forms of digital photography, allowing smartphone photography to produce images that emulate aesthetics associated with physically larger cameras, such as increasing dynamic range, enhancing low-light performance and simulating shallow depth-of-field. Relatedly, techniques associated with professional photographic workflows, such as facial retouching and using fill lights or lighting modifiers to brighten human faces, are also performed by machine learning algorithms. Whereas earlier forms of digital photography remediated analogue photography, appropriating language and concepts such as masking, dodging and burning, computational photography performs a similar process by remediating digital photography.
One key difference associated with computational photography is that in place of a single image whose moment of capture is based upon a human actuating a shutter mechanism, each image is a composite, with elements of the recording process preceding human intervention. While the history of compositing provides a useful context, demonstrating that photography has, almost since its inception, included images that combined multiple frames, the temporal ontology of the images and the extent to which this black-boxed process is automated indicate notable shifts in the production of images. Although digital images have always been algorithmically manipulated, computational photography alters the way that images are algorithmically affected; machine learning-based processes do not uniformly act upon images, or select pixels based upon luminance or chrominance levels, but instead seek to identify what those pixels represent, enabling regions such as human faces to be processed differently to surrounding areas.
Likewise, the automation of photographic image production associated with computational photography is not, strictly speaking, new; the history of photographic technology has seen the gradual automation of a medium which began with manual-only controls for aperture, shutter speed, film speed/ISO, focus and light metering (Cubitt et al., 2015). Computational photography extends this process to automate compositing, depth-mapping, synthetic fill-flash and facial retouching, with Top Shot even providing automated assistance to select alternative frames within the image stream which provide normatively ‘superior’ photographs. The extent to which computational photography automates image production raises questions surrounding the redistribution of agencies at play within the camera-operator assemblage; to what extent is the photographer still the creative agent responsible for these images when the vast majority of decision-making is undertaken by the camera?
Computational photography cannot be understood solely in relation to smartphones though. Increasing levels of automation and the incorporation of machine learning exemplify wider technocultural trends surrounding digital technologies. However, these practices are never merely technological; computational photography epitomises wider social patterns that situate users as branded selves subject to datafication within the context of an attention economy. Automated facial retouching modes in computational photography, particularly those which cannot be disabled by the user, signal the gendered and aged biases built into the idealised user of these devices. Furthermore, computational photography is entangled with the cultural and economic logics of platform capitalism. While the computational camera extends the photographer, it also extends the dataveillance of photographers by multibillion-dollar technology corporations.
