Abstract
One of the most important things people see is what other people do. In photographs of actions, people see what other people have done. This analysis focuses on photographs of motor actions or interactions taken in naturally occurring situations. I suggest that such photographs represent special meanings, which I call action-related meanings. I examined the hypothesis that viewers understand these meanings by establishing motor and somatosensory neural representations of pictured actions, which would also be activated if viewers actually performed these actions. This correspondence provides special access to the bodily meanings of pictured actions. Based on findings on vision and reactions to photographs from multiple research areas, I developed a novel framework that describes the neural basis of understanding action-related meanings of photographs; how these meanings differ from conceptual meanings; the characteristics of pictured actions that influence the strength of motor and somatosensory responses; the processes that make these responses accessible to conscious experience; and the potential emotional, social, and cultural value of photographs picturing actions. The proposed framework contains a number of predictions that can be tested by future empirical investigations. The analysis aims to contribute to a better understanding of the meanings represented by photographs of actions.
Photographs are a ubiquitous phenomenon today (Carrington, 2020; Kemp, 2020; Statista, 2019). Photography enables people to easily record, receive, and communicate important information. My intention is to draw attention to the special characteristics of photographs depicting actions or interactions. What other people do or have done is one of the most important things that people see (Gazzola & Keysers, 2009), what they talk about (Dunbar, 2004), and what they think about (Spreng & Grady, 2010).
I examined the hypothesis that photographs depicting a goal-directed motor action or interaction represent a special class of meanings, which I refer to as action-related meanings, and that viewing photographs with these meanings conveys special abilities, which have particular value in certain contexts. This value would be related to the fact that such photographs provide experiences similar to those viewers would have if they actually performed the pictured actions. Viewing photographs of actions could allow people to have desirable experiences to which they otherwise have no access, such as experiencing pleasant touch or exercising. In some areas of photography, these abilities play an important role, for example, in advertising, pornography, or journalistic reports of celebrations, sports, or fights (Kislinger, 2018).
I hypothesized that viewers recognize and understand action-related meanings of photographs based on visual, motor, and somatosensory activations in the cerebral cortex, and that these activations form the neural basis of special experiences. I examined these hypotheses based on neuroscientific and psychological findings about the visual system of humans and/or primates and reactions to photographs from multiple research areas. Primarily, I referred to experimental studies in which researchers investigated brain activation patterns evoked by photographs or video clips of actions. In most cases, the researchers created special static or dynamic pictures for their experiments. For example, an acting body part (e.g., a hand grasping an object) is pictured against a neutral (white, gray, or black) background. In some studies, however, pictures were presented in which the acting individuals are embedded in a complete scene (e.g., Borgomaneri et al., 2012; Goldberg et al., 2014; Jackson et al., 2005; Moscatelli et al., 2011; Proverbio et al., 2009). The findings of these studies are consistent with the findings from the experiments in which the specially created photographs were used. This suggests that the findings using the specially created photographs are also informative for analyzing the specific processes and experiences that are evoked by viewing photographs, which people see in everyday life.
With regard to the visual system, I particularly referred to findings on the close connection of the visual system with the cortical motor system (Goodale & Milner, 1992; Rizzolatti & Matelli, 2003) and the somatosensory system (Dijkerman & De Haan, 2007; Gazzola & Keysers, 2009; Keysers et al., 2010, 2018; Meyer et al., 2011). This close connection is the basis of one of the most widely noted neuroscientific discoveries of recent decades. Researchers found that there are neurons in the brains of humans and other primates that activate both when individuals perform a certain motor action and when they see another individual performing this action. Researchers called these neurons mirror neurons (Gallese et al., 1996; Rizzolatti & Craighero, 2004). When an individual observes another individual performing an action, mirror neurons in the observer’s brain build the motor activation pattern that is presumably present in the acting individual. In this way, the observed action is processed in the brain of the observer as if the observer were performing the action herself or himself. The mirror neuron activities in the observer “vicariously” represent motor activations occurring in the observed individual (Keysers et al., 2010).
Researchers described such vicarious neural activation also for seeing somatosensory sensations in other individuals (for review, see Keysers et al., 2010, 2018). Vicarious somatosensory activations have been observed throughout the cortical somatosensory system (Dijkerman & De Haan, 2007). They convey a somatic component to the mere sight of pictured actions and sensations (Bolognini et al., 2011, 2013; Cheng et al., 2008; Gu & Han, 2007; Jackson et al., 2005; Meyer et al., 2011). Researchers assume that vicarious motor and somatosensory activations support the recognition and understanding of observed actions or sensations and facilitate interactions with others (Keysers et al., 2010; Rizzolatti et al., 2001).
I conjectured that viewing photographs of actions and the somatosensory sensations involved in these actions evokes vicarious motor and somatosensory activations, although photographs are two-dimensional, static pictures, whereas actions seen in everyday life are three-dimensional and moving stimuli. I am not aware of any empirical study that compared the brain reactions of subjects viewing an action that happens in real life with the brain reactions of subjects viewing a photograph of this action. The relationships between reactions to pictured and real actions must, therefore, be determined theoretically. Findings that are useful for this determination come mainly from experiments in which actions were represented by video clips (Buccino et al., 2001; Chong et al., 2008; Dinstein et al., 2007; Gazzola & Keysers, 2009; Goldberg et al., 2014; Kilner et al., 2009).
Unlike photographs, video clips provide visual and maybe auditory information about the course of movements and thus a more complete representation of actions. The information about an action provided by photographs is largely incomplete. A photograph represents a pictured action only by providing information about the visual appearance of the action in a single moment. Nevertheless, some researchers observed that photographs of moving individuals elicited activations in visual areas that play an important role in seeing real motion (Allison et al., 2000; Borgomaneri et al., 2012; Kourtzi & Kanwisher, 2000). There are also findings on the activation of mirror neurons by photographs of actions (Hermsdörfer et al., 2001; Johnson-Frey et al., 2003; Proverbio et al., 2009; Urgesi et al., 2006, 2010). A few studies observed responses to photographs of actions in somatosensory brain structures. Bolognini et al. (2013), for example, observed that photographs picturing touch led to activation of the primary somatosensory cortex (S1). Somatosensory responses have also been observed in some studies presenting photographs of actions associated with painful experiences (Cheng et al., 2008; Gu & Han, 2007; Jackson et al., 2005), or with sexual arousal (Bühler et al., 2008; Redouté et al., 2000; Wehrum et al., 2013; but also Ferretti et al., 2005).
Not much is known about whether and how motor and somatosensory brain activation patterns evoked by viewing photographs of actions correlate with conscious experiences such that viewers experience what it feels like to perform a pictured action. Clues of such a correlation can be found in studies examining reactions to photographs depicting painful or sexual actions (Bühler et al., 2008; Gu & Han, 2007; Jackson et al., 2005; Wehrum et al., 2013). I referred to findings that suggest that bodily experiences of pictured actions are primarily conveyed through the somatosensory processes involved (Blanke, 2012; Maravita et al., 2003). Processes in structures that control responses to the relevance or emotional value of pictured actions presumably also play an important role (Barrett & Bar, 2009; Craig, 2009).
I examined the following research questions:
In addressing these questions, the word meaning plays a central role. I define meaning as information that humans or animals react to with certain cognitive processes. Meaning is information that elicits reactions of a certain quality and quantity in cognitive systems due to the knowledge they retain. In this way, I avoid reducing meaning to the property “potential content of conceptual recognition and understanding.” Precise definitions of the terms action-related meaning and conceptual meaning are established step by step and are presented in the section “Do Photographs of Actions Represent a Separate Class of Meanings?”
The aim of the present analysis was to create a hypothetical framework, which describes the specific meanings represented by photographs of motor actions and the neural basis of these meanings. Important components of the framework correspond to hypothetical predictions that can be tested empirically by future research. The central prediction is that photographs of actions with specific characteristics represent a separate class of meanings, which is different from conceptual meanings.
Elements of Mental Representations of Motor Actions
Representation is a crucial concept in cognitive neuroscience. In neuroscience, it is assumed that a memory content is expressed by an activity pattern of a specific population of neurons, which carries an engram. This engram represents the memory content (for reviews, see Josselyn et al., 2015; Tonegawa et al., 2015). A neural representation is a pattern built by neurons that, due to its spatial and temporal organization, stands for a sensation, movement, experience, action, goal, and so on. The concept of neural representation is controversial (e.g., Bennett & Hacker, 2003). Through new technologies, however, the processing of neural representations has become observable, controllable, and manipulable (Josselyn et al., 2015).
Motor actions are complex behaviors that include different elements (Grafton & Hamilton, 2007). An action is a sequence of movements of body parts on certain spatial trajectories with certain time courses, which allows individuals to reach a certain goal. Based on their past visual, motor, and somatosensory experiences with actions (Hardwick et al., 2013; Vidoni et al., 2010), individuals have a motor knowledge or memory that contains a repertoire of mental representations of actions (Rizzolatti et al., 2001, 2014). These representations comprise the central elements of an action: transformations of sensory information into motor commands; movements; proprioceptive, somatosensory, visual, and auditory consequences of movements; the goal of the action; and the desired sensory outcome of the action (Gazzola & Keysers, 2009; Grafton & Hamilton, 2007; Wolpert & Ghahramani, 2000).
Mental representations of actions that are part of the motor knowledge of individuals enable them to respond to sensory inputs with appropriate actions, to perform these actions effectively and efficiently, and to prepare and plan actions to achieve more distant goals. Representations of actions support the decision whether an action should be carried out or not (Gold & Shadlen, 2007) and are involved in recognizing and understanding the actions observed in other individuals (Rizzolatti & Luppino, 2001; Urgesi et al., 2010).
Goals of actions are related to motives, needs, wishes, and have a certain relevance or value. Representations of motor actions, therefore, include an emotional component. The term emotion denotes a reaction to an object or event that is important for individuals and requires them to prepare an appropriate action (Bradley et al., 2001; Damasio, 2010; Goldberg et al., 2014; Lang et al., 1997). Some researchers related emotional responses to two basic motivational groups: one group being withdrawal, avoidance, and defense; the other being approach and appetitive behavior (Bradley et al., 2001; Lang et al., 1997). For example, a threatening event can elicit the emotion fear and with it the preparation for flight. An appetizing meal can evoke joy and the tendency to approach and eat the meal. An object or event has an emotional meaning for individuals if they associate it with a positive or negative value in relation to their needs, desires, or goals, and if it elicits a certain motivational state (Bradley et al., 2001). The basic and close connection between motor and emotional reactions is evident in brain structures and activities in which motor and emotional processing overlaps (Craig, 2009; Pourtois et al., 2013), for example, in activities of dopamine neurons (Caligiore et al., 2013; Friston et al., 2012; Yao et al., 2016).
Researchers usually used video clips as stimuli when they studied the neural mechanisms on the basis of which individuals associate visual stimuli with representations of actions (Buccino et al., 2001; Chong et al., 2008; Dinstein et al., 2007; Gazzola & Keysers, 2009; Goldberg et al., 2014; Kilner et al., 2009). Video clips contain visual information that changes over time and thus information about movement. They may also contain synchronized auditory information. These characteristics have implications for the recognizability of the pictured individuals or objects. Bovet and Vauclair (2000) concluded in their review titled Picture Recognition in Animals and Humans that color video clips are more easily recognized than color photographs, which in turn are more easily recognized than black-and-white photographs. A comprehensive examination of the differences between photographs and video clips would go beyond the scope of the present article. In connection with the aims of the present analysis that addresses the processing of visual information, it is sufficient to see the fundamental difference in the fact that a photograph is a static single picture that represents the pictured action implicitly by information about its visuospatial appearance in a single moment, whereas a video clip contains explicit information about the spatiotemporal course of the action. Actions pictured in black-and-white photographs can be recognized when the main properties of the actions are represented by characteristic shape information.
The Neural Basis of Action-Related Meanings Represented by Photographs
The word photograph denotes an external, permanent, static picture that has been taken using a camera. To produce it, the light coming from a scene was projected onto a light-sensitive surface and the projection pattern was fixed. Photographs consist of small components (pixels or film grains; Keelan, 2002). Important features of these components are number, size, brightness (intensity information), and color (chromatic information). Cameras can produce representational pictures with specific characteristics, for example, pictures with high detail resolution. The use of wide-angle or telephoto lenses can generate a special representation of the spatial layout of a scene (Cooper et al., 2012). Cameras can make pictures of moving individuals that show the visual appearance of these individuals as it was within a 1,000th of a second or less, or in which the movement of individuals is represented by motion blur.
A photograph can share a large number of stimulus features with the object that has been recorded (Bovet & Vauclair, 2000; Bradley & Lang, 2007; DeLoache et al., 1998). When individuals see a photograph of an object, a retinal image can be formed, which is similar to the image that would be formed if they saw the corresponding real object in the environment (Perrett et al., 1991). This may be one of the reasons why people tend to understand photographs as truthful representations of real events (Gu & Han, 2007; Miller, 1973). I will determine more specific properties of photographs by comparing photographs with video clips and words in sections “Do Photographs of Actions Represent a Separate Class of Meanings?” and “Do Photographs of Actions Have a Special Value in Certain Contexts?”
In the following, I describe the steps of processing action-related meanings of photographs in the brain. In a first step, three-dimensional spatial properties are detected in the visual information provided by the photograph. In the next step, visual information is perceived as the movement of another individual. Based on motor and somatosensory neural activations, the visual information is transformed into an increasingly complex representation of an action. Somatosensory and emotional components of the represented action enable viewers to consciously experience the represented action.
Real actions that are seen in the environment have certain spatial properties. The human visual system works with a number of processing methods that use different properties of visual inputs to recognize and represent the spatial properties of objects and scenes (Cutting & Vishton, 1995). Important processing methods also work when perceiving two-dimensional images. Cutting and Vishton name, for example, occlusion, relative size, and relative density. Even when seeing images, coarse recognition of simple spatial objects with certain spatial orientations and their differentiation from a background occurs at an early stage of visual processing (Rensink, 2000).
The fact that two-dimensional, static visual stimuli can represent movements and actions is based on the close connection that exists in the cortex between the visual system and the motor system, that is, the structures in the cerebral cortex that serve the control of the movements. In 1992, Goodale and Milner introduced a new neuroscientific model of vision that conceptualized this close connection for the first time. They referred to a theory, which states that in the cortex of primates, visual inputs are processed in two specialized visual streams, a ventral and a dorsal stream (Ungerleider & Mishkin, 1982). According to Goodale and Milner (1992), the processes in the ventral visual stream (VVS) convey the conceptual recognition of stimuli (i.e., object recognition) that is consciously accessible and reportable. The processes in the dorsal visual stream (DVS) convey the recognition of the spatial properties and shape characteristics of stimuli. This allows the individual to move eyes, head, and hands appropriately. Processing of a visual stimulus in DVS takes place if a stimulus is associated with an action. This is the case, for example, if an individual sees a familiar graspable object, for example, a coffee cup. VVS leads from the occipital cortex into the inferior temporal lobe (IT; see Figure 1A), whereas DVS leads from the occipital cortex into the posterior parietal cortex. VVS and DVS are connected through various links (Allison et al., 2000; Binkofski & Buxbaum, 2013; Milner & Goodale, 2006; Rizzolatti & Matelli, 2003), and areas of both streams project into the prefrontal cortex (Wilson et al., 1993).

Figure 1. Schematic drawings of the brain regions, projections, and pathways thought to be important for recognizing and understanding action-related meanings of photographs.
Goodale and Milner (1992) proposed that processing in DVS serves the immediate visual control of motor behavior when interacting with an object. If processes in DVS only had this one function, they would be irrelevant in seeing photographs, because pictured objects are not really present in the environment and no direct interaction is possible with them. Several researchers, however, described properties of processes in DVS that may also be relevant to understanding the meanings represented by photographs of actions.
Rizzolatti and Matelli (2003), for example, suggested that DVS consists of two substreams, a ventral and a dorsal substream. Figure 1A shows a schematic lateral view of their course with their extensions into premotor areas. According to Rizzolatti and Matelli, the dorsal substream of DVS serves the function that Goodale and Milner (1992) generally ascribed to DVS, but the ventral substream is an interface between visual object recognition, as processed in VVS, and the immediate visual control of action. Processes in the ventral substream are involved when individuals see or grasp three-dimensional objects that they associate with a possible action, when individuals associate visual inputs with the longer term organization of actions, and when individuals observe other individuals performing a motor action. According to Rizzolatti and Matelli, an important function of processing in the ventral substream is that it conveys an understanding of actions that are seen in others, which is based on activities of special neurons that transform visual information into motor and somatosensory information. There is experimental evidence that photographs of actions elicit activations in areas of this ventral substream (Hermsdörfer et al., 2001; Kourtzi & Kanwisher, 2000; Proverbio et al., 2009; Urgesi et al., 2010). I describe structures and processes in the ventral substream of DVS and its continuation into the ventral premotor cortex (vPMC) as the neuronal basis of action-related meanings represented by photographs.
Processing Centers in the Brain
Rizzolatti and Matelli (2003) described the middle temporal area (MT) as the entry node of the ventral substream of the DVS in the visual cortex. MT is involved in the perception of movement. MT and the superior temporal sulcus (STS) are the main processing centers for the visual analysis of the movement of humans or animals (Allison et al., 2000; Perrett et al., 1991; Rizzolatti et al., 2001; Urgesi et al., 2014).
Photographs of moving individuals can carry information that is processed as movement by neurons in MT (Kourtzi & Kanwisher, 2000; Moscatelli et al., 2011; Proverbio et al., 2009), STS (Allison et al., 2000), and parts of the parietal cortex (Hermsdörfer et al., 2001; Urgesi et al., 2010). That is, such photographs evoke the same reactions in these areas as real moving stimuli do. This reaction is presumably based on the fact that photographs of motor actions implicitly contain information about the position of the pictured individuals or their body parts immediately before and after the pictures were taken (see the three example photographs in Figure 2). Such implicit information about movement in photographs is referred to in the research literature as implied motion (Kourtzi & Kanwisher, 2000; Moscatelli et al., 2011; Urgesi et al., 2006). See Figure 2 (top and middle panels) for example photographs.

Figure 2. Example photographs representing action-related meanings.
Urgesi and colleagues (2010) relate the processing of photographs in visual areas that analyze real motion to the fact that seeing motion in natural environments is often impaired by obstacles. Individuals often receive only incomplete information about movements and actions. For this reason, neural mechanisms have developed in the brain that complete such partial information to represent a full movement or action. This completion mechanism is also activated by information coming from photographs. According to Urgesi et al., an important property of the resulting processing is its anticipatory component. It allows individuals to predict how the pictured movement will continue. Urgesi et al. assume that pictured individuals are processed as moving especially when the actions they perform are still in progress, that is, have not yet reached their final state. This is the case, for example, if the hands a woman stretches out to catch a ball are not yet holding it, as shown in the top panel of Figure 2.
MT sends projections to STS, which also receives afferents from the inferior temporal lobe (IT) and forms an important link between VVS and DVS (Perrett et al., 1991; Rizzolatti & Matelli, 2003). Links between VVS and DVS are important in terms of how viewers recognize what is pictured in the photograph of an action. There is still no consensus in research regarding the mechanisms by which stimuli are recognized that are processed in DVS. Some researchers assume that a stimulus must first be conceptually categorized by processing in VVS, and only then processing in DVS is initiated in a top-down manner (Barrett & Bar, 2009; Milner & Goodale, 2006). Other researchers assume a direct stimulus-induced transformation of visual input into a meaningful motor representation (Rizzolatti et al., 2014; Ubaldi et al., 2015).
In recognizing the elements of pictured actions, the extrastriate body area (EBA) also plays an important role (Weiner & Grill-Spector, 2011; Zimmermann et al., 2018). EBA is a region in the occipitotemporal cortex that reacts selectively to images of human bodies. EBA provides another link between recognizing a stimulus through processes in VVS and its processing in DVS. The activities in EBA, MT, and STS presumably convey a quick, coarse recognition of what is seen in a picture (Weiner & Grill-Spector, 2011; Zimmermann et al., 2018): There is a body of an individual who is moving in a certain direction.
The decision whether and how intensely a visual input is processed in the ventral substream of DVS and its continuation in the premotor cortex is influenced by the potential relevance or emotional value of the stimulus (Barrett & Bar, 2009; Borgomaneri et al., 2012; Goldberg et al., 2014). Regions, projections, and processing pathways that are assumed to be involved in processing the emotional components of action-related meanings of photographs are schematically shown in Figure 1B. Two structures that activate early after the appearance of a visual stimulus play a central role: the orbitofrontal cortex (OFC) and the amygdala. Both play an important role in recognizing value signals and emotional meanings of stimuli (McNamee et al., 2013; Rolls, 2004) and in the selection of relevant stimulus information (Barrett & Bar, 2009; Pessoa & Adolphs, 2010). Information about the value of an action reaches the medial orbitofrontal cortex (mOFC) from the inferior temporal lobe (IT). The mOFC then sends the value-related information to MT and the inferior parietal lobule (IPL), which leads to increased responses in these areas (Barrett & Bar, 2009).
The amygdala is a complex structure in the anterior medial part of the temporal lobe. It is the central brain structure in terms of the close connection between vision and emotion and generally one of the most important centers in the human brain that are involved in the processing of emotions (Pessoa & Adolphs, 2010; Pourtois et al., 2013). With regard to action-related meanings, projections from the amygdala to STS play an important role (Allison et al., 2000; Rizzolatti et al., 2001). STS sends information back to the amygdala. Via this connection, amygdala signals can make action-related stimuli particularly salient so that the individual quickly and involuntarily directs attention to them.
Transforming Visual Information Into Motor and Somatosensory Information
Neurons in EBA, MT, and STS signal that there is an image of a moving person and send this information to IPL. In addition to visual neurons, IPL contains various other types of neurons, which play an important role in processing action-related meanings of photographs: visuomotor neurons and mirror neurons (Gazzola & Keysers, 2009; Rizzolatti & Craighero, 2004), somatosensory, and bimodal visual–somatosensory neurons (Dijkerman & De Haan, 2007; Lewis & Van Essen, 2000; Rizzolatti & Luppino, 2001).
Visuomotor neurons activate both in motor interaction with a graspable object and in merely seeing a three-dimensional object that is associated with a possible action (Rizzolatti & Luppino, 2001). They produce the same signal when the individual only sees an object or when the individual interacts with the object. If visuomotor neurons are activated by merely looking at an object, the visual information is thus transformed into motor information, that is, into neural activity related to movements or motor actions (Rizzolatti & Luppino, 2001). This transformation supports the ability to interact quickly and efficiently with objects that are seen. Visual mirror neurons are a special type of visuomotor neurons. Mirror neurons are activated both in the execution of an action and in the observation of the action. Bimodal visual–somatosensory neurons respond to both visual and somatosensory, especially tactile stimuli and can transform visual to somatosensory information.
The ability to transform visual information into motor or somatosensory information is presumably learned by interactions with other individuals and objects (Gazzola & Keysers, 2009; Hamano et al., 2019). From early childhood, people learn to associate visual inputs with motor actions and somatosensory sensations. The processes involved can be explained on the basis of associative learning involving visual, motor, and somatosensory processes. Associative learning takes place when individuals perceive how two different events occur repeatedly in temporal proximity and realize that these events occur together reliably (Hebb, 1949). Associative motor learning in the course of interactions with other individuals and objects leads to the formation of cortical motor engrams (Hamano et al., 2019). If individuals repeatedly observe their hands during interactions with certain objects, such as a ball or the hands of one’s mother, the motor engrams become associated with visual engrams and gain visual properties by including visuomotor neurons. The objects then are visual retrieval cues that lead to the activation of the engrams and the representation of the actions carried by them.
IPL is one of the main nodes of the mirror neuron system in the human brain (Gazzola & Keysers, 2009; Rizzolatti & Craighero, 2004; Urgesi et al., 2010). The second main node is the vPMC, to which the ventral substream of DVS continues. The vPMC is part of the secondary motor cortex and performs functions in associating sensory input with possible actions, in preparing possible actions, in deciding whether to actually perform an action, and in understanding actions that are seen in other individuals (Rizzolatti & Luppino, 2001). The primary motor cortex controls the fine-tuning of movements and actions that are to be actually performed.
IPL and vPMC are closely and mutually connected with each other (Rizzolatti et al., 2014; Rizzolatti & Matelli, 2003). Processing in these regions establishes an increasingly complex representation of an observed action that is based on the activation of the same ensembles of neurons in motor cortices of the observers that would be activated if they were actually performing the action themselves (Rizzolatti et al., 2001). This vicarious motor activation provides a special access to a perceived action and the ability to understand this action. An important property of this ability to understand is that it allows anticipating the immediate further course of an observed action (Urgesi et al., 2010).
Perceiving Action-Related Meaning Consciously
I hypothesize that photographs of actions have a special value in certain contexts because viewing them elicits a bodily experience of the pictured actions. Bodily experiences are consciously perceived or felt (Craig, 2009; Damasio, 2010). The processing of observed actions in DVS and vPMC, which has been described above, is not necessarily associated with conscious perception. According to the model of Goodale and Milner (1992), processing of visual inputs in DVS does not at all lead to conscious experiences that individuals may report or think about, but only serves to effectively control motor actions. Different factors, however, can lead to the processing being accompanied by conscious awareness. I refer to two of these factors: first, to the output of structures that are involved in somatosensory processing (Blanke, 2012; Maravita et al., 2003); and second, to the involvement of structures that control responses to the relevance or emotional value of an observed action (Barrett & Bar, 2009).
Motor actions are associated with somatosensory sensations. They are related to proprioceptive, tactile, and possibly also visceroceptive processes in the S1 and secondary somatosensory cortex (S2), as well as in the insula and in the posterior parietal cortex (PPC; Craig, 2009; Vidoni et al., 2010; Wolpert & Ghahramani, 2000). Somatosensory processes are also involved in observing an action (Gazzola & Keysers, 2009; Keysers et al., 2010; Meyer et al., 2011). Several researchers reported activations in somatosensory cortex or insula of subjects viewing photographs. These activations corresponded to proprioception (Bolognini et al., 2011, with video-clips as stimuli), interoception (Bühler et al., 2008; Wehrum et al., 2013), touch (Bolognini et al., 2013), and pain (Cheng et al., 2008; Gu & Han, 2007; Jackson et al., 2005).
Dijkerman and De Haan (2007) described a ventral and a dorsal processing stream in the somatosensory system. The dorsal somatosensory stream leads from S1 to the PPC and primarily serves processing somatosensory information when performing motor actions. The ventral stream leads from S1 via S2 to the insula and conveys conscious experiences of somatosensory stimuli or processes. The ventral somatosensory stream overlaps with the ventral substream of DVS. Due to this overlap, somatosensory activations that have been elicited by viewed motor actions can be consciously experienced. Insula activities convey conscious feelings related to touch, pain, sexual arousal, cold, or heat (Craig, 2009; Dijkerman & De Haan, 2007; Wehrum et al., 2013).
The ventral somatosensory stream overlaps not only with the ventral substream of DVS but also with brain structures that play a central role in conscious emotional feelings. One of these structures is the anterior cingulate cortex (ACC; Gazzola & Keysers, 2009; Gold & Shadlen, 2007; Gu & Han, 2007; Keysers et al., 2010, 2018; Proverbio et al., 2009; Rizzolatti et al., 2014; Rizzolatti & Luppino, 2001). The cingulate cortex is located inside the brain, medially between the two cerebral hemispheres, where it wraps around the dorsal side of the corpus callosum, which connects the two hemispheres of the brain. ACC plays an important role in the integration of sensory, motor, emotional, and motivational information (Lewis & Van Essen, 2000). ACC is also a central brain structure in deciding whether behavior is better processed automatically or with an involvement of conscious control (Bush et al., 2000). ACC has access to motor information in vPMC (Rizzolatti & Luppino, 2001), and is closely and mutually connected with the anterior insula (Bush et al., 2000; Craig, 2009; Damasio, 2010) and OFC (Barrett & Bar, 2009; Kringelbach, 2005). Activities in the lateral OFC integrate action-related meanings of visual input with visceroceptive information (Kringelbach, 2005; Rolls, 2004) and help to build a representation of an observed action that brings together somatic, emotional, and visual information (Barrett & Bar, 2009). OFC and ACC both project to the amygdala, and all of these structures are involved in building conscious, emotionally charged experiences of events (Pessoa & Adolphs, 2010; Pourtois et al., 2013). To my knowledge, however, there are almost no experimental studies that have examined whether and to what extent viewers process the sensations associated with actions depicted in photographs in somatosensory cortices, insula, ACC, OFC, and amygdala.
A few related studies exist on responses to photographs of sexual behavior (Bühler et al., 2008; Wehrum et al., 2013) and actions associated with pain (Gu & Han, 2007; Jackson et al., 2005).
Do Photographs of Actions Represent a Separate Class of Meanings?
I define action-related meaning as a subset of the possible meanings represented by a photograph of an action that results from activities of mirror neurons, visuomotor, visuosomatosensory, and somatosensory neurons in the brain of a viewer. These neural activities lead to establishing a mental representation of the pictured action, which is composed of action elements that are retained in the motor and somatosensory memory of the viewer. Viewing the photograph evokes a motor and somatosensory neural activation pattern, which would be similar if the viewer actually performed the pictured action. This correspondence provides a special recognizing and understanding of the pictured action.
The notion that people recognize and understand something that they see in “the world ‘out there’” (Milner & Goodale, 2006, p. 63) due to visuomotor, motor, and somatosensory activities is contrary to the general view in neuroscience that the recognition of “what” is there in the environment is processed in IT and VVS. The recognition of stimuli through processes in VVS is related to conceptual categorization (DiCarlo et al., 2012; Freedman et al., 2001), that is, a stimulus is recognized by assigning a specific concept to it. A concept is a notion that represents a class of concrete or abstract objects, events, activities, relationships, or properties, for example, ball, woman, freedom, catch, or blue (Binder & Desai, 2011; Mahon & Caramazza, 2008). The concept of catching, for example, applies to all catching activities. It is based on knowledge of what all these activities have in common and what makes them instances of catching.
A concept is created through a process of generalization and abstraction. A concept is a generalized representation that includes only the most important characteristic features of the object or event. Concepts can be used to determine what objects or events there are in the environment, what their properties are, and how different objects or events are related to each other. In humans, concepts are stored in a semantic system (Binder & Desai, 2011; Binder et al., 2009) and connected with linguistic representations, that is, with knowledge of the meanings of words and sentences (Mahon & Caramazza, 2008; Pulvermüller et al., 2009). The concepts that individuals know form their conceptual knowledge. Conceptual knowledge is knowledge about the world, which is abstracted and generalized from concrete experiences. The conceptual knowledge of individuals enables them to associate stimuli with conceptual meanings. Due to the connection of concepts with words, people can communicate conceptual meanings. The conceptual meanings of photographs correspond to the ideas that viewers associate with them on the basis of their conceptual knowledge.
Important concepts are related to actions. Such concepts are represented, for example, by the words “grasp,” “walk,” or “bite,” or the phrase “catching a ball.” Reading or hearing such words or phrases causes activations in motor brain areas (Aziz-Zadeh et al., 2006; Hauk et al., 2004; Tettamanti et al., 2005). The involvement of motor and sensorimotor brain activities in retrieving and processing the meanings of symbolic representations or ideas is referred to by some researchers as embodied cognition (Binder & Desai, 2011; Wilson, 2002; Wilson & Foglia, 2017). The term refers to the idea that the cognitive processing of concepts and symbolic representations associated with possible actions and sensory sensations is related to properties, activities, and states of the human body beyond the brain.
A photograph represents a concrete, singular action. This representation has characteristics that are fundamentally different from properties of a highly generalized conceptual representation of the action in question. A photograph provides detailed visual information about the elements of the pictured action, for example, the trajectories of body parts, time- and speed-related information, information about involved somatosensory sensations, or predictive information about the achievement or nonachievement of the goal of the action. Viewers understand this visual information by transforming it into a representation of an action that is contained in their motor and somatosensory memory (Rizzolatti et al., 2001).
A photograph of an action also contains information that can only be recognized and understood on the basis of conceptual knowledge. The individual, for example, who stretches out her hands in the top panel of Figure 2, is a 40-year-old woman and the round object she catches is a ball. There are interactions between processing the conceptual and the action-related information of photographs. The conceptual knowledge of viewers about the world fundamentally influences which meanings viewers associate with photographs (Amit et al., 2009; Deregowski, 1989; Miller, 1973; Nisbett & Masuda, 2003). It also influences which experience, intention, and goal they associate with a pictured person or action (Molenberghs et al., 2013; Olivola & Todorov, 2010). The visual information of a photograph of an action, hence, is usually processed in parallel in the inferior temporal lobe (IT) and in parietal and frontal premotor areas. The resulting action-related and conceptual meanings are integrated through processes in OFC, the anterior insula, ACC, and PPC (Barrett & Bar, 2009; Gazzola & Keysers, 2009).
Despite this close connection with conceptual meanings, action-related meanings clearly form an independent class of meanings. They are the meanings that are based on activities of neurons in IPL and vPMC. These activities associate the visual information of the photograph with representations that are stored in the motor and somatosensory memory of the viewers and are closely connected to their bodies.
Characteristics of Action-Related Meanings Represented by Photographs
A number of characteristics of photographs picturing actions influence the strength of the responses of mirror neurons, visuomotor and somatosensory neurons in viewers, and the intensity of the bodily experience conveyed by the photographs. A central characteristic is related to the visibility of the elements of the pictured action. This could be, for example, the visibility of the spatial courses of the movements, their temporal pattern and velocity, the somatosensory sensations involved, the goal of the action, and the importance of this goal for the acting person. The more information about an element a photograph contains, the more easily and quickly this element can be recognized (Bruzzo et al., 2008; Gu & Han, 2007; Perrett et al., 1991). The more visuospatial information a photograph provides that is meaningful in connection with the motor and somatosensory knowledge of viewers, the stronger motor and somatosensory activations the photograph evokes, and the less conceptual reasoning is required to understand the pictured action. The strength of motor and somatosensory reactions to photographs is presumably also influenced by the stage of the action being depicted. Photographs in which the final state of the action has not yet been reached evoke stronger activations than photographs in which the action is shown when it has been completed (Urgesi et al., 2010).
Emotional factors also play an important role. The more relevant the pictured action is to viewers, the stronger the emotional arousal is that it elicits (Bradley & Lang, 2007; Lang et al., 1997). The strength of emotional arousal in turn reinforces the motor and somatosensory processing of the pictured action (Barrett & Bar, 2009; Borgomaneri et al., 2012; Goldberg et al., 2014). Photographs elicit emotional arousal especially when the pictured action would require viewers to react with a motor behavior if they actually saw the action in the environment (Brosch et al., 2008). This may be the case, for example, when viewers see the two fighting boys in the middle panel of Figure 2. Concern that the boys are hurting each other could cause viewers to step in and separate the opponents to prevent injury. The photograph of the woman catching a ball shown in the top panel would not require a motor reaction from viewers and allow them to remain passive.
The strength of the emotional response to a photograph may also be influenced by the distance from which an action was photographed (Caggiano et al., 2009). The visuospatial information of a photograph provides clues for recognizing the position of the projection center that corresponds to the implicit position of the viewers in relation to the pictured individual and, thus, the spatial distance from which viewers see the individual (Cutting & Vishton, 1995). Pictorial information that evokes a representation of pictured individuals as being close to the viewers supports the association with action-related meanings (Caggiano et al., 2009). Individuals who are close in real life can immediately cause an observer pain or pleasure and are often associated with opportunities or requirements of action (Gibson, 1979; Kennedy et al., 2009). Individuals who are close may thus be perceived as particularly meaningful in terms of motor reactions. This circumstance is presumably also relevant in viewing photographs.
Do Photographs of Actions Have a Special Value in Certain Contexts?
Humans and animals like to observe motor behaviors of other humans and other animals (Lang et al., 1997; White, 1959). The earliest realistic images that people created more than 30,000 years ago show the fascination with characteristic motor behaviors of animals in situations that are related to the challenges of life (Guthrie, 2005). Pictures of what people do, how they do it, for what purpose, and with which results, also have a long history, which may have an evolutionary basis (Tooby & Cosmides, 2001).
Photographs of actions or somatosensory sensations evoke brain reactions in viewers that have a somatic component. Vicarious motor and somatosensory activations enable viewers to represent actions or sensations that they see in photographs as if they themselves were moving or having a sensation. Through the activities of the anterior insula, somatosensory cortices, OFC, and amygdala, viewers can, for example, feel the pain of the baby whose skin the doctor pricks with a needle, as can be seen in the bottom panel of Figure 2.
Viewers may be able to see actions or interactions in photographs associated with bodily experiences that the viewers have a need for, but cannot satisfy, like a need for pleasant touch or exercise. Viewing photographs of actions that the viewers would like to do themselves can stimulate the neural structures that would be involved in actually performing the actions and having the experiences. Positive bodily experiences are fundamental to well-being (Van Boven & Gilovich, 2003). The importance of positive experiences in connection with actions and interactions is increasingly taken into account also in marketing (Pine II & Gilmore, 1998). This fact can be seen in photographs used in advertising campaigns, for example, in the current Coca-Cola campaign titled Taste the Feeling (Kislinger, 2017, 2018; Moye, 2016).
An action pictured in a photograph spanned a period of time when it was performed. The photographer chose a particular perspective and moment in which to take the picture. Compared with a representation of the action through a video clip, the photographic representation shows a higher degree of information selection, reduction, and organization. The photograph contains only a few of the many contingent visual features, which were perceptible in the situation in which the action was performed. When viewing a video clip, the viewer is confronted with a stream of information that is continuously renewed. The viewer only gradually learns the meanings of the action shown. In this regard, viewing a video clip resembles watching the real action. A photograph represents a more prefabricated meaning. Certain operations of selecting and organizing information no longer have to be done—or can be done—by viewers. A photograph, hence, suggests a higher degree of interpretation or explanation of the pictured action than a video clip. This characteristic can have value in contexts where it is important to receive or convey certain action-related meanings quickly and in a controlled manner. This can be as important in advertising as in cognitive science experiments in which researchers study reactions of subjects to certain stimuli in the laboratory.
Another important difference between photographs and video clips is that a photograph, as a static picture, gives the viewer any amount of time to process the implicit meanings in a sequence of fixations. Viewers can use the visual information in the photograph to imagine what is happening in the scene, what happened immediately before the pictured situation, and how the event continues. Vicarious motor and somatosensory activations correspond to an identification of viewers with the pictured acting person and are associated with taking her or his perspective (Ruby & Decety, 2001). Viewers can mentally represent the pictured scene as if they were physically present in it (Spreng & Grady, 2010; St. Jacques et al., 2011) and moving through it (Persichetti & Dilks, 2016; Wang & Spelke, 2002).
Photographs of actions provide information about how certain people behaved in particular situations. Seeing a photograph of an action can add something to the viewer’s knowledge of which actions or behaviors they may encounter. Photographs of actions thus may support social learning processes. These learning processes are assumed to be based on both motor and conceptual processes and to have a rewarding component, which presumably relates to the activity of dopamine neurons (Caligiore et al., 2013; Friston et al., 2012). Recognizing and understanding complex conceptual meanings of stimuli based on processing in VVS seems also to be connected with reward-related activations, which in this case involve opioid-related neural activities (Lewis et al., 1981; for review, see Biederman & Vessel, 2006).
Since the 1960s, in the humanities and especially cultural studies, the notion has prevailed that photographs are a kind of words or texts (e.g., Barthes, 1961/1982; Hall, 1984). This notion reflects an influential trend in 20th-century science, the linguistic turn (Rorty, 1967). With it, the view became prevalent in the humanities that all problems related to recognizing and understanding meanings ultimately have to be understood as linguistic problems. With conceptual art, this notion conquered artistic photography (Kislinger, 2018; Salvesen, 2010). The view that the cultural value of photographs primarily relates to the creation of concepts and conceptual knowledge has become widely held. I consider this view restrictive. Important features of the world are related to what humans and animals do. Certain properties of what people do can be better understood on the basis of motor and somatosensory knowledge than on the basis of conceptual knowledge. Photographs of actions support such an understanding. These facts have hardly ever been considered in discussions about the cultural value of photographs in the past half century. This seems strange because other people’s actions are among the most important stimuli in the external world. The special representation of action-related information through photographs could have made an important contribution to the ubiquity of photography.
Limitations
The referenced visual processing mechanisms are still a matter of great controversy. The conceptual integration of visual mechanisms with the meaning of photographs is necessarily speculative. This analysis focused on cognitive mechanisms by which action-related visual information is transformed into motor and somatosensory information. Social factors that may affect such transformations were not included in the analysis. I did not, for example, examine the question of whether and how viewer reactions to photographs of actions are influenced by the gender or age of the person acting, or by the fact that the person belongs to a group the members of which are perceived as likable or threatening by the viewers (Molenberghs et al., 2013). This analysis also excluded the influence of contextual information contained in photographs on the meanings that viewers associate with pictured actions.
Questions for Future Research
Two basic predictions of the proposed framework are that a photograph of an action evokes motor and somatosensory neural activations in viewers similar to those that would occur (a) if the viewers actually observed the corresponding action in the environment and (b) if the viewers performed and experienced the action themselves. These predictions can be tested by future experimental investigations. Mobile, wireless techniques like electroencephalography (EEG) headsets open up useful possibilities for investigations in “real-world” settings and could be combined with laboratory experiments. When examining the former prediction, adjustments to properties of the action or its depiction could provide important insights into the characteristics of photographs that represent action-related meanings. Possible variables include the distance from which the action was photographed or is seen, the visibility of the somatosensory sensations involved, the presence of expressive behavior, or contextual information about the environment in which the action takes place.
A crucial question concerns the dimensions in which the characteristics of photographs that represent action-related meanings can be measured. Possible dimensions would be the strength and the time course of neural activations in EBA, MT, pSTS, IPL, vPMC, somatosensory cortex, and anterior insula with which viewers react to a photograph. A dimension for measuring the emotional component of action-related meaning could refer to the emotional arousal (Bradley & Lang, 2007; Lang et al., 1997) elicited by a photograph.
Photographs are static stimuli without dynamic changes. Their use in laboratory experiments allows easy control of their physical parameters (Bradley & Lang, 2007). This fact speaks for the use of photographs to test hypotheses about the automatic and involuntary nature of understanding observed actions. In general, little is known about the nature of the motor representations that are activated by photographs of actions: What do the motor representations stand for, into which the visual information from photographs of actions is transformed? Which elements of a pictured action are represented, for example, movements of body parts on certain spatial trajectories with certain time courses, involved somatosensory processes, or the goal of the action? Which properties of pictured actions influence the degree to which the actions are understood by viewers automatically and involuntarily? Findings on these questions would provide valuable information about the effects of looking at photographs of actions in everyday life. An important open question relates to the ability to evoke somatosensory activations in viewers through photographs of actions, in other words, to add a bodily component to the perception of the pictures. This question could be investigated by comparing brain reactions elicited by viewing photographs and video clips of actions.
The involvement of somatosensory neural activations in processing action-related meanings of photographs could also be examined through the modulation or disturbance of the neural reactions in somatosensory cortices, ACC, and the insula using transcranial direct current stimulation (tDCS) or transcranial magnetic stimulation (TMS; Keysers et al., 2018; Urgesi et al., 2014). It could be investigated how the disturbance of areas in VVS, DVS, or premotor cortex affects the ability of individuals to recognize motor or somatosensory aspects of pictured actions, such as the ability to predict the immediate further spatial course of a moving body part or the final state of an action, or to recognize its goal and correctly predict whether this goal will be achieved, for example, whether the woman pictured in the top panel of Figure 2 will be able to catch the ball for which she stretches out her hands. It could also be examined which brain structures and processes are involved in building a consciously perceived bodily experience of pictured movement, touch, or pain. Such experiments would provide important information in regard to the relationship between action-related and conceptual meanings of photographs. Finally, important knowledge could be gained by devising experiments that make it possible to investigate the potential social and cultural value of photographs representing action-related meanings. Appropriate experiments could be designed to test whether viewing photographs that represent action-related meanings evokes, for example, a higher degree of empathy (Jackson et al., 2005; Preston & De Waal, 2002) in subjects than viewing similar, but static photographs.
Conclusion
I developed a theoretical framework that describes the specific meanings represented by photographs of actions and the neural basis of these meanings. Due to activities of neurons that transform the visual information provided by a photograph into motor and somatosensory information, a neural representation of the pictured action is established in a viewer, which would also be activated if the viewer actually performed the pictured action. This correspondence provides a special access to the bodily meanings of the pictured action. The proposed framework contains a number of predictions that can be tested by future empirical investigations. It provides a theoretical basis for future research on the photographic representation of actions and may, thus, contribute to the psychology of photography.
Footnotes
Acknowledgements
I thank Jenna Hicken for personal assistance in translating and editing the article. I also thank the anonymous reviewers for their valuable comments and suggestions.
Author’s Note
Some of the contents of this article were addressed in the author’s unpublished doctoral dissertation (Kislinger, 2018, see references).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
