Abstract
After Russia’s war against Ukraine destroyed people’s ability to move and communicate freely in Ukraine, many Ukrainians turned to social media and messenger apps, especially Telegram, to produce and share information. The vast amount of this digital data is privatized, ephemeral, and difficult to utilize for research, raising urgent questions about its sustainable accessibility and usability. In this article, we explore a specific aspect of digital archive sustainability – the use of digital archives to preserve platform data related to Russia’s war against Ukraine – by focusing on data integrity, usability, and ethics. Our research is based on a case study of an interdisciplinary Data Sprint, “Russia’s War in Ukraine,” organized in collaboration with a Telegram Archive, in which academics and practitioners investigated qualitative approaches to studying a war on Telegram. In the article, we explore the possibilities and drawbacks of sustainable use of the Telegram Archive for qualitative approaches – semantic, visual, spatial, and link analysis – to working with large amounts of data. We argue that the sustainability of digital archives depends not only on their use, based on consistently stored and accessible data, but also on the ethical aspects of their use for diverse research needs.
Introduction
The intense use of digital platforms and the production of data is one of the definitive features of today’s datafied media ecosystem (Van Dijck et al., 2018). Data is a commodity usually privatized by platforms, which base their business models around services related to data collection, sale, and utilization for advertising. In media and critical digital studies, datafication is often associated with quantification and the transformation of human activity into measurable units (Burkhardt et al., 2022). At the same time, datafication enables the emergence of critical data practices such as research, OSINT investigations, and digital archiving (Bareikytė and Skop, 2022). As a result, despite commodification, surveillance, and privacy risks (Zuboff, 2019), datafication provides researchers and activists with new paths to investigate mass violence.
These research practices utilize public platform data to document and investigate war experiences and events, and their impact on the lives of individuals. In the case of Russia’s war in Ukraine, on which we focus, platform data were used for studying interpretations of the ongoing violence (Makhortykh and Sydorova, 2017), tracking the spread of propaganda (Golovchenko et al., 2018), documenting human rights violations (Freeman, 2022), and informing the civilian population about threats (Nazaruk, 2022). In some cases, data can be used to hold perpetrators of war crimes accountable (Goujard, 2022) or to plan military operations (Penninger, 2021). Such inquiries are part of wider debates in media studies that investigate the ethics and politics of contemporary wars, as well as the role of platform infrastructures and data in shaping (dis-)information, propaganda, and memory during wars (e.g. Ford and Hoskins, 2022; Mejias and Vokuev, 2017; Thylstrup, 2022).
Digital archiving of platform data has become part of the extended materiality of war (Agostinho et al., 2021). However, the implementation of such a critical data practice is subject to many challenges, including technical and legal difficulties in retrieving platform data (Perriam et al., 2020), privacy concerns (Di Minin, 2021), the active role of platforms in governing and hindering such inquiries (Acker and Kreisberg, 2020; Banchik, 2021; Ben-David, 2020), or the very nature of archives as limited and exclusionary (Gilliland and Caswell, 2016). There are additional key challenges relating to the sustainability of digital platform archives, which is crucial for realizing the potential of critical data practices.
The concept of archival sustainability, which we discuss in more detail below, is particularly important at the time of war, when platform data is more susceptible than usual to deletion or manipulation. This particular sensitivity relates to the traumatic nature of the datafied events, for example, murder or torture; and to the life-threatening implications of accessing certain data, when subscription to certain platform communities can be treated as a sign of affiliation to a specific side and punished accordingly (Polikovska, 2023). Under these circumstances, platform users are more likely to delete data, which makes data practices less sustainable, and stresses the importance of digital archiving.
The concept of sustainability is complex and comprises different aspects. In our article, we scrutinize one specific aspect of archival sustainability – that is, the use of digital archives for critical data practices, with a focus on data integrity, usability, and ethics – regarding Telegram data related to the war in Ukraine. By digital archives, we mean organized collections of data from digital environments, including a wide range of social media, messengers, and websites. The Telegram Archive of the Center for Urban History (CUH) in Lviv, Ukraine, is an example of a digital archive that sustainably collects and preserves data. Due to its high sensitivity, the archive remains publicly inaccessible nearly 2 years after its inception. This time was taken to address the issue of sustainable use of the data, before making further decisions on its eventual availability. This situation brings us to our research question: How can we improve the usability of war-related data while reflecting upon (and trying to minimize) safety risks and ethical concerns connected to this data? While access to such archives is a crucial precondition for the use aspect of archival sustainability, our focus is not on the issue of access, nor on the archival embargoes (Gilliland and Caswell, 2016), but on the empirical exploration of the use of such archives for research and the possibilities and risks involved, under specific archival circumstances and specific positionalities of the authors. In other words, to be sustainable, the archive must (not only) exist and be maintained, but also be used. We addressed the research question through a multi-method data sprint in December 2022, applying methods ranging from qualitative content analysis and spatial analysis to visual discourse and link analysis to archived data.
Digital archives as a means of achieving sustainability of data about crises and wars
Various forms of platform data are collected by researchers and activists, with a focus on specific topics or research questions. Such collections are usually not part of long-term institutional preservation infrastructures (e.g. national archives or libraries). At the same time, archiving practices for the long-term preservation and integrative use of platform data within established heritage institutions are just beginning to emerge. According to the BESOCIAL project, aimed at developing a sustainable strategy for archiving social media in Belgium, policies vary widely between institutions experimenting with platform data preservation (Chambers et al., 2021). Only a few institutions are aware of the use of their platform collections for research purposes, and most of them provide limited, if any, online access to their archives (Chambers et al., 2021: 43–44).
Digital platforms vary in terms of the capacities they offer for collecting and archiving their data. In addition to this, there are methodological and ethical challenges to using web archiving for researching user interactions (Gilliland and Caswell, 2016; Lomborg, 2012). Platform data are corporate-owned and include both public and private aspects, which raises further ethical concerns for their archiving. Furthermore, especially since the closing of public APIs, many platforms are unarchivable by design (Ben-David, 2020: 251). The result is a complex practice of platform data archiving involving multiple and conflicting actors, including private companies, nonprofit organizations, researchers, and activists (Acker and Kreisberg, 2020). Under these circumstances, we inquire into the usage aspect of digital archival sustainability, which we broadly define as consistent data collection, storage, access, and usability.
In order to operationalize the concept of sustainability, we examine one specific aspect of archival sustainability: the use of digital archives as part of a sustainable digital practice (Bradley, 2007; Stuermer et al., 2017) in the contemporary attention economies of platforms (Davenport and Beck, 2002; Hoskins and O’Loughlin, 2015). We illustrate the aspects we have identified as key to the sustainable use of a Telegram archive, which we developed from our collaborative data sprint between scholars and practitioners: data integrity, usability, and ethics of use. These aspects enable us to explore the everyday data practices on Telegram of a wide range of civilians in war, and contribute to qualitative research on digital archives. Therefore, we argue that not only the ethical practice of data storage, but also the use of platform archives – the focus of this paper – is key to ensuring the sustainability of archived platform data.
The definition of sustainability in relation to digital archiving has evolved over time. Scholarly discussions of archival sustainability address multiple aspects, ranging from standardization of technology used for data preservation and usage (Eschenfelder et al., 2016) to diversified funding to the establishment of economically, societally, and organizationally viable arrangements (ANU, 2020; Bradley, 2008; Stuermer et al., 2017). However, there are many challenges to the sustainability of platform archives. These challenges include maintaining consistent funding, staff, and expertise under dynamic and adverse societal conditions, storing the dynamic digital-born content, and developing usable interfaces for accessing archival collections. Environmental challenges further complicate the process of making archives sustainable by requiring mitigation of environmental damage from archival facilities (Faulkner et al., 2021; Pendergrass, 2019; Varela, 2016).
The sustainability challenges are even harder to deal with in archives preserving commercial platform data. As a representative of the national Danish Web Archive puts it: “One of the main issues in archiving social media is its archivability” (Chambers et al., 2021: 14), meaning the lack thereof, or, as Ben-David (2020) puts it – unarchivability by design. Archiving platform content is complicated by the relative paucity of tools, forcing archivists to use software designed for web archiving in general, or platform-specific harvesting methods, for example, collecting data in a look and feel format, as opposed to structured data export via APIs (Chambers et al., 2021: 38). In some cases, this challenge is being addressed through bilateral agreements through which platforms donate data to institutions, as in the case of Twitter’s collaboration with the Library of Congress (Zimmer, 2015). In the absence of such agreements, the process of archiving platform data is complicated by their ever-increasing volume and the possibility of them being deleted, or platform policy changing. This makes archiving highly dependent on the moment in which data were captured and the frequency of doing so (e.g. weekly or monthly). Attempting to mitigate this challenge, archives often focus on platform data associated with specific events and use national territories as a selection criterion, although platform data does not necessarily correlate with national borders (Chambers et al., 2021: 15).
The sustainability of digital archiving is even more complicated during wartime. Archives related to mass violence serve different purposes: there are academia-centered (research), victim-centered (commemoration), and society-centered (preservation, education) archiving initiatives. These diverse motivations also affect their access and use strategies (Bultman et al., 2022). In addition, while projects focusing on cultural heritage preservation (e.g. BESOCIAL, UK Web Archive, Arquivo.pt, and Archive.org) have the time and resources to develop sustainability strategies, analyze existing policies, select data, and prepare technical solutions, those archiving data related to modern wars or crises encounter difficulties. There is no established workflow or software for archiving platform data, the data is sensitive, and there are no institutionalized collaboration models. Sustainability of digital archiving is a complex issue comprising many aspects, and facing challenges as outlined above. One of the less addressed aspects is the use of such archives for research, especially for qualitative research. At the same time, the practice of using responsive digital archives in itself makes it possible to identify and address the limitations and challenges of such usage in the future, once such archives are, hopefully, opened to a wider public.
Digitally archiving the war in Ukraine
Emergency archiving initiatives face many challenges ranging from ensuring data integrity to enabling ethical use of sensitive data. To address these challenges, initiatives develop various archiving methodologies, and engage in communication with platforms and communities that are archived. For instance, Documenting the Now evolved from the social media archive of the Ferguson protests in 2014 to an initiative which designs social media archiving policy models and tools (Jules et al., 2018). Likewise, the Syrian Archive evolved into an umbrella organization called Mnemonic, developing emergency archiving methodologies and advocating for better social media archiving policies (Kayyali, 2022). Such critical archival practices of war have expanded the notion of digital data archiving, from preservation to the maintenance, policy design, and use of such archives for investigations.
With the beginning of the full-scale Russian invasion, multiple archiving initiatives emerged in Ukraine. Illustrating the scale of this work, a 2023 symposium “The Most Documented War” brought together 135 scholars and activists from Ukraine involved in archiving war-related data (Center for Urban History, 2023). The initiatives vary from governmental and civic projects documenting human rights violations (e.g. Ukraine War Archive, Tribunal4Putin, and Ukraine 5 AM Coalition), to oral history and visual documentation initiatives (24.02.22, 5 am: Testimonies from the War, The Reckoning Project, Dattalion, Ukrainian Warchive, and Wall Evidence), to projects dealing with country-wide destruction (Damaged.in.ua, or Destroyed Cultural Heritage of Ukraine) and location-specific destruction (e.g. Map of Siverodonetsk by Oleksiy Zadesentsev, and Map of Mariupol by Vitaliy Shtutman). Under these circumstances, the main challenge of archiving data on the Russian invasion concerns not necessarily data collection per se, but the identification of ways for the sustainable, long-term preservation and ethical use of the collected data.
Despite the variety of data collection initiatives, relatively few explicitly focus on preserving war-related platform data. International efforts dealing with this task vary in scale and approach. For example, the Internet Archive has crawled the entire country-code top-level domain of the Ukrainian web together with war-related data from popular social media platforms in Ukraine and Russia, namely Telegram and VK.com. The Internet Archive also has provided infrastructure support to community archiving efforts (Holownia et al., 2022). Another initiative, the Ukrainian Archive, was launched by Mnemonic as a continuation of its efforts to document digital evidence of mass violence (Kayyali, 2022). Several national web archives and libraries (e.g. in the United Kingdom, Denmark, and Hungary) have started to collect platform data related to the war. In most cases, these are specific country-level domain websites with war-related content (Holownia et al., 2022). Finally, an important role in platform data collection is played by open-source investigation initiatives such as Bellingcat (2023), Texty, or Oryx, which collect data to analyze disinformation and propaganda strategies, monitor hostilities, or investigate specific war events.
In this article, we focus on one of the ongoing digital archiving initiatives in Ukraine – the Telegram Archive of the CUH. The Center develops and maintains a distributed human and technical infrastructure to ensure the sustainable archiving of Telegram data in the context of the war. Like other digital archiving initiatives in Ukraine, the Center’s project has had to deal with many challenges, both external, such as the ongoing Russian attacks against civilian infrastructure and resulting power outages as well as funding uncertainties, and internal, including the sensitivity of the archived war content and the diverse set of research practices relying on the use of archival data. What makes the CUH Telegram Archive exceptional, and an important case for illustrative and reflective use, is its curated nature and the fact that it is not yet open to the wider public. Under the conditions of ongoing war, its archivists balance the highly sensitive nature of collected data with the societal value of preserving these data to facilitate research, enforce justice, and sustain social memory practices (Bareikytė and Skop, 2022; Freeman, 2022; Nazaruk, 2022). Opening up part of the Telegram Archive to our research allows us to explore different approaches to working with data, but also to examine the risks involved in such research.
Archiving and researching Telegram in Ukraine
Telegram is an instant messaging app founded in 2013 by brothers Pavel and Nikolai Durov. By now, Telegram has amassed more than 700 million monthly active users worldwide (Telegram, 2022). One of the reasons for Telegram’s popularity is its anonymization affordances that allow rerouting the traffic of the app through a VPN or an anonymization network (e.g. TOR). Telegram has enabled political fringe or extremist groups to recruit followers and facilitated the spread of misinformation (Rogers, 2020). At the same time, the platform has been used for the organization of protests and political movements and to circumvent censorship (e.g. Akbari and Gabdulhakov, 2019; Urman et al., 2021).
Telegram was already popular in Ukraine before the war (Khaund et al., 2021), but since the start of the 2022 Russian invasion, there has been a surge in its use. A survey (Opora, 2023) conducted in May–June 2023 shows that Telegram is the most popular social media platform in Ukraine, with 71.3% of respondents using it. At the same time, the survey indicated that social media platforms had almost the same level of public trust as television (i.e. 60%). These changes have resulted in Telegram becoming one of the main sources of information about war developments, and the key means of civilian and military communication. Its instant messaging functionalities, together with many participants’ channels and groups, are used for various self-organized data practices: from crowd-funding by civilian or military volunteers to OSINT investigations by journalists and activists, air-raid monitoring and evacuation route planning to psychological assistance and animal rescue (Nazaruk, 2022).
There is an increasing number of academic studies looking at the use of Telegram in Ukraine. Telegram data was used to study political influence and public relations (Plakhta, 2020), social behavior, including the behavior of Ukrainian parliamentarians (Khaund et al., 2021), and the spread of disinformation and participatory propaganda (Maschmeyer, 2021). In the context of the invasion, Telegram data was applied for researching strategic (Yuskiv et al., 2022) and everyday communication (Maathuis and Kerkhof, 2023). In addition to scholars, several media outlets, investigators, and activists have launched their own investigative projects based on data exported from Telegram, also focusing on Russian propaganda (e.g. Drozdova et al., 2022).
Besides anonymization, Telegram’s affordances allow the export of both Telegram messages (e.g. attached images and videos) and metadata (e.g. sender’s name or time of publication). It is possible to export data via the desktop Telegram application or API in HTML, a presentation format accessible to human users, and JSON, a structured machine-readable data format. Telegram’s API is quite open compared to other platforms, with no extra costs. Many digital methods researchers are also using the platform by implementing data exports with existing tools such as 4CAT (Peeters and Hagen, 2022). For digital archives, this messenger is valuable as a basis for data collection or an addition to existing archives, especially when the platform is heavily used, such as in Ukraine during the war.
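To make the difference between the two export formats concrete, the following minimal sketch reads a JSON export and reduces each message to a few fields. It assumes the layout produced by the Telegram desktop application (a top-level "messages" list whose "text" field is either a string or a list of strings and annotated fragments); the function names and the sample content are invented for illustration and do not describe the workflow of the CUH archive.

```python
import json


def flatten_text(text):
    """Return plain text from an exported 'text' field.

    Assumption: the field is either a string or a list that mixes strings
    with dicts (links, mentions, etc.) carrying their own 'text' key.
    """
    if isinstance(text, str):
        return text
    parts = []
    for item in text:
        parts.append(item if isinstance(item, str) else item.get("text", ""))
    return "".join(parts)


def load_messages(export):
    """Keep regular messages and reduce each one to a few fields."""
    rows = []
    for msg in export.get("messages", []):
        if msg.get("type") != "message":
            continue  # skip service events such as joins or pinned notices
        rows.append({
            "id": msg["id"],
            "date": msg.get("date"),
            "sender": msg.get("from"),
            "text": flatten_text(msg.get("text", "")),
        })
    return rows


# Invented sample standing in for an exported file.
sample_json = """
{
  "name": "Example chat",
  "messages": [
    {"id": 1, "type": "service", "date": "2022-03-01T08:00:00",
     "action": "pin_message"},
    {"id": 2, "type": "message", "date": "2022-03-01T08:05:00",
     "from": "user_a", "text": "Is the road open?"},
    {"id": 3, "type": "message", "date": "2022-03-01T08:07:00",
     "from": "user_b",
     "text": ["See ", {"type": "link", "text": "https://example.org/map"}]}
  ]
}
"""
messages = load_messages(json.loads(sample_json))
for row in messages:
    print(row["date"], row["sender"], row["text"])
```

The HTML export carries the same content in a form meant for reading, whereas a structure like the one above is what makes filtering, counting, and sampling possible.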
The data extraction functionality was used by the CUH to create a Telegram archive focusing primarily on public groups (many-to-many) and channels (one-to-many). Some cover communication in various localities across the country that are subject to higher risk of deletion, including chats from the towns near the frontline or occupied territories. Official and mainstream media channels are lower priorities for the archive, due to the low probability of their deletion. The platform data is exported in HTML and JSON, saved in .zip files with no compression, and stored on the institutional servers with an external server secured for a backup. The archive uses already existing server infrastructure developed for the Urban Media Archive of CUH, which provides institutional and infrastructural sustainability for long-term data preservation. The collected data is annotated in the registry with archival descriptions, keywords, and other technical notes about the archived channels and chats. All archival decisions and workflow are documented and will be described in the final project report along with its technical documentation. A hash sum for the final dataset may be assigned in the future. The issue of increasing its usage while minimizing the risk of misuse was addressed during the data sprint, and remains the priority in developing the future sustainable use of the archive.
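As an illustration of what assigning a hash sum could involve, the sketch below computes a SHA-256 checksum for every file in a folder and collects the results in a manifest. This is one common fixity practice, not the procedure chosen by the CUH, which the text leaves open; the function names, the folder layout, and the sample file are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so that large exports do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its SHA-256 checksum."""
    folder = Path(folder)
    return {
        path.relative_to(folder).as_posix(): sha256_of_file(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


# Demonstration on a temporary folder with one invented file.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "chat_export.json").write_bytes(b"abc")
    manifest = build_manifest(folder)

for name, checksum in manifest.items():
    print(checksum, name)
```

Recomputing the manifest later and comparing it with the stored one would show whether any archived file has changed, which is the integrity property that a hash sum is meant to document.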
The Telegram Archive of CUH is a non-profit project that emerged as an emergency archiving initiative to document and preserve communication practices on Telegram by curating and annotating selected chats, which is intended to increase their accessibility in the long term (see Nazaruk, 2022 for a detailed account of the creation of this archive).
Sustainability in terms of data integrity, usability, and ethics is essential for ensuring the integrity of Telegram research, in particular in the context of mass violence. The majority of existing studies on Telegram focus on case-specific snapshots of data, without scrutinizing the possibilities for making the preservation and use of this data sustainable. To address this gap, we discuss how we can make Telegram research in the context of the war in Ukraine more sustainable by using a digital archive. This discussion is based on our exploratory data sprint, organized in cooperation with the Telegram Archive from CUH, during which the archive was exceptionally opened for research.
The data sprint
We organized the 3-day online data sprint in December 2022 with the support of the CUH and the CRC Media of Cooperation at the University of Siegen. The data sprint was the result of the collaboration between researchers in Siegen, Frankfurt (Oder), and practitioners in Lviv and Kyiv. Although the CUH has archived Telegram channels since the beginning of the 2022 invasion, the collection has not yet been accessible to researchers because of the acute awareness of the potential risks involved. Access infrastructure and ethical access policies are currently being developed, including the structuring of the data according to levels of sensitivity, formulation of terms of use, and formulation of data protection and takedown policies. With the data sprint we aimed to scrutinize how the archive could be made sustainable from the point of view of use by inviting an interdisciplinary team of scholars and practitioners, including the archivists from the CUH and the Ukrainian Experimental Research Group with their TG Search initiative, to work with the archived data and collaboratively consider the opportunities and risks of different qualitative-empirical research approaches to working with data. This can also be seen as a first appraisal of the archive, which highlights potential risks and ethical considerations in the study of war data. The TG Search tool (https://tgsearch.com.ua) served as a prototype for the tool built for the data sprint. Due to the sensitivity of the topic, we decided to limit the data sprint to a small number of researchers.
A data sprint is an intensive, research-based workshop that brings together academic and non-academic participants to work on specific topics. Data sprints originated in the field of open-source software development, where hacking marathons brought together participants for intensive work on a problem prepared in advance (Venturini et al., 2018). In digital media studies, the Digital Methods Initiative is a notable example of an ongoing research initiative that holds data sprints on a regular basis (TDMI, 2021).
Our sprint consisted of two main research projects. One project examined a Telegram chat from occupied territories, focusing on people’s discussion of their war experiences and their decisions to flee, as well as the spatial dynamics of the chats. The project highlighted changes in chat dynamics and themes over time, as well as links shared in the chat. The second project examined political memes, with particular attention to visual content and its dynamics in selected chats.
Due to the sensitivity of the archived data – Ukraine is still under attack from Russia, and the information can be misused – and the methodological challenges of working qualitatively with large volumes of data, it was decided to prepare sample data for the data sprint participants to work with. The sample was provided by CUH. TG Search, part of an NGO called Ukrainian Experimental Research Group, created a web application with a graphical user interface for the data sprint to access the archived files easily and also to perform some analysis on them.
The data sprint digital search interface offered a range of filters (e.g. channel or user name, types of post, posts with hashtags, and URLs) chosen based on the manually prepared metadata from the Telegram Archive, and search functions to locate the requested information and visualize it as a feed, spreadsheet, or chart. It was possible to export search results in a CSV format. As a pilot version of the tool, its use during the data sprint revealed certain limitations, including challenges regarding the data export limits. Despite its limitations, the data sprint search interface has enabled a group of researchers to work together remotely to use the Telegram Archive for qualitative analysis of wartime data.
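The kind of filtering and CSV export described here can be approximated in a few lines. The sketch below is not the implementation of the data sprint tool, whose internals are not documented in this article; the field names, the filter parameters, and the deliberately simple patterns for hashtags and URLs are assumptions made for illustration.

```python
import csv
import io
import re

# Simplified patterns; a production tool would need more careful matching.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")


def filter_posts(posts, channel=None, has_hashtag=None, has_url=None):
    """Return posts matching every filter that is not None."""
    selected = []
    for post in posts:
        text = post.get("text", "")
        if channel is not None and post.get("channel") != channel:
            continue
        if has_hashtag is not None and bool(HASHTAG.search(text)) != has_hashtag:
            continue
        if has_url is not None and bool(URL.search(text)) != has_url:
            continue
        selected.append(post)
    return selected


def to_csv(posts, fields=("channel", "user", "date", "text")):
    """Serialize the selected posts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(posts)
    return buffer.getvalue()


# Invented sample posts.
posts = [
    {"channel": "chat_a", "user": "user_a", "date": "2022-03-01",
     "text": "Shelter list: https://example.org/shelters"},
    {"channel": "chat_a", "user": "user_b", "date": "2022-03-01",
     "text": "Thank you #help"},
    {"channel": "chat_b", "user": "user_c", "date": "2022-03-02",
     "text": "Update https://example.org/news #news"},
]

with_links = filter_posts(posts, channel="chat_a", has_url=True)
print(to_csv(with_links))
```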
Qualitative approaches for studying war on Telegram and archival sustainability
Qualitative text analysis
One of the approaches to using Telegram data for research is qualitative analysis, including thematic, content, or grounded-theory analysis. Texts and images taken from Telegram that convey information about the war can be analyzed as a corpus of empirical data by assigning semantic labels and giving structure to the complexity of the corpus (Braun and Clarke, 2021; Gläser and Laudel, 2009: 43–47).
Due to the large volume of data stored in the Telegram archive, a specific chat about one Ukrainian city occupied by Russia was chosen for the analysis. The chat has over 150,000 users and includes daily news, comments, videos, and images about the war. Using the TG Search tool, a dataset of approximately 2000 text messages from three selected days in March, April, and August of 2022 was downloaded and coded by two researchers. First, a grounded theory coding approach on a small data sample was adopted to define re-emerging categories and verify intercoder reliability. Although the coding results are not the focus of this paper, some general observations can be shared.
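Intercoder reliability can be quantified in several ways. The sketch below uses Cohen's kappa for two coders as one common option; this article does not state which measure was applied during the sprint, and the category labels in the example are invented.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Agreement between two coders, corrected for chance agreement."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both coders must label the same non-empty sample.")
    n = len(labels_a)
    # Share of messages on which both coders chose the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each coder assigned categories independently,
    # following their own category frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical category throughout
    return (observed - expected) / (1 - expected)


# Invented codes for six messages.
coder_1 = ["info", "info", "emotion", "politics", "info", "emotion"]
coder_2 = ["info", "emotion", "emotion", "politics", "info", "emotion"]
kappa = cohens_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

Values close to 1 indicate agreement well above chance, while values near 0 suggest that the category definitions need to be revisited before the full sample is coded.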
In March, immediately after the full-scale invasion, many discussions revolved around the veracity of information shared in the chat. Eventually, the admin pinned a message stating that they were not responsible for the reliability of the information. The tone of the conversations was matter-of-fact and largely devoid of emotions, with many images of destruction being shared without discussion. In April, however, the conversations became increasingly political, critical, and emotional, with terms such as “orcs,” “nazi russia,” and “rashist” appearing alongside informative content showing the ongoing destruction of the city. Overall, chat participants began to share more stories and discuss with each other. In August, discourse became even more political, with the emergence of debates about adaptation to the occupation conditions, including housing and prices, and ideological aspects of the war.
These observations show how qualitative text analysis facilitates the exploration of the emerging vernacular chat cultures and documentation practices concerning life in the war zone. From the integrity point of view, with the data archived, it is possible to study the development of chat discussions from a longitudinal and comparative perspective. In terms of usability, the analysis was limited by the archive format, which made it difficult to trace the chronology of chat conversations. Telegram allows users to reply to older chat messages, but in our dataset, it was difficult to trace which comments referred to which original statements. While this is a limitation, the confusion around the chronology of comments is consistent with the user experience when reading Telegram chats with multiple members.
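Where an export preserves reply references, the relation between comments and original statements can in principle be reconstructed. The sketch below assumes that each message carries an "id" and, for replies, a "reply_to_message_id" field, as in JSON exports from the Telegram desktop application; whether this information survives in a given archived dataset is not guaranteed, and the function names and sample messages are hypothetical.

```python
def index_by_id(messages):
    """Build a lookup table from message id to message."""
    return {msg["id"]: msg for msg in messages}


def build_threads(messages):
    """Map each message id to the ids of its direct replies."""
    by_id = index_by_id(messages)
    replies = {}
    for msg in messages:
        parent = msg.get("reply_to_message_id")
        if parent in by_id:  # ignore replies to deleted or unarchived posts
            replies.setdefault(parent, []).append(msg["id"])
    return replies


def root_of(message_id, by_id):
    """Follow reply references upward to the original statement."""
    seen = set()
    while True:
        parent = by_id[message_id].get("reply_to_message_id")
        if parent is None or parent not in by_id or parent in seen:
            return message_id
        seen.add(message_id)
        message_id = parent


# Invented sample: message 4 replies to a post missing from the dataset.
messages = [
    {"id": 1, "text": "Is the water running anywhere?"},
    {"id": 2, "text": "Not on our street.", "reply_to_message_id": 1},
    {"id": 3, "text": "Same here since yesterday.", "reply_to_message_id": 2},
    {"id": 4, "text": "Thanks!", "reply_to_message_id": 99},
]
threads = build_threads(messages)
lookup = index_by_id(messages)
print(threads)
print(root_of(3, lookup))
```

Replies to messages that were deleted before capture remain unresolved in this approach, which mirrors the dependence of archived data on the moment of collection.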
Visual discourse analysis
Memes have taken an important role in online culture, moving from niche and fringe internet cultures into mainstream digital media, influencing political debates, and even being co-opted by marketers (Rogers and Giorgi, 2023). Since memes are “digital items with common characteristics that are imitated and reiterated around the web” (Nissenbaum and Shifman, 2017), most meme formats we encountered repeated themselves, for example, by using the same image with a different textual narration.
We focused on three channels in which memes were shared and analyzed the memes qualitatively as a discursive apparatus in an interdisciplinary group, allowing for the discovery and exchange of culturally contextualized meanings. We looked at the channels on the first day of the full-scale invasion and found that content mostly dealt with memes of general interest. The channels also provided news content and support messages offering empowerment to Ukraine. We then followed one channel over two time frames: from February 24 to March 4, the first 8 days of the war, and then April 15 – a day when the Ukrainian army damaged the flagship of Russia’s Black Sea fleet, which later sank.
We analyzed 47 memes that were shared in the first 8 days of the war and 21 memes from April 15 and noticed the following trends: In the first days, there were more memes on military and political topics, including local jokes focusing on specific politicians and cultural backgrounds. Memes were dedicated to events of the war, including Nestor Shufrich’s captivity and the sinking of the flagship. The bigger the event, the more memes were shared about it. A few memes mocked the decline of the Russian economy and the weakening of the Russian national currency. We saw a few examples of homophobic and racist memes referring to cultural stereotypes, such as rude and barbaric Russians and their supporters. The Ukrainians were portrayed as brave underdogs fighting fiercely against a stronger enemy.
One challenge was the ambiguity of memes: even participants with similar backgrounds interpreted the same memes differently. Decoding required not only knowledge of the Ukrainian language but also familiarity with certain regional dynamics in the country.
Spatial analysis
Spatial analysis is a broad set of methods which examine spatial attributes of data (Goodchild and Longley, 1999), including the geographic coordinates at which data were produced. Greatly enhanced by the distribution of geographic information services (Fotheringham and Rogerson, 2013), spatial analysis has many applications in warfare research. Its uses vary from studying patterns in conflict diffusion (Raleigh et al., 2010) to investigating where war crimes were committed, to debunking disinformation claims such as the non-involvement of Russian regular troops in 2014 clashes in Eastern Ukraine (Forensic Architecture, 2019).
Examining the messages of one group chat in an occupied Ukrainian city revealed varied references to offline spaces: names of streets, commercial venues, and other elements of local topography. Such messages often referred to the practice of pereklychka (a bottom-up social media practice of gathering and sharing information in order to better understand what is happening on the ground during war through visual and textual representations), discussed later. To identify space-related messages, we used the “has image” filter function in the TG Search, because the majority of posts with images contained photo evidence of shelling or screenshots referring to it.
The search results were transferred manually from the TG Search to a spreadsheet. Each message was labeled with regard to spatial information and structured according to date, text message, address(es) from a message, author, image URL, and its location in the archive. The structured data was uploaded to the ArcGIS Online mapping tool and geo-referenced. The geo-referenced layer was then added to the map to visualize evidence of shelling and hostilities, highlighting the degree of destruction in geographically dispersed areas of the city to showcase the scale of hostilities.
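The spreadsheet structure described above can be illustrated with a short script. The following is a minimal sketch, not the procedure used in the data sprint (where the transfer was manual): the class `SpatialMessage`, the function `to_csv`, and all column names are hypothetical, and the one-row-per-address layout is an assumption made so that a mapping tool can geocode each address separately.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class SpatialMessage:
    """One manually labeled message; all field names are illustrative."""
    date: str               # date of the message, e.g. "2022-03-01"
    text: str               # message text
    addresses: list[str]    # address(es) mentioned in the message
    author: str             # author label (ideally a pseudonym)
    image_url: str          # URL of the attached image
    archive_location: str   # location of the message in the archive


# Hypothetical column layout; a mapping tool may expect different names.
FIELDNAMES = ["date", "text", "address", "author", "image_url", "archive_location"]


def to_csv(messages: list[SpatialMessage]) -> str:
    """Return CSV text with one row per (message, address) pair."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    for msg in messages:
        for address in msg.addresses:
            writer.writerow({
                "date": msg.date,
                "text": msg.text,
                "address": address,
                "author": msg.author,
                "image_url": msg.image_url,
                "archive_location": msg.archive_location,
            })
    return buffer.getvalue()
```

A message that mentions two addresses yields two rows, so each address can be geo-referenced as a separate point while remaining traceable to the same archived message.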
While spatial data from Telegram is incomplete, it can be corroborated with evidence collected by other documenting initiatives. Along with other sources, including official updates and military reports, it can be used to identify the spatial and chronological context for particular episodes of hostilities.
These data also provide insights into the practice of pereklychka that became common on Telegram during the intense urban fighting in Ukraine. Unable to leave shelters and move through the city, civilians used the group chats to inform others about the situation in their area and learn about the situation in the other parts of the city. As the quality of internet connection deteriorated, contacting family and loved ones from other districts was often impossible. Pereklychka became one of the few ways to learn about the situation at a particular address. As an instant communication practice, it has an ephemeral and chaotic nature, with messages regularly deleted because they are dangerous or contain outdated or irrelevant content. This ephemerality underscores the importance of archival preservation.
Link analysis
Hyperlinks are used to map connections between online spaces, which in turn can be used to map public issues (Marres, 2005). This section focuses on the “formal technique of crawling and analyzing hyperlinks” (Marres, 2015: 658) between Telegram chat groups and channels. Hyperlinks were crawled from a 3-day history of a channel in an occupied Ukrainian city. The focus was on the invite hyperlinks used to join a certain Telegram group or channel and posted as channel messages.
This approach to Telegram data collection adds a relational level between groups and their chat histories, enabling research focusing on networks between Telegram groups. For a more detailed investigation, a data set covering a longer period than the 3 days used for this methodologically-oriented analysis is needed.
Filtering for links in the initial Telegram channel was easy due to the built-in filter function of the TG Search. For the analysis of the found links, we used another set of scripts (Datenschutt, n.d.), since TG Search did not contain data regarding the group(s) mentioned in the channel. Although these scripts ensured anonymity, they required basic knowledge of Python and command line interfaces since they operate locally on the machine. With these tools and the Telegram Desktop Interface, it was possible to identify messages containing group links and download the chat histories of the found groups. While the implementation discussed here was manual, it can also be automated and scaled, as shown in Figure 1.

Figure 1. Data collection methodology (Khaund et al., 2021: 516).
Since the data sprint focused on one channel that was already accessible via the archive, the user authentication and data discovery steps are less relevant to our approach. Each of the subsequent steps was used in the analysis, which was facilitated by the TG Search tool and scripts. In addition to the processes shown above, hyperlinks had to be checked for relevance, which meant deleting hyperlinks to other platforms and sites. The findings have been visualized below to show where group links were found, and whether they were found in the first or the second cycle of data crawling (Figure 2).

Figure 2. Group links found in the first and second cycles of data crawling. Made with LibreOffice Draw 7.3.7.2.
During the first cycle, three groups were found: FG, SG, and TG. Only one of them, SG, contained valid links to groups and channels, whose chat histories could be downloaded and analyzed for a third cycle. The chat histories of TG and FG contained no valid links: TG had no Telegram links at all, and in FG all group links were expired invite links. These findings raise questions about different forms of posting and debating behavior on Telegram: for instance, concerning the role of hyperlinks in war-related communication. It is important to consider that invite links also make visible groups that have been deleted or are inaccessible. Future research can therefore examine group chats that have been crawled and saved.
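The relevance check and the cycle-based crawling described in this section can be sketched in a few lines. This is an illustrative sketch, not the scripts used in the data sprint: the function names are hypothetical, and it assumes that Telegram links use the hosts `t.me` or `telegram.me` and that invite links have paths beginning with `/joinchat/` or `/+`. Whether an invite link has expired cannot be determined from the text alone and would require checking the link against the platform.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")
TELEGRAM_HOSTS = {"t.me", "telegram.me"}  # assumed hosts for Telegram links


def extract_telegram_links(text: str) -> list[str]:
    """Return Telegram links from a message, dropping links to other sites."""
    links = []
    for url in URL_PATTERN.findall(text):
        url = url.rstrip(".,;:!?)\"'")  # strip trailing punctuation
        host = (urlparse(url).hostname or "").lower()
        if host in TELEGRAM_HOSTS:
            links.append(url)
    return links


def is_invite_link(url: str) -> bool:
    """Heuristic: invite links are assumed to start with /joinchat/ or /+."""
    path = urlparse(url).path
    return path.startswith("/joinchat/") or path.startswith("/+")


def first_seen_cycle(messages_by_cycle: dict[int, list[str]]) -> dict[str, int]:
    """Map each Telegram link to the first crawl cycle in which it appeared."""
    first_seen: dict[str, int] = {}
    for cycle in sorted(messages_by_cycle):
        for text in messages_by_cycle[cycle]:
            for link in extract_telegram_links(text):
                first_seen.setdefault(link, cycle)
    return first_seen
```

Recording the cycle in which a link first appears provides the information needed to distinguish first-cycle from second-cycle findings in a visualization.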
Discussion and conclusions
In this article, we examined the role of archival sustainability in enabling critical data practices dealing with the use of Telegram data to study the large-scale Russian invasion of Ukraine. Specifically, we reflected on the possibilities and challenges of using archived Telegram data for a broad range of qualitative research approaches based on a data sprint. For this aim, we used the curated Telegram Archive of the Center for Urban History in Lviv and looked at how qualitative approaches can be applied to studying the everyday semantic, visual, spatial, and networking practices in the context of the war.
Our observations suggest that sustainable use of digital archives in the context of war depends on the degree to which these archives ensure integrity, usability, and the ethical use of platform data. While data integrity is discussed in the context of digital archiving, the latter two aspects are currently under-theorized (Bradley, 2008). Yet, our observations suggest that accounting for them is critical in a data economy where scholarly and public attention is constantly shifting from one issue to another. To sustain such attention, it is important not only to preserve platform data that might otherwise be deleted and lost but also to enable reflection on this data and possibilities of using it for the purposes of research that respect users’ privacy and security. Disentangling the complex amalgamation of ethical issues, such as the preservation of user and regional privacy, the anonymization of user names or the names of places, points to the political, cultural, and affective nature of data practices in wartime.
The implementation of usability, accessibility, and ethical use of digital archives, however, is a non-trivial task. Our data sprint showed that different methodological approaches imply diverse requirements for platform data and pose different forms of ethical risk, which we outline below. Dealing with these challenges is complicated by the lack of a commonly accepted format of usage of platform archives agreed on by all digital archivists (e.g. due to different legal archiving frameworks and differing institutional and platform policies). Furthermore, ethical risks become more prominent in the case of archives dealing with mass violence: for instance, during the data sprint, we were constantly unsure whether it was appropriate and safe to mention in our research the specific locations or names discussed in the channels (as well as the channel names). While our exploratory data sprint enabled us to identify the opportunities and risks of working with such archived data, much more practical research is needed to assess sustainable use practices in the future.
The diverse data requirements of different qualitative approaches to studying the Russian aggression against Ukraine have implications for the sustainability of archival data to be used for research purposes. Our analysis demonstrates that for all approaches, ensuring data integrity is the prerequisite for the sustainable use of archived data. Regardless of whether the research approach relies on qualitative content analysis or link analysis, in order to make data usable, the archive has to enable its preservation. The importance of data integrity is amplified by the high probability that some content will be deleted from the live version of Telegram. It is likely that some comments or links from the earlier days of the war have already been removed, in particular by chat members who were unable to leave the besieged city or had to undergo the process of filtration (e.g. Getmanova and Matviyenko, 2022). This data would therefore no longer be accessible in the chat without the Telegram archive. Automating the archiving process enables researchers to capture information before it is deleted, improving data integrity.
The importance and specific implementation of combined aspects of archival sustainability in the use of the archive, including ethics and usability, varies between methodological approaches. For instance, while ensuring that archival data is used ethically is essential to all approaches, its practical implementation varies due to different data requirements. For the visual analysis of Internet memes, privacy risks are less pronounced than in other approaches because memes are a public communication format that usually relies on the remediation of popular culture tropes and products. However, due to their virality, memes are susceptible to being used to spread hate speech and dehumanization, which may require mechanisms to prevent hateful memes from being accessed and redistributed through an archive.
In contrast, approaches that rely on qualitative content analysis, as well as geospatial analysis and link analysis, can potentially lead to privacy violations. The risk of such breaches is particularly pronounced in the case of war-related data, where the identification of social media and messenger users can have a direct impact on their personal safety. This concern stresses the importance of anonymizing archived content, with a special emphasis on filtering out personally identifiable information (e.g. regarding the identity of users from occupied territories).
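One common building block for such anonymization is the replacement of user identifiers with keyed pseudonyms. The following is a minimal sketch under stated assumptions, not a description of how the archive handles personal data: the function names are hypothetical, and the username pattern assumes handles of 5 to 32 Latin letters, digits, or underscores. Pseudonymization of this kind is not full anonymization, since names, addresses, and other identifying details in the free text of a message remain untouched and require separate review.

```python
import hashlib
import hmac
import re

# Assumed shape of an @-mention; adjust if the platform's rules differ.
USERNAME_PATTERN = re.compile(r"@[A-Za-z0-9_]{5,32}")


def pseudonym(identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return f"user_{digest.hexdigest()[:12]}"


def redact_mentions(text: str, secret_key: bytes) -> str:
    """Replace @-mentions in a message with pseudonyms."""
    return USERNAME_PATTERN.sub(
        lambda match: pseudonym(match.group(0)[1:].lower(), secret_key),
        text,
    )
```

Because the hash is keyed, the same user receives the same pseudonym across messages, which preserves the structure of conversations, while the mapping cannot be recomputed without the secret key. The key therefore has to be stored separately from the data.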
Similarly, individual methodological requirements for data usability vary. For more processing-intensive approaches, such as those based on link analysis and geospatial data, it is important either to have the capacity to export raw data (which can then be processed locally to extract information about links or geographic locations), or to have built-in tools that can extract this data directly at the archive interface level. For qualitative content or visual analysis, such functionalities are less relevant. The key concern around usability for these approaches relates to the difficulty of tracking the chronology of user conversations and interactions. If such chronology cannot be reconstructed, it becomes difficult to analyze content using these approaches, since the meaning of some conversations might not be clear without understanding their place in the broader context of the chat. This limits the usability of the data. Therefore, both preserving the message feed of the entire chat in the archive and providing tools and methods to filter individual conversations out from the rest of the messages are essential to making the data accessible to researchers.
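Where raw data can be exported, the chronology of a conversation can be partially reconstructed locally. The following is a minimal sketch that assumes each exported message is a dictionary with the keys `id`, `date` (an ISO 8601 string), and, for replies, `reply_to_message_id`; these key names and the function names are assumptions about the export format and are not taken from the archive.

```python
from datetime import datetime


def sort_chronologically(messages: list[dict]) -> list[dict]:
    """Return messages ordered by their timestamp, oldest first."""
    return sorted(messages, key=lambda m: datetime.fromisoformat(m["date"]))


def reply_chain(message_id: int, messages: list[dict]) -> list[dict]:
    """Return the chain of replies ending at message_id, oldest first."""
    by_id = {m["id"]: m for m in messages}
    chain = []
    seen = set()  # guards against malformed data with circular replies
    current = by_id.get(message_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])
        chain.append(current)
        current = by_id.get(current.get("reply_to_message_id"))
    return list(reversed(chain))
```

If a replied-to message has been deleted and is absent from the export, the chain simply stops at the earliest message that is still available, which illustrates why preserving the full message feed matters for interpretation.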
As we argue, digital and social archive sustainability is defined, among other things, by its use, particularly for research. Amidst the extensive digital documentation of Russia’s war against Ukraine, and the diminishing public attention to this war (Želnienė, 2022), it is important not only to discuss the importance and ethics of digital archiving in general, but also to practice empirical ways of using such archived information for qualitative research, in order to make the archiving process sustainable in terms of its use. Therefore, developing ethically sensitive formats of usage of such archives contributes to their sustainability. Collaborative and interdisciplinary work in a data sprint proves to be a form of comprehensive access to sensitive and nuanced data, such as social media data about the war, which adheres to ethical and safety considerations.
In addition to reconsidering the notion of archival sustainability via usage, our observations have important implications for Telegram research. Telegram is usually studied as a platform for fringe actors promoting conspiracy theories and antagonistic content and sometimes posing security threats (Urman and Katz, 2022; Walther and McCoy, 2021). Much of this research claims these communities are drawn to Telegram due to its lack of content moderation policies, in comparison with other mainstream platforms’ strengthening moderation policies (Willaert et al., 2022). Our exploration of Telegram shows how the same platform affordances can be used by Ukrainian users to share their daily practices and needs during the war. It highlights that despite risks, less moderated platforms can also help sustain constructive connective action of communities in situations of duress, war, and danger.
Acknowledgements
We are a group of researchers with different national backgrounds, professional, and lived experiences. Among us, there is a Ukrainian digital archivist based in Lviv, a Ukrainian communication and media scholar living in Bern, a Lithuanian digital media scholar living in Berlin, an Israeli media scholar with ancestral roots in Eastern Europe, including Ukraine, based in Amsterdam, and a German media scholar living in Cologne. This variety of backgrounds means we have different attachments and connections to this topic, and we bring multiple points of view to the study of war data. Some of us experience this war on the ground, some of us through the experiences of relatives, some have experienced a state of war in another country, and some have never been subjected to war.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – Project-ID 262513311 – SFB 1187.
