Can social media data be freely used? Participants’ ethical perceptions toward using their social media data in research

Abstract

In the big data environment, various systems and platforms have provided billions of data points to researchers. The large amount of user data on social media platforms has become a data source for many kinds of research. However, few scholars recognize the ethical risks in the collection and utilization of social media data, and many ignore the ethical needs of users themselves. Users’ concerns should be considered when formulating ethical guidelines. This study takes Sina Microblog (the world’s largest Chinese social media platform) users as the research subject, aiming to provide data from Chinese users and evidence for differences in users’ ethical perceptions in different cultural contexts. Within our survey sample, few users had previously known that their microblogs could be collected and used by researchers, and the majority believed that researchers should not use their microblogs without consent. We also found differences in cognition regarding ethical issues in social media data research across groups.

Keywords

Ethics, Internet research, perception, social media data

Introduction

The development of social media has not only changed online communication but also profoundly changed academic research. Researchers have developed technological skills to construct a variety of online methods and sites to explore experience and behavior in the virtual world (James and Busher, 2015). Considerable changes have also taken place in data collection, sampling, analysis, and the entire research process. Scholars were exhilarated by the prospect of tapping into the vast troves of personal data collected by Facebook, Google, and a host of start-ups, which they said could transform social science research (Vindu, 2014). The new quantitative and statistical analysis enabled by network data research has changed traditional social research and has now become one of the most popular forms of social science research (Koene et al., 2015). Social media users, who are the source of these data, may never even know that they are objects of study, let alone explicitly consent. The unique advantages of network data research can easily cause researchers to ignore ethical issues associated with the collection, description, analysis, dissemination, and even use of data in the pursuit of convenience and economy (Floridi and Taddeo, 2016). Such research has also caused serious controversy. However, as Vitak et al. (2016) point out, there are no agreed-upon norms or best practices in this space. Although organizations such as the Association of Internet Researchers (AoIR) have offered guidance for researchers (Ess and AoIR Ethics Working Committee, 2002; Franzke et al., 2019; Markham and Buchanan, 2012), these guidelines rarely acknowledge the voices of the ‘participants’ (Fiesler and Proferes, 2018). Participants are the people whose data are being collected and studied. Each social media user may have participated in social media data research. 
However, these individuals have their own attitudes and opinions about how researchers should study and use social media data (Zimmer, 2010a). Research ethics must balance risk and interests in guaranteeing the rights of individual participants and should be grounded in ‘the sensitivities of those being studied’ and ‘everyday practice’ as opposed to bureaucratic or legal concerns (Brown et al., 2016). Therefore, the views of social media users deserve attention.

The research of Fiesler and Proferes (2018) has made a useful attempt to explore this problem from the perspective of Twitter users. However, few researchers have focused on scholars’ use of Chinese social media data or analyzed the ethical perceptions of Chinese social media users about using their social media data in research. The Sina microblogging platform is the most influential Chinese social media platform in the world. As of 31 March 2021, monthly active users of microblogs reached 530 million, and daily active users of microblogs reached 230 million. This study draws on the analysis of Twitter users by Fiesler and Proferes (2018), adopting social media users as its research object and taking Sina Microblog, the world’s largest Chinese social media platform, as an example. It uses the perspective of social media data research participants to investigate the perceptions of Chinese social media users on the collection of their own social platform data for research. We hope to provide data from Chinese users and evidence for differences in users’ ethical perceptions in different cultural contexts. At the same time, this study deepens the research problems; analyzes the cognitive differences at different ages, genders and education levels; and tries to analyze the research results from the perspective of Chinese social development and culture. The hope is to remind researchers of the necessity of ethical reflection when conducting such research and provide a reference for researchers and regulatory bodies for ethical decision making.

Literature review

The development of the Internet has introduced a series of complex issues for research ethics regarding informed consent (Barocas and Nissenbaum, 2014), terms of service (Fiesler et al., 2016; Vaccaro et al., 2015), the relationship between researchers and participants (Beaulieu and Estalella, 2012), and the definition of public space and public relations (Ravn et al., 2020; Zimmer, 2010a). Lease et al. (2013) believed that even if the data were anonymous, they might be reidentified. Swirsky et al. (2014) stated that the use of social media data may present unexpected ethical dilemmas, especially when we use social media in medical research; we must be careful to ensure that we are on the right ethical track. Zimmer and Proferes (2014) stated that these ethical considerations spark debate but remain largely unresolved. Moreno et al. (2013) noted that social media websites provide researchers with new research opportunities and present new challenges. These authors argue that confidentiality is key to social media research and that it is important to protect the identity of participants.

At present, there are several studies involving the ethical needs of ‘participants’. Williams et al. (2017) found that users are most comfortable when asked for consent and when tweets are anonymized. Mikal et al. (2016), when using Twitter data to monitor depression, found that users were far more comfortable with aggregate-level health monitoring than individual assessments. There are also studies showing that many social media users have a limited understanding of how their data are used. Proferes (2017) found that users generally did not know of the existence of application programming interfaces (APIs), and some users did not even realize that their tweets were completely visible. Fiesler and Proferes (2018) found that few users knew that their public tweets could be used by researchers, and most users believed that researchers should not use their tweets without their consent. Although researchers generally view Twitter as public, users making their tweets visible to the public does not mean that they consent to them being collected and analyzed (Zimmer, 2010b). In addition, the privacy expectations of users differ depending on the length of time that they participate on the platform (Martin and Shilton, 2015).

Nowadays, Chinese scholars have also carried out many theoretical and practical studies related to social media, but most focus only on the value of social media data as an academic resource, such as to understand users’ behaviors and emotions (Wang et al., 2013) or for event tracking and prediction (Wu and Zhang, 2017; Xu, 2013) or disease prevention (Yang et al., 2014). Few scholars are aware of the ethical issues that may exist in activities such as collecting, using, describing, analyzing, disseminating, and opening these data. A literature review indicated that there is much research on social media data research ethics but that such studies focus mainly on Twitter and Facebook. No scholars have explored the ethical issues of Chinese social media data research, let alone considered the ethical needs of social media users. Therefore, this article selects the Sina microblogging platform to investigate the perceptions and differences in the cognition of Sina Microblog users with respect to the collection and use of their data.

Data acquisition and analysis

Referring to the ethical problems in network data research identified in previous studies (Fiesler and Proferes, 2018; Shilton and Sayles, 2016; Vitak et al., 2016) and considering the actual situation in China, we formulated a questionnaire to collect information on demographic characteristics, microblog use, and 14 different contextual factors (EP1–EP14). An open-ended question about how users perceived microblog data research was included. The questionnaire used a 5-point Likert-type scale, with ‘5’ indicating very uncomfortable and ‘1’ indicating very comfortable; the smaller the number, the lower the perception level.

The research team posted questionnaires on Sojump, a Chinese online survey platform. Based on the feedback of the presurvey respondents, some items and unclear concepts in the questionnaire were modified to ensure that the respondents could understand the questionnaire items well during the formal survey. Then, the finalized questionnaires were posted from 23 August to 30 September 2019. To ensure the representativeness of the sample, microblog users of different ages, genders, educational levels, and backgrounds were invited to complete the questionnaire to obtain as many responses as possible. The survey lasted 6 weeks, and 385 questionnaires were collected, including 320 valid questionnaires. Using the Analyze-Scale-Reliability Analysis function in SPSS 16.0, the Cronbach’s α coefficients were calculated for the variables observed. The overall Cronbach’s α of the questionnaire was 0.919, which indicates good reliability (internal consistency). The questions in the questionnaire were adapted from previous mature research and modified by experts and had high content validity. The demographic details of the participants who returned valid surveys are tabulated as follows (Table 1).

Table 1.

Demographic information of the respondents.

		Frequency	%			Frequency	%
Gender	Male	130	40.6%	Education level	Junior/senior high school	30	9.4%
	Female	190	59.4%		College	16	5.0%
Age	⩽20	67	20.9%		Undergraduate degree	138	43.1%
	21–30	193	60.3%		Master’s degree	96	30.0%
	31–45	39	12.2%		Doctoral degree	40	12.5%
	⩾46	21	6.6%

The respondents were distributed in 21 provinces and 47 cities in China. Of the 320 respondents who returned valid surveys, 75% had registered on Sina Microblog before 2016, and 76% had one microblog account. When asked how often they accessed microblogs, 56% indicated ‘every day’ and 65% ‘more than 5 times a day’. Sina Microblog provides users with different privacy setting options, such as ‘Who can comment on my microblog’, ‘Microblog visible time range’, ‘Do not allow my microblog to show in the same city’, and ‘From whom I’ll receive messages’. In total, 85 respondents limited the ‘Microblog visible time range’, and 207 respondents had no privacy restrictions on their accounts.

In terms of data analysis, this study primarily used descriptive statistical analysis methods and conducted exploratory analysis of differences in cognition across user groups. The arithmetic average method was used to calculate the average of the perception levels among users with different characteristics. In addition, we summarized and open-coded the responses to the open-ended question and used them to supplement and explain the statistical results.
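The two computations named in this section, Cronbach’s α and the per-group arithmetic average, can be illustrated with a minimal Python sketch. It is not the SPSS procedure the study used; the toy responses, the record layout, and the function names `cronbach_alpha` and `group_means` are assumptions made purely for illustration.

```python
from collections import defaultdict
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items score matrix.

    rows: list of per-respondent lists, one Likert score per item.
    Uses sample variances: alpha = k/(k-1) * (1 - sum(item var) / total var).
    """
    k = len(rows[0])
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def group_means(responses, group_key, item):
    """Arithmetic mean of one item, computed separately for each group."""
    buckets = defaultdict(list)
    for resp in responses:
        buckets[resp[group_key]].append(resp["scores"][item])
    return {group: round(mean(vals), 2) for group, vals in buckets.items()}


# Hypothetical toy responses (NOT the survey data); 5 = very uncomfortable.
responses = [
    {"age": "21-30", "scores": {"EP4": 5, "EP5": 4, "EP8": 5}},
    {"age": "21-30", "scores": {"EP4": 4, "EP5": 3, "EP8": 4}},
    {"age": ">=46", "scores": {"EP4": 3, "EP5": 3, "EP8": 4}},
    {"age": ">=46", "scores": {"EP4": 2, "EP5": 2, "EP8": 3}},
]
items = ["EP4", "EP5", "EP8"]
matrix = [[r["scores"][i] for i in items] for r in responses]

alpha = cronbach_alpha(matrix)
ep4_by_age = group_means(responses, "age", "EP4")
print(f"alpha = {alpha:.3f}", ep4_by_age)
```

With real data, `matrix` would hold one row per valid questionnaire and one column per item (EP1–EP14), and `group_means` would be called once per item and demographic variable to produce tables of the kind reported in the Findings.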

Findings

Familiarity with using microblogs in research

In the survey, we asked participants about their familiarity with microblog data research and related information, as shown in Table 2.

Table 2.

Familiarity with microblog platform and microblog data research.

Question	Very unfamiliar	Somewhat unfamiliar	Neither unfamiliar nor familiar	Somewhat familiar	Very familiar
Your microblog may be used for research	5.9%	29.7%	29.7%	30.3%	4.4%
The interface and function of API	22.8%	35.0%	22.8%	16.6%	2.8%
Restrictions on using your microblog	11.9%	37.2%	31.3%	18.1%	1.6%
Microblogs without special restrictions set are completely open and visible	4.4%	16.9%	28.4%	40.9%	9.4%

API: Application Programming Interface.

According to the survey, only 30.3% selected ‘somewhat familiar’, and only 4.4% selected ‘very familiar’. This result indicated that a large number of users did not know that they had become the object of scientific research and that their microblogs had been used for research.

When asked ‘Do you think that researchers should be able to use microblogs in research without users’ permission?’, a vast majority (81.3%) indicated they should not. Forty respondents thought it was ‘forbidden by Sina Microblog’, 183 thought it was ‘copyright infringement’, 140 thought it was ‘contrary to research ethics’, and 33 stated that they did not want their personal privacy invaded. These responses are quite different from those of Twitter users. In Fiesler and Proferes’s (2018) survey, 42.7% of the respondents believed that Twitter data could not be used, and the corresponding percentage of Chinese participants was almost twice that. The research team followed up with the respondents. No user truly knew whether Sina Microblog prohibited data use or whether there were relevant restrictions in copyright law. Some respondents said, ‘I do not know the specific provisions of copyright law, but I think there should be such restrictions’. Clearly, much of Chinese users’ negative response to this issue comes from their resistance to the collection of microblog data, while their knowledge of the cause of this problem is very vague.

Attitudes about using users’ microblogs in research

We further investigated users’ attitudes toward microblog research and asked contextual questions to explore them. As shown in Table 3, the majority of respondents were somewhat comfortable or ambivalent about microblogs being used in research. However, participant responses shifted to much higher levels of discomfort when ‘your entire microblog history’ became the object of research.

Table 3.

Comfort with microblogs being used in research.

	Question	Very uncomfortable	Somewhat uncomfortable	Neither uncomfortable nor comfortable	Somewhat comfortable	Very comfortable
EP1	How do you feel about microblogs being used in research?	3.4%	7.2%	35.0%	41.3%	13.1%
EP2	How would you feel if some microblogs of yours were used in research?	3.8%	13.1%	28.1%	42.8%	12.2%
EP3	How would you feel if your entire microblog history were used in research?	11.6%	28.4%	27.5%	24.1%	8.4%

Participants’ acceptance of microblog data research varies according to specific situational factors. As shown in Table 4, the percentage of respondents who were very uncomfortable about ‘you were not informed at all’ reached 42.8%, while it dropped to 15.3% for ‘you were informed after the fact’. Participants were less uncomfortable when the data were ‘analyzed by a computer program’ than when they were read by humans, and the percentage of ‘very uncomfortable’ and ‘somewhat uncomfortable’ responses for the latter were approximately three times higher. For the two questions ‘attributed to your microblog handle’ and ‘attributed anonymously’, the respondents obviously preferred anonymity and did not want to be identified. In general, the respondents showed obvious resistance to the three situations described in EP4, EP8, and EP12, and more than 40% chose ‘very uncomfortable’ for each of them. The obvious difference in participants’ attitudes toward two similar problems in different situations fully indicates that microblog users attach great importance to the issues of ‘respect’, ‘privacy’, ‘anonymity’, and being ‘informed’.

Table 4.

‘How would you feel if your microblogs were used in research and . . . ’.

	Question	Very uncomfortable	Somewhat uncomfortable	Neither uncomfortable nor comfortable	Somewhat comfortable	Very comfortable
EP4	. . . you were not informed at all?	42.8%	28.8%	14.4%	10.6%	3.4%
EP5	. . . you were informed after the fact?	15.3%	33.1%	27.5%	19.4%	4.7%
EP6	. . . they were analyzed along with only a few dozen microblogs?	9.7%	19.7%	33.4%	30.9%	6.3%
EP7	. . . they were analyzed along with millions of other microblogs?	9.5%	20.2%	40.7%	20.2%	9.5%
EP8	. . . the data came from the part you don’t want visible to strangers?	44.1%	29.4%	14.7%	8.8%	3.1%
EP9	. . . you had later deleted them?	12.2%	27.2%	34.1%	22.2%	4.4%
EP10	. . . no human researchers read them, but they were analyzed by a computer program?	5.3%	11.9%	36.9%	38.1%	7.8%
EP11	. . . the human researchers read your microblogs to analyze them?	16.6%	31.3%	30.0%	17.2%	5.0%
EP12	. . . the researchers analyzed your public profile information, such as your location and username?	42.5%	18.8%	21.3%	13.4%	4.1%
EP13	. . . your microblogs were quoted in a published research paper and attributed to your microblog handle?	21.9%	23.8%	27.9%	19.7%	6.9%
EP14	. . . your microblogs were quoted in a published research paper and attributed anonymously?	8.1%	18.1%	30.3%	34.1%	9.4%
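As a rough illustration of how the comparisons drawn from Table 4 can be reproduced, the Python sketch below combines the two ‘uncomfortable’ categories for an item and converts a response distribution into a mean score. The percentages are copied from Table 4 and the 5-to-1 coding follows the questionnaire description; the helper names `uncomfortable_share` and `mean_score` are hypothetical.

```python
# Scale coding from the questionnaire: 5 = very uncomfortable ... 1 = very comfortable.
SCORES = (5, 4, 3, 2, 1)

# Percentages copied from Table 4, ordered from
# "very uncomfortable" to "very comfortable".
TABLE4 = {
    "EP4": (42.8, 28.8, 14.4, 10.6, 3.4),
    "EP10": (5.3, 11.9, 36.9, 38.1, 7.8),
    "EP11": (16.6, 31.3, 30.0, 17.2, 5.0),
}


def uncomfortable_share(dist):
    """Percentage choosing 'very' or 'somewhat' uncomfortable."""
    return round(dist[0] + dist[1], 1)


def mean_score(dist):
    """Mean score implied by a percentage distribution.

    Dividing by the actual total keeps the result stable when
    published percentages sum to 99.9 or 100.1 because of rounding.
    """
    total = sum(dist)
    return round(sum(s * p for s, p in zip(SCORES, dist)) / total, 2)


ep4_share = uncomfortable_share(TABLE4["EP4"])
human_vs_program = (
    uncomfortable_share(TABLE4["EP11"]) / uncomfortable_share(TABLE4["EP10"])
)
ep4_mean = mean_score(TABLE4["EP4"])
print(ep4_share, round(human_vs_program, 2), ep4_mean)
```

Under these assumptions, the combined ‘uncomfortable’ share for EP4 is 71.6%, the share for human reading (EP11, 47.9%) is a little under three times that for computer analysis (EP10, 17.2%), and the EP4 distribution implies a mean score of about 3.97.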

When asked how they would respond if a researcher contacted them and asked for permission to use some of their microblogs as part of a research study, 34.1% indicated that they would give permission, only 7.8% indicated that they would refuse outright, and 58.1% indicated that it would depend on some contextual factor. Both the ‘agree’ and the ‘disagree’ percentages are considerably lower than those in a survey of Twitter users, in which they were 53.4% and 13.8%, respectively (Fiesler and Proferes, 2018). Clearly, Chinese microblog users are more cautious about their social media data being collected and used, or their attitude on this issue is still vague.

Differences in ethical perceptions

In the big data environment, everyone plays multiple roles in the data chain at the same time. At different stages, each subject has different data needs and different perceptions of ethical issues. Previous studies have considered gender, age, education level, and online experience when investigating users’ privacy concerns (Malhotra et al., 2004; Sheehan, 2002). We refer to these indicators and consider the specific context of the study to further explore the differences in participants’ perceptions of microblog research ethics attributable to gender, age, education level, and microblog usage frequency.

Age

On the whole, the highest overall perceptions of the collection and utilization of microblog data were held by young adults aged 21–30, among whom the average values of EP4 and EP8 were both above 4. Respondents in this age group said, ‘Personal privacy should be protected’, ‘Privacy protection should be given priority’, and ‘I hope that under the premise of protecting privacy . . . ’ (Figure 1 and Table 5).

Figure 1.

Perception level by age.

Table 5.

Perception level by age.

	⩽20 years	21–30 years	31–45 years	⩾46 years
EP1	2.49	2.49	2.46	2.19
EP2	2.51	2.56	2.54	2.38
EP3	2.94	3.26	2.77	2.86
EP4	3.9	4.13	3.62	3.33
EP5	3.19	3.44	3.28	3.19
EP6	2.93	3.02	2.82	2.71
EP7	2.9	2.77	2.77	2.48
EP8	3.67	4.21	3.97	3.71
EP9	3.03	3.32	3.05	3.05
EP10	2.72	2.72	2.54	2.57
EP11	3.13	3.54	3.21	2.86
EP12	3.63	3.96	3.54	3.71
EP13	3.15	3.46	3.15	3.19
EP14	2.84	2.85	2.64	2.76
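The group averages in Table 5 can be cross-checked against the overall response distribution with a size-weighted mean. The sketch below is only an illustrative consistency check, not part of the original analysis: it takes the age-group counts from Table 1 and the EP4 averages from Table 5, and the helper name `pooled_mean` is hypothetical.

```python
# Respondent counts per age group, copied from Table 1.
AGE_COUNTS = {"<=20": 67, "21-30": 193, "31-45": 39, ">=46": 21}

# Mean EP4 perception level per age group, copied from Table 5.
EP4_MEANS = {"<=20": 3.9, "21-30": 4.13, "31-45": 3.62, ">=46": 3.33}


def pooled_mean(counts, means):
    """Overall mean recovered from group means weighted by group size."""
    n = sum(counts.values())
    return sum(counts[g] * means[g] for g in counts) / n


overall_ep4 = round(pooled_mean(AGE_COUNTS, EP4_MEANS), 2)
print(overall_ep4)
```

The pooled value comes to roughly 3.97, which agrees with the mean implied by the EP4 response percentages in Table 4 (also about 3.97), so the two tables are mutually consistent for this item.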

The perception levels of the adolescents under 20 years old and the middle-aged respondents between 31 and 45 years old were generally similar. Although the perception level for EP4 of the respondents under 20 years old was noticeably higher than that of the respondents between 31 and 45 years old, the perception levels for the other items were relatively similar. The participants under 20 years old gave greater weight to ‘inform in advance’ and ‘respect’. Many respondents under 20 years old wrote, ‘it’s disgusting to collect my data without permission’, ‘please ask for consent’, and ‘researchers should explain in advance’. Respondents aged 46 and older had the lowest perception levels. This group indicated that they rarely publish personal news and that they mainly engage in browsing and forwarding. Some respondents in this age group wrote, ‘in the era of big data, it’s normal for microblog data published publicly to be used, and the vast majority of people will understand this’ and ‘this is the new development trend, which is worth encouraging’.

A previous study found that age had a significant negative relationship with the breadth and depth of disclosure; that is, younger people are more willing to disclose personal information (Li et al., 2015). However, this does not mean that young people do not care about privacy. In fact, they may be more concerned about their data being collected and analyzed, invading their personal privacy. Another study found that the online privacy behaviors of adolescents and young people are actually stricter than those of the elderly (Park, 2013). However, some studies reported the opposite pattern. Bryce and Klang (2009) indicated that young people did not pay much attention to their online privacy, did not have enough knowledge about the complexity of technology and data mining, and lacked sufficient understanding of legal protection. Some respondents in the 31–45 age group wrote, ‘the method of obtaining data should be reasonable and legal, the data should be used safely, and the user’s right to know and privacy should be guaranteed’ and ‘the data can be used under authorization, but the researchers must ask for permission again if the studies involve personal privacy, and they must ensure that the data cannot be leaked’. This age group includes millennials, who grew up with the development of the Internet. Their rich experience with the Internet makes their concerns more sophisticated.

Therefore, there seems to be a complex relationship between age and online privacy management. We find that adolescents and young adults are more concerned about their own microblog data than middle-aged and older people, but their ethical needs are different. Among the younger respondents, the group with the highest frequency of microblog usage (21–30 years old) showed more concern about the disclosure of personal information, while those under 20 years old focused most on control of their own data and the principle of respect.

Gender

Among the respondents, females’ perception of microblog research ethics was significantly higher than that of males. The average perception level of female respondents for items EP4, EP8, and EP12 exceeded 4. Many previous studies have shown that female users are more likely than male users to reveal their personal information on social networks (Hoy and Milne, 2010; Litt, 2013). However, another study showed that females care more about the impact of external behavior on personal privacy (Sheehan, 1999), and the privacy setting levels of females are significantly higher than those of males (Special and Li-Barber, 2012). Females want to have more control over their personal information. Some female respondents wrote, ‘respect privacy first, then conduct research’, ‘please ask the users’ consent, and the data should be used in proper ways’, and ‘hope to be asked for permission for each microblog use’. In contrast, the perception levels of the male respondents were markedly lower than those of the female participants. Some male respondents wrote, ‘no matter what attitude we take, we cannot change the situation of microblog data disclosure and use in academic research’ and ‘a microblog itself is public information, so privacy doesn’t matter’ (Figure 2 and Table 6).

Figure 2.

Perception level by gender.

Table 6.

Perception level by gender.

	Female	Male		Female	Male
EP1	2.47	2.45	EP8	4.41	3.46
EP2	2.59	2.45	EP9	3.39	2.99
EP3	3.28	2.85	EP10	2.77	2.56
EP4	4.29	3.5	EP11	3.58	3.07
EP5	3.55	3.06	EP12	4.12	3.38
EP6	3.1	2.75	EP13	3.65	2.89
EP7	2.86	2.65	EP14	2.89	2.7

Education level

The degree of privacy concerns is related to the degree of education (May et al., 2007). Milne and Rohm (2000) also indicated that the higher the education level of users, the less they want websites to record their personal information. As shown in Figure 3, the perception levels of the respondents educated at the junior or senior high school level were lower than those of the other groups. The respondents with master’s degrees or doctoral degrees showed a higher perception level, with similar overall scores. However, the average score for each item had the largest range in the doctoral group, and these users’ perceptions of EP2, EP7, and EP14 were relatively low. The doctoral group showed higher levels of understanding and acceptance regarding collecting ‘some microblogs’ rather than the ‘entire microblog history’, processing data with ‘millions of other microblogs’ and ‘anonymous attribution’. However, there were also three items with an average value of more than 4 in the doctoral group, with EP8 having the highest value of 4.18. Generally, the highly educated respondents showed an understanding of microblog research, but they also showed a clear and strong aversion to being ‘uninformed’ and ‘invasion of privacy’ (Table 7).

Figure 3.

Perception level by education level.

Table 7.

Perception level by education level.

	Junior/senior high school	College	Undergraduate degree	Master’s degree	Doctoral degree
EP1	2.43	2.56	2.41	2.53	2.48
EP2	2.57	2.63	2.51	2.60	2.4
EP3	2.77	3.00	3.09	3.23	3.18
EP4	3.57	3.94	4.07	3.92	4.08
EP5	3.17	3.19	3.37	3.36	3.45
EP6	3.00	2.88	2.99	2.98	2.80
EP7	3.10	2.88	2.80	2.73	2.55
EP8	3.33	3.88	4.04	4.18	4.18
EP9	2.87	3.63	3.15	3.22	3.45
EP10	2.80	2.94	2.63	2.69	2.70
EP11	3.07	3.19	3.31	3.44	3.73
EP12	3.40	3.75	3.76	3.95	4.08
EP13	2.93	3.00	3.24	3.50	3.75
EP14	2.83	3.00	2.79	2.94	2.52

Microblog usage frequency

As indicated by Figure 4, the perception levels of people who viewed microblogs more than once a day were significantly higher than those of other groups. Users who use microblogs frequently transmit or acquire relatively large amounts of data through microblogs, have more experience using microblogs, and have a deeper understanding of potential risks. People with more experience on the Internet will pay more attention to personal privacy (Phelps et al., 2000). Some respondents who visited microblogs many times a day wrote, ‘data disclosure does not mean willingness to be analyzed’ and ‘it is better to indicate the specific research direction, and maybe I will change my acceptance of this issue according to the specific research aim’. Users with rich microblog user experience have a better understanding of their own microblog data and ethical needs and can therefore express their attitudes rationally (Table 8).

Figure 4.

Perception level by microblog usage frequency.

Table 8.

Perception level by microblog usage frequency.

	More than 5 times a day	2–4 times per day	Once a day	Once every few days
EP1	2.68	2.58	2.38	2.32
EP2	2.82	2.63	2.35	2.39
EP3	3.28	3.41	3.06	2.86
EP4	4.15	4.23	3.97	3.73
EP5	3.42	3.52	3.50	3.19
EP6	3.14	3.09	2.82	2.83
EP7	2.75	2.80	2.79	2.77
EP8	4.17	4.25	4.06	3.82
EP9	3.46	3.35	2.97	3.06
EP10	2.83	2.68	2.59	2.65
EP11	3.65	3.59	3.15	3.17
EP12	4.02	4.00	3.59	3.69
EP13	3.57	3.63	3.12	3.12
EP14	2.91	3.02	2.62	2.7

Discussion

Researchers cannot assume user consent

Our survey shows that more than 71% of the respondents felt uncomfortable or very uncomfortable with the collection of social media data without being informed by researchers. Social media users have strong ethical needs, and most become research participants without knowing it. From the user’s point of view, there is an important ‘psychological difference’ between expressing views on Internet platforms and expressing them in newspapers or at public meetings, and users should not always be assumed to seek ‘public visibility’. Some users think that some dialogs, comments, questions, and answers are only between private individuals (Bruckman, 2002). Very few Internet users read the privacy policy of the website, and they rarely have a complete understanding of it (Al Zou’bi et al., 2020; Fiesler and Proferes, 2018; Zimmer, 2015). In this survey, only 19.4% were ‘somewhat familiar’ or ‘very familiar’ with the interface and function of the API, and only 19.7% were ‘somewhat familiar’ or ‘very familiar’ with restrictions on microblog use. Therefore, just because users’ personal information is made available in some fashion on a social network does not mean that researchers are allowed to systematically follow, harvest, archive, and mine it (Zimmer, 2010b).

Analyzing and mining social media data are undoubtedly of great benefit to research, and in fact, this practice is not prohibited by law in China and many countries because the data are publicly available. However, this phenomenon has triggered many ethical discussions. This dispute is also related to the debate on ‘privacy in public’ in the last 20 years. Many people believe that social media is a public space, and there is no difference between what people publish on social media and in newspapers (Omand et al., 2014). However, the boundary between public and private space has changed in the digital age. Social media have a dual nature: they are public but often give people a sense of privacy. Therefore, the information from social media is located in the ‘gray area’ between public and private (Ronn and Soe, 2019). In fact, as early as 1997, Nissenbaum (1997) proposed that the protection of privacy should not ignore the nonintimate realm. Other studies have confirmed with empirical data that social media users have strong ethical needs, and ethical attitudes will change with changes in researchers’ behavior (Chen et al., 2021a). Therefore, the indiscriminate collection of personal information on public platforms may infringe on users’ privacy, lead to a chilling effect and inhibit free expression. Researchers cannot take for granted that social media data belong to the public space and assume that users agree; thus, respect is important. The Belmont Report proposes three ethical principles for the use of human subjects in research, and the first is ‘respect for persons’. A participant in our research wrote, ‘As long as you inform me in advance, I will generally agree. Confidentiality and respect are important!’ Informed consent involves both informing and consenting, and for many respondents, the former is sufficient (Fiesler and Proferes, 2018).

In addition to ‘respect’, microblog research should also adhere to the principles of ‘beneficence’ and ‘justice’. Research should minimize potential harm and maximize benefits. Although some users wrote that ‘if it is helpful for academic research, it is feasible to share microblogs’, some users had different opinions: ‘I don’t want my microblogs to be used without my permission. Although scientific research is helpful to the public, it’s the researchers, not the microblog users, who directly gain fame and fortune’. Similarly, some Twitter users wrote, ‘they are taking property and using it without permission, I just don’t think that is right’ and ‘only if they were offering me some kind of compensation. It is not fair to profit from MY ideas and offer me nothing’ (Fiesler and Proferes, 2018).

Different groups have different ethical needs

There are obvious differences in cognition among the different groups, and the focus of ethical needs differs as well. Adolescents and young adults are more concerned about the collection of their microblogs than middle-aged and older people, and adolescents are more concerned about ‘respect’. Young adults aged 21–30 are more concerned about ‘privacy disclosure’, and the middle-aged group is more concerned about ‘authorization’, ‘secondary authorization’, and other professional issues. This study also found that females and people who use microblogs more frequently show stronger ethical needs. Participants with a high education level have more explicit ethical needs but also show an understanding of the normative use of data. In different cultural environments, participants’ perceptions of ethical issues in social media data research are also significantly different. In the study by Fiesler and Proferes (2018), 92.9% of the respondents were from the United States, while the subjects of this study were all from China. Compared with Twitter users, Sina Microblog users in China have a relatively vague and uncritical understanding of microblog research. Only approximately one-third of Sina Microblog users directly agreed that their microblogs could be used. This outcome is closely related to the differences in the cultural, policy, and research environments among countries.

Young people and people who use microblogs more frequently are more concerned about their microblog data, which is not consistent with the traditional view that such people do not care about privacy. Publishing information on a microblog is an active display behavior and represents consent to have their data browsed but not ‘collected, used and stored’, and it does not mean consent to reduced control over privacy. Frequent users are willing to share, but at the same time, they also value their rights to their own data and desire to be informed and respected. Researchers cannot apply traditional ideas to current Internet users and cannot use assumptions that they ‘do not care about privacy’ to presume consent to the use of microblog data for research.

With the development of the network environment, the privacy boundaries set by users in different environments are constantly changing, and the focus on the related ethical requirements is also changing in real time. In 2016, the European Economic and Social Committee (EESC) launched a big data ethics survey (‘The Ethics of Big Data: Balancing Economic Benefits and Ethical Questions of Big Data in the EU Policy Context’), which also stressed the need to consider the differences in stakeholders’ perceptions of ethical issues (EESC, 2016). However, researchers’ legitimate research needs cannot be ignored. The AoIR also advocates the use of flexible guidelines rather than rigid rules. It states,

when making ethical decisions, researchers must balance the rights of subjects (as authors, as research participants, as people) with the social benefits of research and researchers’ rights to conduct research. In different contexts, the rights of subjects may outweigh the benefits of research. (Markham and Buchanan, 2012)

Researchers who respect the different ethical needs of different groups can adjust their research methods, strike the proper balance between research interests and research risks, and effectively improve research efficiency.

Lack of norms makes it difficult for researchers and participants to form clear cognition

More than 81% of the respondents in our survey indicated that researchers cannot use microblogs without permission, but they did not have a clear understanding of the reasons. At present, researchers also have different understandings of this issue (Zimmer, 2010a). Sina stipulated in its Microblog Use Service Agreement, updated in 2017, that ‘Users may not assist or authorize any third party to illegally capture microblog content without their prior written permission from the microblog operator’.¹ This regulation, from the user’s point of view, places restrictions on the users themselves. The rule sparked widespread controversy, with many users saying it was ‘unfair’ and ‘unreasonable’ (Dou, 2017). Sina also laid out the following provisions for developers:

If user data must be collected for the developer application or services, consent from users must be obtained in advance. The developer should inform the user of the purpose, scope and usage of the relevant data collected to protect the user’s right to know.²

The provisions continued,

Developers should provide privacy protection policies to users of the microblog platform on the application and inform the microblog users which user data is being collected, how it will be used, whether it will be disseminated or submitted to others, etc. to protect the users’ rights to know and choice.

Such authorization is justified based on ‘access to developed applications’ and ‘provision of various services to users’. However, there is no clear regulation of the collection of microblog data by individuals or organizations involved in academic research.

From the legal perspective, microblog data cannot simply be considered public.

Currently, there is no clear, classified, and quantitative guide to identifying original work in China, and the originality of daily conversational blogs should not simply be denied (Liu, 2012). Although copyright law in China provides some exceptions, it also sets clear restrictions. Article 22 allows the translation or small-scale reproduction of published works for classroom teaching or scientific research but also emphasizes that publication and distribution are not allowed. Therefore, there is no clear legal support for the unrestricted collection of microblogs for academic research.

It seems that the ethical issues surrounding social media data research have not attracted the attention of Chinese scholars. Another survey by our research team shows that among the 470 research papers using Sina Microblog data as research data, only one mentioned ethical issues (Chen et al., 2021b). The doctrine of the mean in traditional Chinese culture is a cultural factor that contributes to weak ethical consciousness among Chinese scholars. The doctrine of the mean describes an ideal state of cultural harmony: people value ‘harmony’ in their relations, maintain a harmonious status quo, and are unwilling to criticize. Under this influence, academic circles lack a critical atmosphere. Compared with the clear problems of academic misconduct and academic fraud, the use of public data in scientific research is obviously a minor problem in the minds of Chinese scholars. Without criticism and supervision in academic circles, researchers will not pay special attention to ethical issues and will not adopt measures to solve ethical problems in research.

In the field of social media data research, new research methods and approaches to data collection have been developed. However, knowledge of the ethical guidelines for research using social media data is lacking, and there is currently neither consensus nor best practice in this field (Al Zou’bi et al., 2020; Vitak et al., 2016; Warrell and Jacobsen, 2014). There are no clear specifications for microblog platforms, and the lack of policies and researchers’ weak ethical awareness make it difficult to form consistent specifications for microblog research, placing both researchers and users in an awkward situation.

Strengthening data literacy education

The substantial value of social media data research, including research on microblogs, cannot be ignored. Banning such research may have serious consequences for the global scientific research system, which has entered the era of big data. Our survey shows that participants with different education levels have significant cognitive differences on ethical issues, and people with higher education have a clearer understanding of their own ethical needs. Therefore, in addition to passively informing participants, actively improving participants’ data literacy, enhancing participants’ awareness of their own data environment, and enhancing their understanding of the value and risk of Internet-mediated research can effectively increase their familiarity with social media data research. In 2018, the United Kingdom invested an additional £406 million in mathematics, digitization, and technical education (‘Industrial Strategy: Building a Britain Fit for the Future’). All countries should increase their investment in data education, design different curriculum systems for different audiences, teach the basics of data utilization at the basic education stage, and establish special training programs at the university stage so that the next generation masters these basics. It is necessary not only to improve researchers’ awareness of data utilization norms but also to enhance the public’s understanding of Internet platforms and Internet-mediated research so that both sides can establish a scientific concept of data ethics. Social education should also be linked with school education. Community libraries and other institutions should make full use of their social education functions, for example by holding special lectures, to popularize the basics of data utilization, guide and standardize data dissemination and utilization behavior, and improve the public’s acceptance of digital society.

Conclusion

Principles of research ethics and the ethical treatment of persons have been codified in a number of policies and accepted documents, such as the UN Declaration of Human Rights, the Declaration of Helsinki, and the Belmont Report, each of which emphasizes the importance of respect, autonomy, protection, security, maximization of benefits, and minimization of harm. Research and data collection methods are constantly evolving, but the ethical guidelines and policy formulation for network data research remain nascent (Warrell and Jacobsen, 2014). When formulating social media data research norms and ethics guidelines, understanding the real attitudes of participants and analyzing the different ethical needs of different groups can help accurately grasp the needs of participants and formulate more targeted ethical standards.

As an exploratory study, this research initially examined participants’ perceptions of microblog research and found that there were differences in cognition of ethical issues among different groups. However, this study also has limitations. It lacks empirical data from the perspective of researchers, which we will explore in the next phase of this work. In addition, we intend to further investigate the factors that influence users’ attitudes toward ethical issues in social media data research, including individual and environmental factors, and to reveal how these factors operate, to provide appropriate guidance for the formulation of ethical guidelines for social media data research from the users’ perspective.

Footnotes

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yi Chen

Notes

Author biographies

Yi Chen is an associate professor at the School of Information Management, Wuhan University, China. Her research interests are data ethics and library development. Her research focuses on the new ethical challenges in the data-driven research environment.

Si Li is a postdoctoral fellow in the Department of Information Management at Beijing University, China. Her research interests are information policy and law and information behavior.

Ruoxuan He is a master’s student in the School of Information Management, Wuhan University, China. Her research interests are open access and intellectual property.

References

Al Zou’bi, Khatatbeh, Alzoubi, et al. (2020) Attitudes and knowledge of adolescents in Jordan regarding the ethics of social media data use for research purposes. Journal of Empirical Research on Human Research Ethics 15(1–2): 87–96.

Barocas, Nissenbaum (2014) Big data’s end run around anonymity and consent. In: Lane, Stodden, Bender, et al. (eds) Privacy, Big Data, and the Public Good: Frameworks for Engagement, pp.116–141. Cambridge: Cambridge University Press.

Beaulieu, Estalella (2012) Rethinking research ethics for mediated settings. Information, Communication & Society 15(1): 23–42.

Brown, Weilenmann, Mcmillan, et al. (2016) Five provocations for ethical HCI research. In: Proceedings of the 2016 CHI conference on human factors in computing systems, San Jose, CA, 7–12 May, pp.852–863. New York: ACM.

Bruckman (2002) Studying the amateur artist: A perspective on disguising data collected in human subjects research on the Internet. Available at: https://www.cc.gatech.edu/classes/AY2003/cs6470_fall/bruckman-names.pdf (accessed 16 December 2021).

Bryce, Klang (2009) Young people, disclosure of personal information and online privacy: Control, choice and consequences. Information Security Technical Report 14(3): 160–166.

Chen (2021a) Determining factors of participants’ attitudes toward the ethics of social media data research. Online Information Review. Epub ahead of print 21 June. DOI: 10.1108/OIR-11-2020-0514.

Chen, Sun (2021b) Ethical reflection on microblog data research. Information and Documentation Service 1: 50–56.

Dou (2017) Who has the right to use microblog? China Intellectual Property News, 22 September, p.011.

Ess and AoIR Ethics Working Committee (2002) Ethical decision-making and Internet research: Recommendations from the AoIR ethics working committee. Available at: https://aoir.org/reports/ethics.pdf (accessed 16 December 2021).

European Economic and Social Committee (EESC) (2016) The ethics of big data: Balancing economic benefits and ethical questions of big data in the EU Policy context. Available at: https://www.eesc.europa.eu/resources/docs/qe-02-17-159-en-n.pdf (accessed 16 December 2021).

Fiesler, Lampe, Bruckman (2016) Reality and perception of copyright terms of service for online content creation. In: Proceedings of the ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW). Available at: https://cmci.colorado.edu/~cafi5706/CSCW2016_Fiesler.pdf (accessed 16 December 2021).

Fiesler, Proferes (2018) ‘Participant’ perceptions of Twitter research ethics. Social Media + Society 3: 1–14.

Floridi, Taddeo (2016) What is data ethics? Available at: https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2016.0360 (accessed 16 December 2021).

Franzke, Bechmann, Zimmer, et al. (2019) Internet research: Ethical guidelines 3.0. Available at: https://aoir.org/reports/ethics3.pdf (accessed 16 December 2021).

Hoy, Milne (2010) Gender differences in privacy-related measures for young adult Facebook users. Journal of Interactive Advertising 10: 28–45.

James, Busher (2015) Ethical issues in online research. Educational Research & Evaluation 21: 89–94.

Koene, Perez, Carter, et al. (2015) Research ethics and public trust, preconditions for continued growth of internet mediated research: Public confidence in internet mediate research. In: 2015 International Conference on Information Systems Security and Privacy (ICISSP), Angers, 9–11 February, pp.163–168. New York: ACM.

Lease, Hullman, Bigham, et al. (2013) Mechanical Turk is not anonymous. Available at: https://dx-doi-org.web.bisu.edu.cn/10.2139/ssrn.2228728 (accessed 16 December 2021).

Lin, Wang (2015) An empirical analysis of users’ privacy disclosure behaviors on social network sites. Information & Management 52(7): 882–891.

Litt (2013) Understanding social network site users’ privacy tool use. Computers in Human Behavior 29: 1649–1656.

Liu (2012) Copyright of the works on the Microblogging Platform. Chinese Journal of Law 34(06): 119–130.

Malhotra, Kim, Agarwal (2004) Internet users’ information privacy concerns (IUIPC): The construct, the scale, and a causal model. Information Systems Research 15(4): 336–355.

Markham, Buchanan (2012) Association for Internet Researchers’ Ethics Working Committee – Ethical decision making and Internet research: Recommendations from the AoIR Ethics Working Committee (Version 2.0). Available at: https://aoir.org/reports/ethics2.pdf (accessed 16 December 2021).

Martin, Shilton (2015) Why experience matters to privacy: How context-based experience moderates consumer privacy expectations for mobile applications. Journal of the Association for Information Science and Technology 67(8): 1871–1882.

May, Bayer, Ross (2007) A survey of ‘young social’ and ‘professional’ users of location-based services in the UK. Journal of Location Based Service 1(2): 112–132.

Mikal, Hurst, Conway (2016) Ethical issues in using Twitter for population-level depression monitoring: A qualitative study. BMC Medical Ethics 17: 22.

Milne, Rohm (2000) Consumer privacy and name removal across direct marketing channels: Exploring opt-in and opt-out alternatives. Journal of Public Policy & Marketing 19(2): 238–249.

Moreno, Goniu, Moreno, et al. (2013) Ethics of social media research: Common concerns and practical considerations. Available at: https://www.liebertpub.com/doi/full/10.1089/cyber.2012.0334 (accessed 16 December 2021).

Nissenbaum (1997) Toward an approach to privacy in public: Challenges of information technology. Ethics & Behavior 7(3): 207–219.

Omand, Miller, Bartlett (2014) Towards the Discipline of Social Media Intelligence. Basingstoke: Palgrave Macmillan UK.

Park (2013) Digital literacy and privacy behavior online. Communication Research 40(2): 215–236.

Phelps, Nowak, Ferrell (2000) Privacy concerns and consumer willingness to provide personal information. Journal of Public Policy & Marketing 19(1): 27–41.

Proferes (2017) Information flow solipsism in an exploratory study of beliefs about Twitter. Social Media + Society. Available at: https://doi.org/10.1177/2056305117698493 (accessed 16 December 2021).

Ravn, Barnwell, Barbosa (2020) What is ‘Publicly available data’? Exploring blurred public–private boundaries and ethical practices through a case study on Instagram. Journal of Empirical Research on Human Research Ethics 15(1–2): 40–45.

Ronn, Soe (2019) Is social media intelligence private? Privacy in public and the nature of social media intelligence. Intelligence & National Security 34(3): 362–378.

Sheehan (1999) An investigation of gender differences in on-line privacy concerns and resultant behaviors. Journal of Interactive Marketing 13(4): 24–38.

Sheehan (2002) Toward a typology of internet users and online privacy concerns. The Information Society 18(1): 21–32.

Shilton, Sayles (2016) ‘We aren’t all going to be on the same page about ethics’: Ethical practices and challenges in research on digital and social media. Available at: http://terpconnect.umd.edu/~kshilton/pdf/ShiltonSaylesHICSS.pdf (accessed 16 December 2021).

Special, Li-Barber (2012) Self-disclosure and student satisfaction with Facebook. Computers in Human Behavior 28(2): 624–630.

Swirsky, Hoop, Labott (2014) Using social media in research: New ethics for a new meme? The American Journal of Bioethics 14(10): 60–61.

Vaccaro, Karahalios, Sandving, et al. (2015) Agree or cancel? Research and terms of service compliance. In: 2015 CSCW workshop on ethics for studying sociotechnical systems in a big data world. Available at: http://www-personal.umich.edu/~csandvig/research/Vaccaro-CSCW-Ethics-2015.pdf (accessed 16 December 2021).

Vindu (2014) As data overflows online, researchers grapple with ethics. The New York Times, 13 August. Available at: https://www.nytimes.com/2014/08/13/technology/the-boon-of-online-data-puts-social-science-in-a-quandary.html.

Vitak, Shilton, Ashktorab (2016) Beyond the Belmont principles: Ethical challenges, practices, and beliefs in the online data research community. In: ACM conference on computer-supported cooperative work & social computing, 27 February–2 March, San Francisco, CA, pp.941–953. New York: ACM.

Wang, Zhu, Wen (2013) Research on the relationship between user satisfactions and behaviors: The case of Sina-Microblog. China Soft Science 7: 184–192.

Warrell, Jacobsen (2014) Internet research ethics and the policy gap for ethical practice in online research settings. Canadian Journal of Higher Education 44(1): 22–37.

Williams, Burnap, Sloan (2017) Towards an ethical framework for publishing Twitter data in social research: Taking into account users’ views, online context and algorithmic estimation. Sociology 51(6): 1149–1168.

Zhang (2017) User community detection and community topic evolution based on the network characteristics of emergency. Information Studies: Theory & Application 40(5): 94–98.

(2013) Research on predicting methods based on network user sentiment analysis. Journal of Library Science in China 39: 96–107.

Yang, Chen, Zhang, et al. (2014) Analysis of micro-blogging application in disease control and prevention. Chinese Journal of Health Education 30(9): 856–857.

Zimmer (2010a) ‘But the data is already public’: On the ethics of research in Facebook. Ethics and Information Technology 12(4): 313–325.

Zimmer (2010b) Is it ethical to harvest public Twitter accounts without consent? Available at: http://www.michaelzimmer.org/2010/02/12/is-it-ethical-to-harvest-public-twitter-accounts-without-consent/ (accessed 16 December 2021).

Zimmer (2015) Research ethics in big data era: Addressing conceptual gaps for researchers and IRBs. Available at: https://bigdata.fpf.org/wp-content/uploads/2015/12/Zimmer-Research-Ethics-in-the-Big-Data-Era.pdf (accessed 16 December 2021).

Zimmer, Proferes (2014) A topology of Twitter research: Disciplines, methods, and ethics. Aslib Journal of Information Management 66(3): 250–261.