Does open access citation advantage depend on paper topics?

Abstract

Research topics vary in their citation potential. In a metric-wise scientific milieu, it is probable that authors tend to select citation-attractive topics, especially when choosing open access (OA) outlets that are more likely to attract citations. Applying a matched-pairs study design, this research aims to examine the role of research topics in the citation advantage of OA papers. Using a comparative citation analysis method, it investigates a sample of papers published in 47 Elsevier article processing charges (APC)-funded journals in different access models including non-open access (NOA), APC, Green and mixed Green-APC. The contents of the papers are analysed using natural language processing techniques at the title and abstract level and serve as a basis to match the NOA papers to their peers in the OA models. The publication years and journals are controlled for in order to avoid their impacts on the citation numbers. According to the results, the OA citation advantage that is observed in the whole sample still holds even for the highly similar OA and NOA papers. This implies that the OA citation surplus is not an artefact of the OA and NOA papers’ differences in their topics and, therefore, in their citation potential. This leads to the conclusion that OA authors’ self-selectivity, if it exists at all, is not responsible for the OA citation advantage, at least as far as selection of topics with probably higher citation potentials is concerned.

Keywords

APC; citation advantage; Green open access; similarity; subject

1. Introduction

Open access (OA) to scientific papers has widely been believed to have advantages for the community, whether readers, practitioners, researchers or the public. Facilitating knowledge transfer and, hence, knowledge progress is the key benefit of OA for all. It is, however, supposed to bring about an added value to scientists as writers, that is, a higher level of recognition by the scientific community, called open access citation advantage (OACA). The OACA, defined as the citation gap between OA and non-open access (NOA) paper groups, is widely confirmed for various OA models, that is, the Gold, Green, article processing charges (APC) and hybrid OA models [1–18]. The research results are not fully consistent, in that they sometimes reject the OACA, for example, at journal level [19], for the Gold model [15,20] and for low-impact journals [21]. However, a recent large-scale study conducted by Piwowar et al. [22] corroborates the existence of OACA for a big sample of papers intended to roughly proxy the whole population of scholarly literature.

The OACA is perceived to have roots in different factors. On the one hand, some OA researchers attribute the OACA to the higher visibility of OA papers brought about by their exposure to a broader readership [2] or to their early accessibility [9,19], while others contend that it is authors’ self-selection of their high-quality papers that results in a higher recognition level. In other words, according to the latter, the citation advantage has roots in the intrinsic merits of the OA papers, and not necessarily in their open accessibility [23]. Some authors believe that more citable articles have a higher probability of being freely accessible [6,11,23–26]. Hajjem and Harnad [27] showed that not only are self-archived articles more likely to be cited (‘Quality Advantage’), but also those articles which are more likely to be cited are more likely to be self-archived (‘Quality Bias’). Gargouri et al. [28] confirmed that the OA advantage is due to a ‘quality advantage’ judged by users who self-select what to use and cite, rather than a ‘quality bias’ from authors self-selecting what to make OA. Some other factors include journals’ prestige [29–31], paper length and the number of collaborating authors [11], though not confirmed in all studies [32]. In spite of the widespread studies on the OACA, there is not yet any certainty about the causation of the phenomenon. For instance, ‘the early view postulate’ is challenged by some studies confirming the OACA persistence over time [33,34]. The impact of ‘self-selection’ bias is also rejected by Ottaviani [18]. The debates are, hence, ongoing and researchers call for further studies to clarify the causation of the OACA effect [22].

Given the high and increasing number of national and international OA mandates [35–37], one might think that all authors have to make their outputs openly accessible, with no freedom of choice between OA and NOA models. However, this is not the case, because the mandates have faced important barriers, for example, lack of strong incentives [37], inflexibility of embargo duration, low participation of authors [37,38] and limitation of the OA models included [36,39], hindering their wide acceptance and effectiveness. The OA articles are, hence, ‘almost all unmandated’ around the world, although mandates have been effective in some countries like the United Kingdom [40].

Aside from the above-mentioned factors, the subject area of papers is a determinant of their citation potential [41–43]. Citation potential differs not only between but also within disciplines and fields. More specifically, citations are related to the topic the paper deals with. As Falagas and Alexiou [44] put it, even within a certain discipline there are some citation-intensive areas attracting more – though rather perfunctory – citations. ‘Hot’ topics [45], exciting and popular topics [46,47] and positive outcomes [48] are among the content-related factors that play a role in the citation performance of papers. The research topic is believed to predict a paper’s recognition to such an extent that topic choice is regarded as important for writing highly cited papers [42,43].

As a result, authors and journals may favour such topics – either tactically or strategically – to increase their visibility and hence their impact. High research interest and popularity may serve as a drive for authors, given the increasing role and influence of citation as a research performance measure in milestones of their scientific lives, especially promotion, tenure and grant allocation [49]. In Rousseau and Rousseau’s [50] words, in a ‘metric-wise’ milieu, research topics and publication avenues may be selected to maximise a researcher’s bibliometric indicator levels rather than out of a desire to advance science or to reach the most interested audience. It is, therefore, likely that choosing a topic for research or a venue for publicising research results may be a function of its citation potential. This may be especially true for OA venues, which are confirmed to be more citation attractive, so that multiple OA availability of articles has a positive impact on their citation count [4]. The wide discussions on the factors leading to the OACA may be good evidence of the perceived interaction between attention-provoking topics and outlets in catching more citations. On this basis, it is likely that the ‘selection bias’ implies the selection of not only high-quality papers but also citation-attractive topics.

This gives rise to the question of whether the citation gap between OA and NOA papers results from authors’ self-selectivity in choosing citation-attractive topics when adopting the APC model or self-archiving their accepted papers. If so, it would be expected that OA and NOA papers experience no citation inequality when dealing with similar topics and hence enjoying the same level of citation potential. To answer the question, this study tries to pair OA and NOA papers in terms of their subject similarity and then examine the association between their OACA and similarity degrees. As far as the OACA-oriented literature goes, it seems that subject coverage is taken into consideration by focusing on the papers published either in the same journal [14,25,51–53] or in the same discipline [6,21,30,54–57]. Piwowar et al. [22] categorised papers based on their publishing journal fields, except for those published in multidisciplinary journals that they classified at the article level to their most cited subject area. Unlike journal fields and disciplines that are broad, papers’ topics are more specific and therefore more prone to reflect authors’ interests and preferences. No research was, however, found to have controlled for the effect of specific topics dealt with in OA and NOA papers. As an exception, one may refer to Piwowar and Vision [8] who controlled for many citation predictors, including authors and topics. However, they concentrated on the impact of openly available data in a narrow field, that is, gene expression microarray. Moreover, Gargouri et al. [28], Niyazov et al. [58] and Snijder [59] controlled for the field effect at general level. Besides, in almost the entire literature, the OA and NOA papers were collectively analysed, while a matched-pairs design would take any probable individual differences between papers into consideration. As Piwowar et al. [22] suggest, concentrating on articles paired based on their topics, journal issues and publication dates may help eliminate confounding factors and hence judge the OACA more accurately.

2. Research aims

Applying a matched-pairs study design, this study aims to examine the role of research topics in the citation gap between OA and NOA papers. To do this, it endeavours to determine the following:

The citation gap between NOA papers and their subject peers in the OA models;

The significance of the citation gap across OA models;

The significance of the citation gap across OA models in different subject similarity groups.

3. Research method

This research uses a comparative citation analysis method to investigate a test collection sampled from 47 Elsevier APC-funded journals previously identified by Sotudeh et al. [13]. The rationale for selecting the journals lies in the fact that they had been previously found to have a growing body of APC and self-archived papers with considerable OACA [13,60]. Moreover, according to the SHERPA/RoMEO database, all of them had also adopted the Green (GR) model. This model allows authors to self-archive their papers (in their pre-print, reviewed or post-print versions) on their personal or organisational websites. This made it possible to concentrate on both the APC and Green models and their interaction, that is, a mixed model of APC and Green OA. Not only do OA models differ in their OACA potential [15,61], but the citation count of an article is also positively influenced by its multiple availability through, for example, multiple search engines [4], repositories [23] and databases [62].

3.1. Sampling

The time span of 2013–2015 was selected. The collection was limited to research articles, reviews and proceeding papers. Out of the 50,080 papers identified by Sotudeh et al. [13], 42,226 papers were available with full bibliometric information (including abstracts). After downloading the bibliometric and bibliographic information of the papers from Scopus, the access model of each paper was investigated as described below.

3.1.1. Identification of APC papers

The data collection process started in January 2016. Searches were conducted using the papers’ titles and authors. They were conducted off campus in order to prevent subscription-based papers from entering the collection. The papers were classified as APC if they carried the ‘Open Access Article’ and ‘Creative Commons’ tags and were freely available to the public for downloading. Furthermore, the URLs of the papers were checked for the term ‘Open Archive’ in order to exclude delayed OA papers from the collection.

3.1.2. Identification of GR OA papers

In the next step, the papers were manually searched in Google and Google Scholar by their titles and authors in order to find if they were self-archived. The rationale for selecting the search engines lies in their advantages including offering a reasonably comprehensive coverage of scholarly papers [63], ranking OA at the top of their results retrieved and exhibiting similar or even higher performance in retrieving OA results compared with Directory of Open Access Journals (DOAJ), Science-Metrix and Unpaywall [35,64,65].

As the search engines rank OA items at the top of the results, only the first and second result pages were verified. Each of the links retrieved was tried in order to check the online availability of the papers’ full texts.

The results of the verification of the papers’ access models are summarised in Table 1. As observed, the NOA model expectedly comprises most of the papers published by the journals (58.13%), followed by the Green model (34.83%). The lowest share belongs to the APC model, in its pure format (2.23%) or mixed with the Green model (APC-Green or APCGR; 4.81%). The latter refers to those papers that are not only openly accessible through the publisher’s or the journal’s website in exchange for the publication charges paid by their authors, but also archived on websites and social networks. The adherence of a single paper to more than one OA model can be explained by the fact that all of the studied hybrid journals allow self-archiving, though with different policies regarding the archived versions and the length of the embargoes [66]. The payment of the APC, despite the allowance of the Green model, could be due to OA mandates or authors’ unawareness of the other OA models allowed [67,68]. Moreover, the Green versions may be posted by co-authors or third parties, for example, librarians, enthusiastic readers and so on, who are not necessarily responsible for paying the publication charges.

Table 1.

Characteristics of the initial data set.

Access models	Sample
	No.	Percent
APC	940	2.23
APCGR	2033	4.81
GR	14,707	34.83
NOA	24,546	58.13
Total	42,226	100.00

APC: article processing charges; GR: Green; NOA: non-open access.

It is worth mentioning that the total portion of the OA papers (41.87%) is lower than what was previously reported by Archambault et al. [15] who reported over half of the papers to be freely available. This could have resulted from the fact that the present sample is limited to APC and Green papers published in hybrid journals and does not include other OA models, for example, those articles published in Gold journals or the Green papers published in toll-access ones. As mentioned, Bronze and delayed OA models are also excluded from the sample. Another noteworthy point is that the share of the Green papers seems to be higher in comparison with that revealed by Piwowar et al. [22]. The discrepancy may be attributed to the differences between operational definitions. Unlike their study, Green papers in this research are not limited to just those self-archived in OA repositories, but also in personal or institutional websites as well as social networks. They also gave priority to Gold over self-archived content, if an article adhered to both the models. Moreover, in their study, the ‘hybrid papers’, that is, the APC-funded papers published in hybrid journals, account for 3.6%, 4.3% and 8.3% of their three samples selected from CrossRef, WoS and Unpaywall, respectively. As observed, the share of the APC papers identified in this study (7.04%) approximates that of those identified in their latter sample.

The initial data set was then analysed in depth in order to build a test collection consisting of NOA documents, which served as queries, paired with documents in the OA models in terms of their subject similarity. To do so, 2134 NOA papers were selected because they contained the most salient keywords in the collection, as measured by term frequency–inverse document frequency (TF-IDF). They served as seed documents and were then paired to their OA peers as described below.

3.1.3. Test collection building

In order to study the association between the subject similarity and the citation gap between the OA and NOA papers, it was necessary to first pair the papers in terms of their similarities in their contents. To do so, the documents’ titles and abstracts were chosen as highly important representations of paper contents [69,70] because they are ‘lexically dense and focus on the core issues presented in articles’ (p. 4) [71].

KNIME (Konstanz Information Miner) was used to measure the similarity between the OA and NOA papers and to couple them. The software processes texts in several steps, including reading and parsing documents, named entity recognition, filtering and manipulation, word counting and keyword extraction, and measuring the similarities between the seed documents and the collection.

A workflow consisting of the following nodes (modules) was built and executed in the order shown:

No.	Function	Related node
1	Reading the Excel file and extracting the strings of titles and abstracts	Excel Reader
2	Transforming the extracted strings into a field called ‘Document’	Strings to document
3	Preprocessing	Diacritics Remover; Punctuation Erasure; Stop word filter; N chars filter; Case convertor; Snowball stemmer
4	Transforming the ‘Document’ into bag of words	Bag of Words creator
5	Calculating the weight of each term based on TF-IDF (relative TF was computed by dividing the absolute frequency of a term in a document by the number of all terms in that document)	TF
		IDF
		Math Formula (TF-IDF)
6	Ranking the terms based on their TF-IDF values	Rank Row filter
7	Identifying and filtering the NOA documents with rank < 10,000	Row Filter
8	Finding the most similar documents to each of the NOA papers using cosine measure and K nearest neighbours (K = 5)	Similarity Search
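The workflow steps above can be illustrated outside KNIME with a minimal, standard-library sketch. It is not the KNIME implementation: the function names and the toy documents are hypothetical, the documents are assumed to be already preprocessed (stop words removed, stemmed), and the IDF variant log(N / document frequency) is an assumption, since the workflow table does not state which variant was configured. Only the relative TF definition and the cosine measure with K nearest neighbours follow the description above.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenised document.

    Relative TF = term count / number of terms in the document (step 5).
    IDF = log(N / document frequency) is an assumed variant.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbours(seed_idx, candidate_idxs, vectors, k=5):
    """Return (similarity, index) for the k candidates closest to the seed."""
    scored = [(cosine(vectors[seed_idx], vectors[j]), j) for j in candidate_idxs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy, already-preprocessed documents (hypothetical data).
docs = [
    "open access citat advantag".split(),   # 0: NOA seed document
    "open access citat impact".split(),     # 1: OA candidate
    "protein fold simul".split(),           # 2: OA candidate
    "citat advantag green archiv".split(),  # 3: OA candidate
]
vectors = tf_idf_vectors(docs)
neighbours = nearest_neighbours(0, [1, 2, 3], vectors, k=2)
```

In this toy collection the seed shares three stems with document 1 and two with document 3, so those two are returned in that order, while document 2 shares none and scores zero.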

The obtained data were checked manually in order to control for any false-positive matching or duplicated records. This resulted in identifying 4171 unique documents in the four access models, including NOA (2134 seed documents and 535 neighbours), APC (75), GR (1305) and APCGR (122) (Table 2).

Table 2.

Test collection characteristics.

	Access model	No. of papers	No. of pairs
Seed documents	NOA	2134	–
Neighbours	APC	75	148
	APCGR	122	237
	GR	1305	2762
	NOA	535	778
Total		4171	3925

NOA: non-open access; APC: article processing charges; GR: Green.

It is noteworthy that some NOA papers matched more than one document in the OA models and vice versa. As the models were analysed separately, this caused no problem when it happened between groups. However, within the NOA–NOA group, the seed and neighbour documents were kept mutually exclusive in order to preserve the independence of the variables.

3.2. Data analysis procedures

As the citation and similarity values were found to be non-normally distributed across the papers, even after natural log transformation, a non-parametric statistic (i.e. the Wilcoxon test) was used for comparing the citations of OA and NOA papers.
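A minimal sketch of the rank bookkeeping behind such a test is given here, assuming the paired (signed-rank) form of the Wilcoxon test, which the N, mean rank and sum of ranks columns of the result tables suggest. The function name and the toy citation counts are hypothetical; pairs with equal citations are set aside as ties, and equal absolute differences share their average rank.

```python
def signed_rank_sums(oa_citations, noa_citations):
    """Signed-rank bookkeeping for matched OA-NOA pairs.

    Returns (n_neg, n_pos, n_ties, sum_neg, sum_pos), where "neg" means
    OA < NOA, "pos" means OA > NOA and "ties" means OA = NOA.
    """
    diffs = [oa - noa for oa, noa in zip(oa_citations, noa_citations)]
    n_ties = sum(1 for d in diffs if d == 0)
    nonzero = sorted((d for d in diffs if d != 0), key=abs)

    # Assign average ranks to runs of equal absolute differences.
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(nonzero):
        j = i
        while j + 1 < len(nonzero) and abs(nonzero[j + 1]) == abs(nonzero[i]):
            j += 1
        average_rank = (i + j + 2) / 2  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    sum_neg = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    sum_pos = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    n_neg = sum(1 for d in nonzero if d < 0)
    n_pos = len(nonzero) - n_neg
    return n_neg, n_pos, n_ties, sum_neg, sum_pos


# Hypothetical citation counts for five matched pairs.
result = signed_rank_sums([5, 3, 8, 2, 4], [2, 3, 9, 0, 1])
```

Dividing each sum of ranks by the corresponding N gives the mean ranks of the kind reported in the result tables.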

Using the two-step clustering technique, the papers were categorised into three groups of subject similarity including slightly similar (min = 0.218, max = 0.586, cluster centre = 0.46), moderately similar (min = 0.587, max = 0.849, cluster centre = 0.71) and highly similar (min = 0.854, max = 1, cluster centre = 0.99).

Before conducting the analyses, the citation data were normalised based on the total number of the citations each journal has received in each year following Priem et al. [72]. However, as Waltman and van Eck [73] put it, there are several normalisation approaches and the best approach is still an open question. Consequently, to allay any doubts regarding the normalisation, the highly similar OA–NOA pairs were further studied in a strict categorisation based on the similarity of the confounding factors affecting the citation potential of papers. Publication dates, publishing journals and document types (e.g. reviews, research articles, proceeding papers) are among the confounding factors causing a higher number of citations in favour of, for example, older versus younger papers [74,75], reviews versus original research papers [26] and top- versus low-ranked journals [76,77]. The highly similar OA–NOA pairs were further compared after categorising them into ‘Evens’ if their document types, publication years or publishing journals were the same and ‘Odds’ if they were different.
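The exact normalisation formula is not spelled out above, so the following is only one plausible reading, offered as a sketch: each paper's citation count is divided by the average citations of the papers in the same journal and year (that is, the journal-year total divided by the number of papers). The dictionary keys, the function name and the toy records are hypothetical.

```python
from collections import defaultdict


def normalise_citations(papers):
    """Divide each paper's citations by its journal-year mean (assumed formula).

    `papers` is a list of dicts with hypothetical keys
    'journal', 'year' and 'citations'.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for paper in papers:
        key = (paper["journal"], paper["year"])
        totals[key] += paper["citations"]
        counts[key] += 1

    normalised = []
    for paper in papers:
        key = (paper["journal"], paper["year"])
        mean = totals[key] / counts[key]
        # A journal-year with no citations at all gets a score of zero.
        normalised.append(paper["citations"] / mean if mean > 0 else 0.0)
    return normalised


# Hypothetical records.
papers = [
    {"journal": "A", "year": 2013, "citations": 6},
    {"journal": "A", "year": 2013, "citations": 2},
    {"journal": "B", "year": 2014, "citations": 0},
]
scores = normalise_citations(papers)
```

Under this reading a score of 1 marks a paper cited exactly as often as the average paper of its journal and year, which makes papers from different journals and years comparable.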

4. Results

Table 3 illustrates the citation performance of the OA and NOA papers in different access models before and after being normalised by their publication years and journals’ total citations. As observed, the OA papers are superior in their mean citation values in all matched-pairs groups, compared with their peers in the NOA model.

Table 3.

Citation performance of OA and NOA papers across the access models.

Mean citation	Model	Pairs
		APC–NOA	APCGR–NOA	GR–NOA
Before normalisation	OA	5.44	7.50	6.91
	NOA	4.66	5.48	4.73
After normalisation	OA	1.268	1.263	1.079
	NOA	1.02	0.953	0.973

OA: open access; NOA: non-open access; APC: article processing charges.

4.1. Significance of the citation gap across the access models

Table 4 summarises the results of Wilcoxon test conducted to compare the OA and NOA pairs in terms of their citation mean ranks. As observed, the mean ranks of the OA papers are significantly higher than those of their NOA peers in the OA models, including APC (‘OA > NOA’ = 70.06), APCGR (‘OA > NOA’ = 112.36) and GR (‘OA > NOA’ = 1348.64). Besides, the NOA–NOA pairs do not significantly differ in their citation gaps.

Table 4.

Wilcoxon results for comparing citation mean ranks of OA and NOA papers.

Access models		N	Mean rank	Asymp. Sig.	Z
APC	OA < NOA	52	68.58	0.009	−2.613
	OA > NOA	86	70.06
	OA = NOA	10	–
	Total	148	–
APCGR	OA < NOA	93	109.13	0.026	−2.224
	OA > NOA	128	112.36
	OA = NOA	16	–
	Total	237	–
GR	OA < NOA	1175	1292.03	0.000	−5.925
	OA > NOA	1471	1348.64
	OA = NOA	116	–
	Total	2762	–
NOA	query < neighbour	361	356.85	0.931	−0.087
	query > neighbour	355	360.18
	query = neighbour	62	–
	Total	778	–

OA: open access; NOA: non-open access; APC: article processing charges; GR: Green.

4.2. Significance of the citation gap across the access models in the subject similarity groups

Table 5 illustrates the results of the Wilcoxon tests carried out to compare the citation advantage values among the paper pairs in different access models in different similarity groups. As observed, the OA–NOA pairs with ‘slightly similar’ contents are equal in their citation performances, so that there are no statistically significant differences between the APC, APCGR and GR papers on the one hand and their NOA peers in the same pairs on the other. However, as the similarity increases, the citation gap becomes significant between GR and NOA papers in the ‘moderately similar’ category (sum of ranks: OA > NOA = 134,480.5 vs OA < NOA = 100,474.5). Moreover, in the ‘highly similar’ groups, all OA models outperform the NOA model, so that significant OACA values can be observed for the APC (OA > NOA = 2505 vs OA < NOA = 1150), APCGR (OA > NOA = 7198.5 vs OA < NOA = 4736.5) and GR (OA > NOA = 787,519 vs OA < NOA = 587,793). In this similarity group, NOA–NOA pairs showed no significant gap between their citation mean ranks. It is worth mentioning that the size of the sample is small for the APC and APCGR papers, especially in the slightly and moderately similar groups, and the results should be interpreted with caution.

Table 5.

Comparing normalised citation mean ranks of OA and NOA papers in similarity groups.

Access model		Slightly similar					Moderately similar					Highly similar
Access model		N	Mean rank	Sum of ranks	Z	Sig.	N	Mean rank	Sum of ranks	Z	Sig.	N	Mean rank	Sum of ranks	Z	Sig.
APC	OA < NOA	5	11.00	55	–0.672	0.5	19	17.16	326	–0.385	0.7	28	41.07	1150	–2.969	0.003
	OA > NOA	11	7.36	81			18	20.94	377			57	43.95	2505
	OA = NOA	0	–	–			2	–	–			8	–	–
	Total	16	–	–			39	–	–			93	–	–
APCGR	OA < NOA	12	11.00	132	–1.008	0.3	19	22.63	430	–1.418	0.156	62	76.40	4736.5	–2.221	0.026
	OA > NOA	8	9.75	78			28	24.93	698			92	78.24	7198.5
	OA = NOA	0	–	–			4	–	–			12	–	–
	Total	20	–	–			51	–	–			166	–	–
GR	OA < NOA	142	155.99	22,150	–0.575	0.6	292	344.09	100,474.5	–3.282	0.001	741	793.24	587,793	–5.122	0.000
	OA > NOA	161	148.48	23,906			393	342.19	134,480.5			917	858.80	787,519
	OA = NOA	19	–	–			31	–	–			66	–	–
	Total	322	–	–			716	–	–			1724	–	–
NOA	query < neighbour	40	40.41	1617	–0.178	0.9	98	96.84	9490.5	–0.294	0.769	223	220.13	49,089	–0.195	0.846
	query > neighbour	39	39.58	1544			94	96.14	9037.5			222	225.88	50,146
	query = neighbour	7	–	–			14	–	–			41	–	–
	Total	86	–	–			206	–	–			486	–	–

OA: open access; NOA: non-open access; APC: article processing charges; GR: Green.
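As a reading aid for the Z column of Table 5, the sketch below recomputes Z from the reported sums of ranks using the usual normal approximation of the signed-rank statistic. It assumes that Z is based on the smaller of the two rank sums and that no tie correction is applied; the function name is hypothetical. Under these assumptions the reported values for the two smallest groups are reproduced.

```python
import math


def wilcoxon_z(sum_neg, sum_pos, n_nonzero):
    """Normal-approximation Z from signed-rank sums (no tie correction)."""
    w = min(sum_neg, sum_pos)
    mean = n_nonzero * (n_nonzero + 1) / 4
    sd = math.sqrt(n_nonzero * (n_nonzero + 1) * (2 * n_nonzero + 1) / 24)
    return (w - mean) / sd


# Slightly similar APC pairs in Table 5: 5 + 11 non-tied pairs.
z_apc = wilcoxon_z(55, 81, 16)      # about -0.672
# Slightly similar APCGR pairs in Table 5: 12 + 8 non-tied pairs.
z_apcgr = wilcoxon_z(132, 78, 20)   # about -1.008
```

Pairs with equal citations (the ‘OA = NOA’ rows) do not enter the calculation, which is why n is the sum of the two non-tied N values rather than the group total.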

4.3. Significance of the citation gap between highly similar OA and NOA papers in terms of year, document type and journal groups

Aside from topics, there are other publication factors such as publication years, publishing journals and document types that have been found effective in accumulating citations. This gives rise to the question of whether the significant OACA in the highly similar group is due to their differences in these regards. To answer this question, the OA–NOA pairs were categorised into two groups including ‘Evens’ if the OA–NOA pairs are published in the same document types, years or journals and ‘Odds’ if not so. The results of the tests carried out on the pairs with ‘highly similar topics’ are reported in Table 6. The analyses of the pairs with ‘slightly and moderately similar subjects’ are presented in Appendix 1. It is noteworthy that, in order to vet the role of publication years and journals, it was necessary to carry out the tests on raw citations (i.e. before normalisation by journals’ total citations and publication years).
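The Even/Odd grouping described above amounts to a simple equality check per publication factor. A minimal sketch follows, with hypothetical field names and a hypothetical pair of records.

```python
def even_or_odd(oa_paper, noa_paper, factor):
    """Label a matched pair 'Even' if both papers share the factor, else 'Odd'."""
    return "Even" if oa_paper[factor] == noa_paper[factor] else "Odd"


# Hypothetical OA-NOA pair.
oa = {"year": 2014, "journal": "Journal A", "doc_type": "review"}
noa = {"year": 2014, "journal": "Journal B", "doc_type": "review"}
labels = {f: even_or_odd(oa, noa, f) for f in ("year", "journal", "doc_type")}
```

Each pair is labelled separately for each factor, so the same pair can fall into the Even group for years and the Odd group for journals.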

Table 6.

Comparing citation mean ranks of highly similar OA and NOA papers in even/odd publication factor groups.

Publication factors	Access model		Even					Odd
Publication factors	Access model		N	Mean rank	Sum of ranks	Z	Sig.	N	Mean rank	Sum of ranks	Z	Sig.
Years	OA–NOA	OA < NOA	239	260.76	62,322	–4.486	0.000	487	582.58	283,718	–8.445	0.000
		OA > NOA	325	298.49	97,008			764	653.68	499,408
		OA = NOA	95	–	–			73	–	–
		Total	659	–	–			1324	–	–
	NOA–NOA	query < neighbour	77	67.86	5225	–0.010	0.992	133	136.44	18,147	–0.424	0.671
		query > neighbour	67	77.84	5215.0			140	137.53	19,254.0
		query = neighbour	43	–	–			26	–	–
		Total	187	–	–			299	–	–
Journals	OA–NOA	OA < NOA	330	363.95	120,102.5	–4.75	0.000	396	476.72	188,783	–8.542	0.000
		OA > NOA	443	404.17	179,048.5			646	548.95	354,620
		OA = NOA	80	–	–			88	–	–
		Total	853	–	–			1130	–	–
	NOA–NOA	query < neighbour	82	76.37	6262.5	–0.57	0.57	128	127.23	16,285.5	–0.03	0.97
		query > neighbour	80	86.75	6940.5			127	128.77	16,354.5
		query = neighbour	30	–	–			39	–	–
		Total	192	–	–			294	–	–
Document types	OA–NOA	OA < NOA	612	703.26	430,400.5	–7.571	0.000	114	142.95	16,296	–5.946	0.000
		OA > NOA	879	775.75	681,885.5			210	173.11	36,354
		OA = NOA	155	–	–			13	–	–
		Total	1646	–	–			337	–	–
	NOA–NOA	query < neighbour	186	178.21	33,146.5	–0.215	0.830	24	26.35	632.5	–0.286	0.775
		query > neighbour	180	188.96	34,014.5			27	25.68	693.5
		query = neighbour	63	–	–			6	–	–
		Total	429	–	–			57	–	–

OA: open access; NOA: non-open access.

As shown in Table 6, the OA papers significantly outperform their highly similar NOA peers no matter if they are published in the same or different years, journals or document types. The same holds for the OA–NOA pairs with ‘slightly and moderately similar subjects’ (Appendix 1).

5. Discussion

Researchers seem to follow a double standard in supporting the OA movement. As readers, they enthusiastically campaign for open accessibility of scientific papers to read and use, while as authors they are not willing to do their best in providing OA to their own papers [78,79]. The paradox calls, hence, for motivations other than altruism and commitment to knowledge progress, which are not per se adequate to guarantee writers’ support for the movement. In the present metric-wise atmosphere, which requires authors to increase the quantity of their papers and citations, the citation superiority of OA papers may serve as a lever to stimulate their survival motivations.

Although the citation advantage has been confirmed since the early works on OA [80], there is no consensus on its underlying factors including higher visibility, higher quality, journals’ prestige and publication characteristics [2,6,11,23,25,26,28–31,81]. Since papers’ subjects are a key factor in attracting citations, one may wonder if the OACA is nothing but a misunderstanding caused by OA and NOA papers’ differences in their topics and hence in their citation potentials.

The present communication concentrated on a sample of OA and NOA papers paired in terms of their similarity degrees. According to the results of the Wilcoxon tests, NOA papers exhibited a significant disadvantage compared with their OA peers in each of the Green, APC and APC–Green models. However, the result does not hold for the NOA–NOA couples, which served as the control group (Table 4).

The comparison of OA and NOA papers at different similarity levels provides an opportunity to verify the OACA phenomenon at a finer granularity. While OA–NOA couples dealing with largely different topics showed statistically equal performances, those with highly similar topics revealed a citation advantage for OA papers (Table 5). In fact, it seems that in the groups with lower levels of subject similarity the citation potential brought about by open accessibility is counterbalanced by the subjects’ differences in their citation potentials. This empirically re-confirms the already-known fact that subjects and topics are determinants in attracting citations to papers [41–44,48,82,83].

On the contrary, there exist significant gaps among the OA–NOA pairs dealing with highly similar subjects in all the OA models. However, the NOA–NOA pairs with highly similar contents do not adhere to the finding. This means that the OACA is not an artefact caused by different topics with various citation potentials for the OA and NOA papers. Nor is it associated with (dis)similarity in their publication factors, because the highly similar OA–NOA pairs were also found to be significantly different, no matter if they are published in the same or different years, journals or document types. These factors are believed to be crucial when evaluating and comparing papers [72,84]. The higher citation performance of the OA papers in comparison with their subject-similar NOA pairs with different publication characteristics signifies that subject similarity is powerful enough to counterbalance the effect of publication in different journals, publication time and document types.

Paper quality is of a multi-dimensional and highly complicated nature. Although not all quality dimensions are explained by the topic or subject a paper deals with, topic characteristics such as topic importance [85] and topic coverage and detailedness [86] crucially affect the judgement of a paper’s quality. Characteristics of research topics are believed to be determinants in predicting their citation potentials. A good example may be observed in ‘hot’ topics, which are believed to easily acquire more citations and more papers than those dealt with in cold fields, as there are more papers focusing on similar topics [45]. Other instances include exciting and popular topics [46,47], controversial topics versus those contributing to scientific progress [83], fundamental versus super-specialised subjects with narrower audience [82] and finally positive and statistically significant versus negative outcomes [48]. Accordingly, it seems that important subject matters may acquire fewer citations, while popular, hot or trivial topics are more likely to gain more citations. Given the ‘self-selectivity postulate’, it would be probable that the OACA is a result of authors’ selectivity of topics with a high citation potential. However, the results of this study indicated that OACA is still observable among OA and NOA papers dealing with highly similar topics and hence enjoying the same level of citation potential. As a result, this article may contribute to the ongoing challenge of OA ‘quality bias’ versus ‘quality advantage’ by clarifying that the OACA is not brought about by topic differences among OA and NOA papers. Accordingly, even if authors intentionally select these kinds of citation-attracting topics for their OA papers, they would experience a higher citation performance compared with NOA papers dealing with the same topics.

6. Conclusion

The OACA is not only a motive for authors to support the OA movement, but can also be interpreted as a herald of accelerated and expanded knowledge dissemination and usage. It may, therefore, be important to all parties involved in science. OA researchers have continually endeavoured to provide rigorous evidence to determine whether the OACA is an artefact of ‘publication strategies’ (e.g. longer papers, multiple authorship and self-selection of high-quality papers) expediently adopted by authors to increase their recognition, or a natural consequence of open accessibility leading to higher visibility and wider readership. Concentrating on a collection of OA papers with a citation advantage over their NOA counterparts, the present communication adds to the existing knowledge that although topics differ in their citation potential, the OACA does not result from topic differences between OA and NOA papers. It also re-confirms the effect of publication years, journal prestige and document types on the recognition of papers. However, these factors are not found to be influential in the OACA.

Although this research sampled the OA papers in terms of the degrees of their content similarity to the NOA papers, the topics themselves were not studied. The scope of the topics might be diverse, widespread and divergent. Consequently, the results imply only the existence of the OACA among OA and NOA papers with the same subjects. From the results, one cannot infer how various topics with different degrees of importance, popularity and influence are scattered among the OA and NOA papers and how these characteristics are associated with the OACA. Further studies are required to examine in depth the distribution of topics among the OA and NOA models to answer these questions. Moreover, this research focused on a relatively small sample of 47 hybrid OA journals published by Elsevier, one of the largest commercial publishers of highly prestigious journals, especially APC-funded ones. To generalise the results, the research needs to be replicated on journals and publishers with different prestige levels, sizes and histories. The study had another limitation regarding sample size, especially for the slightly and moderately similar papers, which calls for cautious interpretation as well as replication on a larger sample.

Appendix

Appendix 1

Wilcoxon test results comparing the citation mean ranks of OA and NOA papers with slightly and moderately similar subjects in the even/odd publication factor groups.

Publication factor	Access model			Slightly similar					Similar
				N	Mean rank	Sum of ranks	Z	Sig.	N	Mean rank	Sum of ranks	Z	Sig.
Publication year	Even	NOA–NOA	query < neighbour	11	12.59	138.5	−0.33	0.74	29	25.47	738.5	−0.98	0.33
			query > neighbour	13	12.42	161.5	–	–	21	25.55	536.5	–	–
			query = neighbour	8	–	–	–	–	18	–	–	–	–
			Total	32	–	–	–	–	68	–	–	–	–
	Odd		query < neighbour	25	21.5	537.5	−0.28	0.78	52	65.83	3423	−1.81	0.07
			query > neighbour	22	26.84	590.5	–	–	77	64.44	4962	–	–
			query = neighbour	7	–	–	–	–	9	–	–	–	–
			Total	54	–	–	–	–	138	–	–	–	–
	Even	OA–NOA	OA < NOA	43	48.19	2072	−0.93	0.35	92	95.42	8779	−3.20	0.00
			OA > NOA	53	48.75	2584	–	–	124	118.2	14,657	–	–
			OA = NOA	17	–	–	–	–	40	–	–	–	–
			Total	113	–	–	–	–	256	–	–	–	–
	Odd		OA < NOA	71	111.42	7910.5	−4.66	0	185	242.25	44,817	−5.75	0
			OA > NOA	151	111.54	16,842.5	–	–	319	258.44	82,443	–	–
			OA = NOA	23	–	–	–	–	46	–	–	–	–
			Total	245	–	–	–	–	550	–	–	–	–
Journal	Even	NOA–NOA	query < neighbour	14	15.11	211.5	−0.44	0.66	24	27.29	655	−1.17	0.24
			query > neighbour	16	15.84	253.5	–	–	32	29.41	941	–	–
			query = neighbour	5	–	–	–	–	7	–	–	–	–
			Total	35	–	–	–	–	63	–	–	–	–
	Odd		query < neighbour	22	19.2	422.5	−0.10	0.92	57	62.6	3568	−0.62	0.54
			query > neighbour	19	23.08	438.5	–	–	66	61.48	4058	–	–
			query = neighbour	10	–	–	–	–	20	–	–	–	–
			Total	51	–	–	–	–	143	–	–	–	–
	Even	OA–NOA	OA < NOA	37	47.85	1770.5	−2.19	0.03	94	115.41	10,848.5	−5.28	0
			OA > NOA	60	49.71	2982.5	–	–	169	141.23	23,867.5	–	–
			OA = NOA	9	–	–	–	–	26	–	–	–	–
			Total	106	–	–	–	–	289	–	–	–	–
	Odd		OA < NOA	77	109.45	8427.5	−4.04	0	183	219.96	40,253.5	−4.28	0
			OA > NOA	144	111.83	16,103.5	–	–	274	235.03	64,399.5	–	–
			OA = NOA	31	–	–	–	–	60	–	–	–	–
			Total	252	–	–	–	–	517	–	–	–	–
Document type	Even	NOA–NOA	query < neighbour	34	32.57	1107.5	−0.01	0.99	73	82.1	5993	−1.14	0.25
			query > neighbour	32	34.48	1103.5	–	–	90	81.92	7373	–	–
			query = neighbour	15	–	–	–	–	23	–	–	–	–
			Total	81	–	–	–	–	186	–	–	–	–
	Odd		query < neighbour	2	1.5	3	−1.21	0.23	8	7.62	61	−0.36	0.72
			query > neighbour	3	4	12	–	–	8	9.38	75	–	–
			query = neighbour	0	–	–	–	–	4	–	–	–	–
			Total	5	–	–	–	–	20	–	–	–	–
	Even	OA–NOA	OA < NOA	103	140.68	14,490.5	−4.15	0	250	307.85	76,963	−6.29	0
			OA > NOA	181	143.53	25,979.5	–	–	405	340.44	137,877	–	–
			OA = NOA	38	–	–	–	–	82	–	–	–	–
			Total	322	–	–	–	–	737	–	–	–	–
	Odd		OA < NOA	11	16.86	185.5	−1.91	0.05	27	25.98	701.5	−2.42	0.02
			OA > NOA	23	17.8	409.5	–	–	38	37.99	1443.5	–	–
			OA = NOA	2	–	–	–	–	4	–	–	–	–
			Total	36	–	–	–	–	69	–	–	–	–

OA: open access; NOA: non-open access.
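
For readers who wish to reproduce the layout of the table above, the following is a minimal sketch of how such rows could be computed from paired citation counts. It assumes that the reported test is the Wilcoxon signed-rank test with a normal approximation, that zero differences are set aside as ties, and that Z is based on the smaller rank sum; the function names `signed_rank_summary` and `z_from_rank_sums` and the toy data are illustrative only and are not taken from the study.

```python
import math


def signed_rank_summary(query, neighbour):
    """Summarise paired citation counts in the layout of the appendix table."""
    diffs = [q - n for q, n in zip(query, neighbour)]
    ties = sum(1 for d in diffs if d == 0)        # the "query = neighbour" row
    nonzero = [d for d in diffs if d != 0]        # zero differences are set aside

    # Rank the absolute differences, giving tied magnitudes their average rank.
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    tie_term = 0.0
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        group = end - start + 1
        tie_term += group ** 3 - group
        start = end + 1

    neg = [r for r, d in zip(ranks, nonzero) if d < 0]   # query < neighbour
    pos = [r for r, d in zip(ranks, nonzero) if d > 0]   # query > neighbour
    n = len(nonzero)
    # Variance of the rank sum under the null, corrected for tied magnitudes.
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    if variance > 0:
        z = (min(sum(neg), sum(pos)) - n * (n + 1) / 4) / math.sqrt(variance)
    else:
        z = float("nan")
    return {
        "n_neg": len(neg), "sum_neg": sum(neg),
        "mean_rank_neg": sum(neg) / len(neg) if neg else float("nan"),
        "n_pos": len(pos), "sum_pos": sum(pos),
        "mean_rank_pos": sum(pos) / len(pos) if pos else float("nan"),
        "ties": ties, "z": z,
    }


def z_from_rank_sums(sum_neg, sum_pos, n):
    """Normal-approximation Z from published rank sums (no tie correction)."""
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (min(sum_neg, sum_pos) - n * (n + 1) / 4) / math.sqrt(variance)


# Toy data (hypothetical citation counts for five matched pairs).
summary = signed_rank_summary([1, 5, 3, 8, 2], [2, 3, 3, 4, 6])
print(summary)
```

Under these assumptions, the published rank sums can be cross-checked directly. For the first NOA–NOA row (rank sums 138.5 and 161.5, 24 non-tied pairs), `z_from_rank_sums(138.5, 161.5, 24)` gives approximately −0.33, in line with the tabulated value. Small deviations elsewhere would be expected, because the tie correction cannot be recovered from the rank sums alone.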

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Hajar Sotudeh

References

1. Antelman. Do open-access articles have a greater research impact? College Res Libr 2004; 65(5): 372–382.

2. Harnad, Brody. Comparing the impact of open access (OA) vs. non-OA articles in the same journals. D-Lib Mag 2004; 10(6): 260207.

3. McVeigh ME. Open access journals in the ISI citation databases: analysis of impact factors and citation patterns: a citation study from Thomson Scientific. Philadelphia, PA: Thomson Scientific, 2004, http://science.thomsonreuters.com/m/pdfs/openaccesscitations2.pdf (accessed 14 February 2017).

4. Xia, Lynette Myers, Kay Wilhoite. Multiple open access availability and citation impact. J Inform Sci 2011; 37(1): 19–28.

5. Breugelmans, Roberge, Tippett, et al. Scientific impact increases when researchers publish in open access and international collaboration: a bibliometric analysis on poverty-related disease papers. PLoS ONE 2018; 13(9): e0203156.

6. Davis, Fromerth. Does the arXiv lead to higher citations and reduced publisher downloads for mathematics articles? Scientometrics 2007; 71(2): 203–215.

7. Metcalfe TS. The citation impact of digital preprint archives for solar physics papers. Solar Phys 2006; 239(1–2): 549–553.

8. Piwowar, Vision TJ. Data reuse and the open data citation advantage. PeerJ 2013; 1: e175.

9. Eysenbach. Citation advantage of open access articles. PLoS Biol 2006; 4(5): e157.

10. Eysenbach. The open access advantage. J Med Internet Res 2006; 8(2): e8.

11. Davis PM. Author-choice open access publishing in the biological and medical literature: a citation analysis. J Am Soc Inform Sci Technol 2009; 60(1): 3–8.

12. Riera, Aibar. Does open access publishing increase the impact of scientific articles? An empirical study in the field of intensive care medicine. Med Intens 2013; 37(4): 232–240.

13. Sotudeh, Ghasempour, Yaghtin. The citation advantage of author-pays model: the case of Springer and Elsevier OA journals. Scientometrics 2015; 104(2): 581–608.

14. Wang, Liu, Mao, et al. The open access advantage considering citation, article usage and social media attention. Scientometrics 2015; 103(2): 555–564.

15. Archambault, Amyot, Deschamps, et al. Proportion of open access papers published in peer-reviewed journals at the European and world levels – 1996–2013, http://science-metrix.com/sites/default/files/science-metrix/publications/d_1.8_sm_ec_dg-rtd_proportion_oa_1996-2013_v11p.pdf

16. Nelson, Eggett DL. Citations, mandates, and money: author motivations to publish in chemistry hybrid open access journals. J Assoc Inform Sci Technol 2017; 68(10): 2501–2510.

17. Mikki. Scholarly publications beyond pay-walls: increased citation advantage for open publishing. Scientometrics 2017; 113(3): 1529–1538.

18. Ottaviani. The post-embargo open access citation advantage: it exists (probably), it’s modest (usually), and the rich get richer (of course). PLoS ONE 2016; 11(8): e0159614.

19. Dorta-González, Santana-Jiménez. Prevalence and citation advantage of gold open access in the subject areas of the Scopus database. Res Eval 2017; 27(1): 1–5.

20. Dorta-González, González-Betancor, Dorta-González MI. Reconsidering the gold open access citation advantage postulate in a multidisciplinary context: an analysis of the subject categories in the Web of Science database 2009–2014. Scientometrics 2017; 112(2): 877–901.

21. McCabe, Snyder CM. Identifying the effect of open access on citations using a panel of science journals. Econ Inq 2014; 52(4): 1284–1300.

22. Piwowar, Priem, Larivière, et al. The state of OA: a large-scale analysis of the prevalence and impact of open access articles. PeerJ 2018; 6: e4375.

23. Wren JD. Open access and openly accessible: a study of scientific publications shared via the internet. BMJ 2005; 330(7500): 1128.

24. Kurtz, Eichhorn, Accomazzi, et al. The effect of use and access on citations. Inform Process Manage 2005; 41(6): 1395–1402.

25. Kurtz, Henneken. Open access does not increase citations for research articles from The Astrophysical Journal, https://arxiv.org/ftp/arxiv/papers/0709/0709.0896.pdf

26. Moed HF. The effect of ‘open access’ on citation impact: an analysis of ArXiv’s condensed matter section. J Am Soc Inform Sci Technol 2007; 58(13): 2047–2054.

27. Hajjem, Harnad. The open access citation advantage: quality advantage or quality bias?, https://arxiv.org/ftp/cs/papers/0701/0701137.pdf (accessed 25 January 2018).

28. Gargouri, Hajjem, Larivière, et al. Self-selected or mandated, open access increases citation impact for higher quality research. PLoS ONE 2010; 5(10): e13636.

29. Koler-Povh, Južnič, Turk. Impact of open access on citation of scholarly publications in the field of civil engineering. Scientometrics 2014; 98(2): 1033–1045.

30. Donovan, Watson, Osborne. The open access advantage for American law reviews. J Pat Tradem Off Soc 2015; 97: 4.

31. Xia, Nakanishi. Self-selection and the citation advantage of open access articles. Online Inform Rev 2012; 36(1): 40–51.

32. Hajjem, Harnad. Citation advantage for OA self-archiving is independent of journal impact factor, article age, and number of co-authors, https://arxiv.org/abs/cs/0701136 (accessed 14 February 2017).

33. Sotudeh, Estakhr. Sustainability of open access citation advantage: the case of Elsevier’s author-pays hybrid open access journals. Scientometrics 2018; 115(1): 563–576.

34. Arendt, Peacemaker, Miller. Same question, different world: replicating an open access research impact study. Coll Res Libr 2019; 80(3): 303.

35. Martín-Martín, Costas, van Leeuwen, et al. Evidence of open access of scientific publications in Google Scholar: a large-scale analysis. J Inform 2018; 12(3): 819–841.

36. Johnson, Watkinson, Mabe. The STM report: an overview of scientific and scholarly publishing. The Hague: International Association of Scientific, Technical and Medical Publishers (STM), 2018, https://www.stm-assoc.org/2018_10_04_STM_Report_2018.pdf (accessed 11 April 2019).

37. Björk BC. The open access movement at a crossroads–are the big publishers and academic social media taking over? Learned Publishing, 2016, https://helda.helsinki.fi/dhanken/bitstream/handle/123456789/167487/Bjo_rk2016preprint.docx?sequence=1 (accessed 10 April 2019).

38. Siler, Haustein, Smith, et al. Authorial and institutional stratification in open access publishing: the case of global health research. PeerJ 2018; 6: e4269.

39. Dunn, Abelson, Bourg, et al. Open access at MIT and beyond: a white paper of the MIT ad hoc task force on open access to MIT’s research, https://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1093&context=scholcom (accessed 10 April 2019).

40. Gargouri, Lariviere, Gingras, et al. Testing the finch hypothesis on green OA mandate ineffectiveness, https://arxiv.org/abs/1210.8174 (accessed 10 April 2019).

41. Agarwal, Durairajanayagam, Tatagari, et al. Bibliometrics: tracking research impact by selecting the appropriate metrics. Asian J Androl 2016; 18(2): 296–309.

42. Pyke GH. Achieving research excellence and citation success: what’s the point and how do you do it? Bioscience 2014; 64(2): 90–91.

43. Pyke GH. Struggling scientists: please cite our papers! Curr Sci 2013, https://opus.lib.uts.edu.au/bitstream/10453/32179/1/2013000927OK.pdf (accessed 14 February 2017).

44. Falagas, Alexiou VG. The top-ten in journal impact factor manipulation. Arch Immunol Ther Exp 2008; 56(4): 223.

45. Wei, et al. Do scientists trace hot topics? Sci Rep 2013; 3: 2207.

46. Neusar. From burgers to tenure: preserving quality amid the choices and dilemmas facing authors of scientific articles. Hum Affairs 2015; 25(3): 327–341.

47. Peng, Zhu JJ. Where you publish matters most: a multilevel analysis of factors affecting citations of internet studies. J Am Soc Inform Sci Technol 2012; 63(9): 1789–1803.

48. Fanelli. Positive results receive more citations, but only in some disciplines. Scientometrics 2013; 94(2): 701–709.

49. Marcella, Lockerbie, Bloice, et al. The effects of the research excellence framework research impact agenda on early- and mid-career researchers in library and information science. J Inform Sci 2017; 44: 608–618.

50. Rousseau. Being metric-wise: heterogeneity in bibliometric knowledge. El Profes Inform 2017; 26(3): 480–487.

51. Bordignon, Andro. Impact de l’open access sur les citations: une étude de cas. I2D Inform Données Doc 2016; 53(3): 70–79.

52. Lin SK. Non-open access and its adverse impact on molecules. Molecules 2007; 12(7): 1436–1437.

53. Zawacki-Richter, Anderson, Tuncay. The growing impact of open access distance education journals: a bibliometric analysis. Int J Dist Educ 2010; 24(3): 1–15.

54. Atchison, Bull. Will open access get me cited? An analysis of the efficacy of open access publishing in political science. Polit Sci Politics 2015; 48(1): 129–137.

55. Gentil-Beccot, Mele, Brooks. Citing and reading behaviours in high-energy physics. Scientometrics 2009; 84(2): 345–355.

56. Henneken, Kurtz, Eichhorn, et al. Effect of e-printing on citation rates in astronomy and physics, https://arxiv.org/abs/cs/0604061 (accessed 25 January 2018).

57. Kousha, Abdoli. The citation impact of open access agricultural research: a comparison between OA and non-OA publications. Online Inform Rev 2010; 34(5): 772–785.

58. Niyazov, Vogel, Price, et al. Open access meets discoverability: citations to articles posted to Academia.edu. PLoS ONE 2016; 11(2): e0148257.

59. Snijder. Revisiting an open access monograph experiment: measuring citations and tweets 5 years later. Scientometrics 2016; 109(3): 1855–1875.

60. Sotudeh, Arabzadeh, Mirzabeigi. How do self-archiving and Author-pays models associate and contribute to OA citation advantage within hybrid journals. The Journal of Academic Librarianship 2019; 45(4): 377–385.

61. Torres-Salinas, Robinson-García, Aguillo IF. Bibliometric and benchmark analysis of gold open access in Spain: big output and little impact. El Profes Inform 2016; 25(1): 17–24.

62. González-Argote, García-Rivero. Comment on ‘Bibliometric analysis of the Journal of Oral Research. Period 2012-2015’. J Oral Res 2016; 5(6): 224–225.

63. Pearce JM. How to perform a literature review with free and open source software. Pract Assess Res Eval 2018; 23(8): 1–13.

64. Otto, Mullen LB. The Rutgers open access policy goes into effect: faculty reaction and implementation lessons learned. Libr Manage 2019; 40(1–2): 59–73.

65. Lawlor. An overview of the NFAIS 2018 annual conference: information transformation: open, global, collaborative. Inform Serv Use 2018; 38: 1–31.

66. Hansen. Understanding and making use of academic authors’ open access rights. J Libr Scholar Commun 2012; 1(2): 1–4.

67. Swan. Open access self-archiving: an introduction. UK FE and HE Funding Councils, 2005, http://cogprints.org/4406/1/jiscsum.pdf (accessed 10 April 2019).

68. Creaser, Fry, Greenwood, et al. Authors’ awareness and attitudes toward open access repositories. New Rev Acad Libr 2010; 16(S1): 145–161.

69. Hagiwara, Ishita, Mizutani, et al. Identifying key elements of search results for document selection in the digital age: an observational study. In: Proceedings of the international conference on Asian digital libraries, Bangkok, Thailand, 13–15 November 2017, pp. 237–242. Cham: Springer.

70. Joo, Cahill. Exploring research topics in the field of school librarianship based on text mining. School Libr Worldw 2018; 24(1): 1–27.

71. Cretchley, Rooney, Gallois. Mapping a 40-year history with Leximancer: themes and concepts in the Journal of Cross-Cultural Psychology. J Cross-Cult Psychol 2010; 41(3): 318–328.

72. Priem, Piwowar, Hemminger. Altmetrics in the wild: an exploratory study of impact metrics based on social media. In: Proceedings of the Metrics 2011: Symposium on informetric and scientometric research, New Orleans, LA, 12 October 2011, http://jasonpriem.com/self-archived/PLoS-altmetrics-sigmetrics11-abstract.pdf (accessed 25 January 2018).

73. Waltman, van Eck NJ. A systematic empirical comparison of different approaches for normalizing citation impact indicators. J Inform 2013; 7(4): 833–849.

74. Ferreira, Reis, Paula, et al. Structural and longitudinal analysis of the knowledge base on spin-off research. Scientometrics 2017; 112(1): 289–313.

75. Belter CW. Bibliometric indicators: opportunities and limits. J Med Libr Assoc 2015; 103(4): 219–221.

76. Brouthers, Mudambi, Reeb DM. The blockbuster hypothesis: influencing the boundaries of knowledge. Scientometrics 2011; 90(3): 959–982.

77. Seglen PO. Why the impact factor of journals should not be used for evaluating research. BMJ 1997; 314(7079): 498–502.

78. Oppenheim. Electronic scholarly publishing and open access. J Inform Sci 2008; 34(4): 577–590.

79. Migheli, Ramello GB. Open access journals and academics’ behavior. Econ Inq 2014; 52(4): 1250–1266.

80. Lawrence. Online or invisible. Nature 2001; 411(6837): 521.

81. Kurtz, Eichhorn, Accomazzi, et al. The bibliometric properties of article readership information. J Am Soc Inform Sci Technol 2005; 56(2): 111–128.

82. Saxena, Thawani, Chakrabarty, et al. Scientific evaluation of the scholarly publications. J Pharmacol Pharmacother 2013; 4(2): 125–129.

83. Shanta, Pradhan, Sharma SD. Impact factor of a scientific journal: is it a measure of quality of research? J Med Phys 2013; 38(4): 155–157.

84. Onodera, Yoshikane. Factors affecting citation rates of research articles. J Assoc Inform Sci Technol 2015; 66(4): 739–764.

85. Reich, Green, Brock, et al. Biases in research evaluation: inflated assessment, oversight, or error-type weighting? J Exp Soc Psychol 2007; 43(4): 633–640.

86. Nakatani, Jatowt, Ohshima, et al. Quality evaluation of search results by typicality and speciality of terms extracted from Wikipedia. In: Proceedings of the international conference on database systems for advanced applications, Brisbane, QLD, Australia, 20 April 2009, pp. 570–584. Berlin: Springer.