Measuring the quality of scientific references in Wikipedia: an analysis of more than 115M citations to over 800 000 scientific articles
Abstract
Wikipedia is a widely used online reference work which cites hundreds of thousands of scientific articles across its entries. The quality of these citations has not previously been measured, and such measurements have a bearing on the reliability and quality of the scientific portions of this reference work. Using a novel technique, a massive database of qualitatively described citations, and machine learning algorithms, we analyzed 1 923 575 Wikipedia articles that cited a total of 824 298 scientific articles in our database and found that most scientific articles cited by Wikipedia articles are uncited or untested by subsequent studies, while the remainder show wide variability in contradicting or supporting evidence. Additionally, we analyzed 51 804 643 scientific articles from journals indexed in the Web of Science and found that, similarly, most were uncited or untested by subsequent studies, while the remainder show wide variability in contradicting or supporting evidence.
Abbreviations
- DOI: Digital Object Identifier
- MMR: measles, mumps, and rubella
- PMCID: PubMed Central Identification
- PMID: PubMed Identification
Introduction
Wikipedia, the free online encyclopedia, is an integral part of the web and society. Receiving over 18 billion visits per month [[1]] and currently ranking as the 14th most visited website in the world across all languages as of August 2020 [[2]], it has become the go-to source of information for nearly all aspects of life. It comprises over 6M articles and 49M pages, which have received 934M edits from 38M users [[1]]. Because Wikipedia is so important for maintaining a well-informed society, we sought to determine how primary research articles informing Wikipedia articles have been cited within the scientific community.
As of 2018, a total of 824 298 scientific articles with persistent identifiers (a DOI, PMID, or PMCID) were referenced by 1 923 575 English-language Wikipedia articles [[3]], meaning that an estimated 31% (1 923 575/6 125 606) of all Wikipedia articles reference a scientific article [[1, 3]]. The accuracy of these articles is paramount, especially considering that Wikipedia is often the first and only source of information for some readers. The task of editing articles is delegated to its large community of volunteer editors and users; claims are heavily debated, and calls for primary sources of evidence are flagged with the now popular phrase: ‘Citation Needed’.
A limited literature examines how citations are used in Wikipedia articles. For example, among Wikipedia articles on historical topics, there is a relatively small (compared to a leading scholarly journal) ratio of citations to citation statements, indicating that Wikipedia articles rely heavily on a limited number of references [[4]]. Other research shows that the types of papers cited by Wikipedia articles are similar to those cited by scholarly publications (albeit with a slight preference for high-impact articles) [[5]] and that Wikipedia articles appear to be biased in favor of citing articles that are available through open access [[6]].
However, this literature fails to address an important question: just how reliable are the sources cited by Wikipedia articles, particularly with respect to scientific topics? To answer this question, we performed a citation analysis of scientific articles referenced in Wikipedia using ‘Smart Citation’ data from scite. Smart citations provide the context for each citation and a classification describing whether it provides supporting or contradicting evidence for the cited claim. Classifications are performed by a deep learning model that has been trained on 43 665 expert-labeled citation statements, with precision scores of 0.800, 0.8519, and 0.9615 for supporting, contradicting, and mentioning classifications, respectively (internal scite benchmarking data). To date, scite has analyzed over 16M full-text scientific articles, extracting over 500M citation statements that cite over 34M articles. These full texts were obtained through a variety of means, including retrieval of open access papers and preprints, indexing of PubMed Central, and partnerships with publishers such as Wiley, the British Medical Journal, and Rockefeller University Press.
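As a minimal illustration of the benchmark metric quoted above, per-class precision can be computed from an expert-labeled test set as in the following sketch; the labels and predictions here are toy placeholders, not scite's internal benchmarking data.

```python
# Sketch: per-class precision for a three-way citation classifier.
# The data below are illustrative placeholders only.
from sklearn.metrics import precision_score

LABELS = ["supporting", "contradicting", "mentioning"]

# y_true: expert annotations; y_pred: model outputs (toy examples)
y_true = ["supporting", "mentioning", "contradicting", "mentioning", "supporting"]
y_pred = ["supporting", "mentioning", "contradicting", "mentioning", "mentioning"]

# average=None returns one precision value per class, in the order of LABELS
per_class = precision_score(y_true, y_pred, labels=LABELS, average=None)
for label, p in zip(LABELS, per_class):
    print(f"{label}: precision = {p:.4f}")
```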
Using this information, we analyzed the 824 298 scientific articles referenced in the English Wikipedia to see how they had been cited in the scientific literature. These scientific articles have received 115 046 571 total Smart Citations according to scite. Of those Smart Citations, 3 435 635 (2.99%) indicate that they provide supporting evidence, 401 472 (0.35%) indicate that they provide contradicting evidence, and 111 209 464 (96.7%) mention the cited study without indicating that they provide supporting or contradicting evidence. Wikipedia articles referencing scientific articles cited an average of 2.44 scientific articles (median = 2, IQR = 1, SD = 24.09, range = 1–14 106). This figure differs from a recent estimate, likely due to variations in data collection (Arroyo-Machado et al. [[7]] utilized Altmetric data in their analyses, while we used data retrieved directly from Wikipedia). Among scientific articles referenced by Wikipedia articles, the average number of citations was 130.52 (SD = 476.93), the mean number of supporting citations was 3.96 (SD = 9.98), the mean number of contradicting citations was 0.45 (SD = 1.34), and the mean number of mentioning citations was 126.10 (SD = 470.90) (Table 1). The most cited scientific article referenced in Wikipedia in the scite database describes Laemmli buffer, which is widely used in protein analysis and has over 66k citation statements [[8]]. Out of the 824 298 papers cited by Wikipedia articles, most remain untested by subsequent citing articles: 324 247 (39.34%) are referenced by mentioning citations with no supporting or contradicting citations, 148 243 (17.98%) have no citations at all, 235 036 (28.51%) have been supported with no contradicting evidence, 102 719 (12.46%) have been disputed with both supporting and contradicting evidence, and 14 052 (1.70%) have been contradicted with no supporting citations (Fig. 1).
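The five evidence categories above follow directly from each article's tallies of supporting, contradicting, and mentioning citation statements. The following sketch makes that binning logic explicit; the tallies and field layout are hypothetical, not scite's internal schema.

```python
# Sketch of the per-article evidence binning described above, assuming each
# article is summarized by counts of supporting, contradicting, and
# mentioning citation statements (hypothetical data, not scite's schema).
from collections import Counter

def classify_article(supporting: int, contradicting: int, mentioning: int) -> str:
    """Assign an article to one of the five evidence categories."""
    if supporting == 0 and contradicting == 0 and mentioning == 0:
        return "uncited"
    if supporting == 0 and contradicting == 0:
        return "mentioned only"        # untested by citing studies
    if supporting > 0 and contradicting == 0:
        return "supported"
    if supporting > 0 and contradicting > 0:
        return "disputed"
    return "contradicted"              # contradicting > 0, supporting == 0

# Toy usage: tally categories over (supporting, contradicting, mentioning)
# tuples standing in for the per-article citation counts
articles = [(0, 0, 12), (0, 0, 0), (3, 0, 40), (2, 1, 9), (0, 2, 5)]
counts = Counter(classify_article(*a) for a in articles)
total = sum(counts.values())
for category, n in counts.items():
    print(f"{category}: {n} ({100 * n / total:.2f}%)")
```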

One hundred and eighteen scientific papers referenced by Wikipedia articles have been retracted, appearing in 297 Wikipedia articles. To determine whether these papers could be easily identified as retracted on Wikipedia, we analyzed 50 random Wikipedia articles referencing a retracted scientific article and found that 25 (50%) did not acknowledge the retraction, 15 (30%) appropriately acknowledged the retraction, and 10 (20%) were no longer linked (Table S1). When the retraction was acknowledged, it was generally in the text as well as in the reference section; for example, the Wakefield et al. [[9]] paper presenting evidence of a causal link between vaccines and autism was cited in the Lancet MMR autism fraud Wikipedia article as, ‘The paper, authored by Andrew Wakefield and eleven coauthors, claimed to link the MMR vaccine to colitis and autism spectrum disorders. [[10]] Events surrounding the research study and the publication of its findings led to Wakefield being struck off the medical register. The paper was retracted in 2010’.
Because Wikipedia provides such a high level of visibility, we looked at 50 random scientific articles with zero citations referenced in Wikipedia to better understand why these articles were not cited, or whether this was due to the limited coverage of scite. We compared citation counts between scite and Dimensions, a traditional citation index that is openly available. In the Wikipedia subset, we found that 8 (16%) scientific articles had zero citations in both indices, 21 (42%) had citations in Dimensions but not in scite, 12 (24%) DOIs did not resolve, and 9 (18%) were indexed in scite but not Dimensions (Table S2). Interestingly, the DOIs not indexed in Dimensions were almost uniformly descriptions of endangered animals, like the Sunda pangolin [[11]]. The discrepancy between citation counts in scite and Dimensions can be explained by the difference in approaches between a traditional citation index and a Smart Citation index: whereas scite requires access to the full text of scientific articles to extract citation statement excerpts, traditional citation indices do not. Moreover, as acknowledged on the scite website, the scite database is still growing (‘We're adding millions of citations to our database each day’). The number of unresolvable DOIs likely reflects errors introduced when scraping Wikipedia [[3]] and not editorial mistakes, as no unresolvable DOIs were identified on Wikipedia articles when checked independently.
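One illustrative way to test whether a DOI resolves is to query the public doi.org resolver directly, as in the sketch below; this stands in for the independent checks described above rather than reproducing the exact procedure we used.

```python
# Illustrative DOI resolution check against the public doi.org resolver.
# Some publisher sites block HEAD requests, so results should be read
# loosely; this is a sketch, not the pipeline actually used.
import requests

def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return True if doi.org redirects the DOI to a live landing page."""
    resp = requests.head(f"https://doi.org/{doi}",
                         allow_redirects=True, timeout=timeout)
    # doi.org returns 404 for unknown DOIs; known DOIs redirect onward
    return resp.status_code == 200

print(doi_resolves("10.1000/182"))  # the DOI Handbook, a known-good test DOI
```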
To see how scientific articles referenced in Wikipedia compare to the scientific literature as a whole, we looked at the citation breakdown of 51 804 643 articles (429 780 086 total Smart Citations) from journals indexed in the Web of Science. Of these citations, 18 940 149 (4.41%) indicate that they provide supporting evidence, 2 710 605 (0.63%) indicate that they provide contradicting evidence, and 408 129 332 (94.96%) mention the cited study without indicating that they provide supporting or contradicting evidence. Similar to Wikipedia, most articles have either not been tested by subsequent citing articles [mentioning citations only: 17 441 574 (33.67%)] or have no citations at all [26 396 010 (50.96%)], while 6 038 194 (11.66%) have been supported with no contradictions, 1 407 829 (2.72%) have been disputed with both supporting and contradicting evidence, and 521 024 (1.01%) have been contradicted with no supporting citations. The average number of citations articles from these journals received was 16.91 (SD = 58.20), the mean number of supporting citations was 0.75 (SD = 2.22), the mean number of contradicting citations was 0.11 (SD = 0.46), and the mean number of mentioning citations was 16.06 (SD = 56.71) (Table 1). Again, we looked at 50 random scientific articles indexed in the Web of Science with zero citations and compared numbers from scite to Dimensions. We found that 26 (52%) had zero citations in both indices, 15 (30%) had citations in Dimensions but not in scite, 1 (2%) did not resolve (due to a defunct DOI or transcription error), and 8 (16%) were indexed in scite but not Dimensions.
Our results should be considered with caution given the limits of the model's precision, the currently limited coverage of articles analyzed by scite, and the fact that articles that could not be linked to a DOI in the data set were excluded.
Beyond technical limitations, it is also important to consider what the citation classifications mean. For example, a contradicting citation statement does not necessarily mean the cited paper is wrong because: (a) scite classifies citation statements at the level of the claim, not the full paper, and (b) the citing article making the contradicting claim itself could be without merit. Nonetheless, these numbers are a good approximation of how the scientific foundations of Wikipedia have been tested in the scientific literature and represent the first time an analysis of the quality of citations, not just the quantity, has been done at this scale. Previous citation analyses at the individual article level have shown that reporting the citation context can be informative for readers [[12, 13]] with one citation analysis [[13]] causing the publisher to add the following warning to the original report [[14]], ‘Editor’s Note (added May 31, 2017): For reasons of public health, readers should be aware that this letter has been “heavily and uncritically cited” as evidence that addiction is rare with opioid therapy’.
To look at how citation context could impact Wikipedia users if it were linked next to scientific references, we examined two articles directly. The Wikipedia article on ‘Amygdala’ states, ‘In 2006, researchers observed hyperactivity in the amygdala when patients were shown threatening faces or confronted with frightening situations. Patients with severe social phobia showed a correlation with increased response in the amygdala’, citing Phan et al. [[15]] as evidence for this statement. According to scite [[16]], this reference has received 259 mentioning citation statements, 23 supporting citation statements, and three contradicting citation statements (Fig. 2). Thus, while some have provided supporting evidence, two studies have called this into question, with one report stating [[17]], ‘These findings do not replicate previous studies…’ The citation context offers a more complete picture, potentially affecting decisions by everyday readers and choices of editors. Consider the Wikipedia article ‘Suicide and Internet’, which features the following statement: ‘A survey has found that suicide-risk individuals who went online for suicide-related purposes, compared with online users who did not, reported greater suicide-risk symptoms, were less likely to seek help and perceived less social support’, highlighting a report by Harris et al. [[18]]. As identified by scite [[19]], this report was later contradicted by a subsequent study finding that individuals who used the Internet for suicide-related purposes were more likely to seek help [[20]] (Fig. 3). Another study [[21]] does provide supporting evidence, noting, ‘Our findings supported previous research showing suicide risk was related to greater likelihood of online interpersonal communications’; however, the supporting paper is from the same group as the original paper, while the contradicting citation comes from an independent group. Thus, providing contextual citation information for this Wikipedia claim could influence behavioral choices that have potentially life or death consequences for a large population of people.


In conclusion, Wikipedia references scientific articles that are more than twice as likely to be supported as articles in the scientific literature in general (28.5% of articles referenced in Wikipedia have a supporting citation vs. 11.7% of articles in the Web of Science), although most are untested and a minority are contradicted. When Wikipedia articles cite scientific papers that have subsequently been retracted, 50% of the retracted papers could not be identified as retracted within the Wikipedia article itself. When the retraction was explicitly mentioned in the Wikipedia article, it was typically in service of a larger conversation about the retracted paper itself. Our analysis highlights the fact that references alone fail to capture the full story of a scientific claim; citation counts help indicate impact but reveal little about how a claim has been received by subsequent research. We argue that exposing the extent to which a scientific paper has been supported or disputed in subsequent publications to the editors and moderators of Wikipedia is critical to ensuring the encyclopedia is a reliable source of information. Simply put, the adage ‘Citation Needed’ is not enough. References in Wikipedia, as well as in scientific articles themselves, should display citation contexts.
Table 1. Smart Citation statistics for scientific articles referenced in Wikipedia and for articles in journals indexed in the Web of Science.

| Classification (N) | Wikipedia, mean (SD) | Wikipedia, median (IQR) | Web of Science, mean (SD) | Web of Science, median (IQR) |
|---|---|---|---|---|
| Total citations (65 738) | 130.52 (476.93) | 37 (98) | 16.91 (58.20) | 6 (14) |
| Mentioning citations (65 641) | 126.10 (470.90) | 35 (94) | 16.06 (56.72) | 6 (13) |
| Supporting citations (563) | 3.96 (9.98) | 1 (4) | 0.75 (2.22) | 0 (1) |
| Contradicting citations (76) | 0.45 (1.34) | 0 (0) | 0.11 (0.46) | 0 (0) |
Materials and methods
Identification of research articles in Wikipedia
We used data previously scraped from the English version of Wikipedia [[3]] containing a list of citations with their identifiers, drawn from Wikipedia content dumps published on March 1, 2018. Each record included an identifier and its type: ‘pmid’ for PubMed ID, ‘pmcid’ for PubMed Central ID, or ‘doi’ for Digital Object Identifier. First, we mapped all identifiers to DOIs using mapping data from the PMC metadata database (https://www.ncbi.nlm.nih.gov/pmc/pmctopmid/), which provides links between PMIDs, PMCIDs, and DOIs. Mapped DOIs were combined with DOIs where the identifier type was designated ‘doi’ and were considered valid if the DOI existed in a dataset of all known DOIs provided by Crossref. Within the scraped data, 96% of entries were successfully linked to a DOI, and among those DOIs, 98% were valid (the remainder being unresolvable to a known DOI, presumably due to transcription errors). Given a valid DOI, it was possible to query against our internal citation data to determine how frequently it was cited, supported, mentioned, or contradicted.
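A condensed sketch of this mapping step follows, assuming the PMC mapping file and the Crossref DOI list have already been loaded; the lookup entries below are hypothetical placeholders, not real identifiers.

```python
# Condensed sketch of the identifier-mapping step described above, with
# hypothetical placeholder entries standing in for the PMC mapping file
# and the Crossref dataset of all known DOIs.
pmid_to_doi = {"00000001": "10.1234/example.one"}    # from PMC mapping data
pmcid_to_doi = {"PMC0000001": "10.1234/example.two"}
known_dois = {"10.1234/example.one", "10.1234/example.two"}  # from Crossref

def to_valid_doi(identifier: str, id_type: str) -> str | None:
    """Map a Wikipedia citation identifier to a Crossref-validated DOI."""
    if id_type == "doi":
        doi = identifier
    elif id_type == "pmid":
        doi = pmid_to_doi.get(identifier)
    elif id_type == "pmcid":
        doi = pmcid_to_doi.get(identifier)
    else:
        return None
    # DOIs are case-insensitive; keep only those known to Crossref
    return doi.lower() if doi and doi.lower() in known_dois else None

print(to_valid_doi("00000001", "pmid"))  # -> 10.1234/example.one
```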
Citation analysis
Citation analyses were performed by querying internal scite citation data. While the scite classification model is proprietary, it is based on an open-source deep learning classifier (https://github.com/kermitt2/delft) built with Keras and TensorFlow, using a SciBERT model and a meta-classifier. The corpus on which the model was trained comprised 43 665 citation statements classified by trained annotators with experience in a variety of scientific fields.
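Although the production model is proprietary, the general shape of a SciBERT-based citation-statement classifier can be sketched with the openly available SciBERT weights; the code below is a generic stand-in using the Hugging Face transformers library, not scite's delft-based model, and its classification head is untrained.

```python
# Generic sketch of a SciBERT-based three-way citation-statement classifier.
# This illustrates the overall approach only; scite's actual model (delft/
# Keras-based, with a meta-classifier) is proprietary and differs in detail.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "allenai/scibert_scivocab_uncased"  # pretrained SciBERT weights
LABELS = ["supporting", "contradicting", "mentioning"]

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))  # fresh head; fine-tune on labeled data

statement = ("These findings do not replicate previous studies reporting "
             "amygdala hyperactivity in social phobia.")
inputs = tokenizer(statement, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# Without fine-tuning on the 43 665 labeled statements, this prediction is
# meaningless; it only demonstrates the inference path.
print(LABELS[int(logits.argmax(dim=-1))])
```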
Descriptive analyses and graph generation were performed in R. All queries and code can be found at https://github.com/scitedotai/research-wikipedia.
Acknowledgements
The present work has been supported by the U.S. Department of Health and Human Services, National Institutes of Health, National Institute on Drug Abuse grant 1R44DA050155-01.
Conflict of interest
The authors are shareholders and/or consultants or employees of Scite Inc.
Author contributions
JMN was involved in conception and design, acquisition of data, analysis and interpretation of data, and drafting or revising the article. AU was involved in acquisition of data, analysis and interpretation of data, and drafting or revising the article. MS was involved in analysis and interpretation of data. PG and MM were involved in analysis and interpretation of data, and drafting or revising the article. SCR was involved in acquisition of data, analysis and interpretation of data, and drafting or revising the article.