In science, some things are a marathon, but others are a sprint, and sharing urgent findings is sped up by the preprint! In parallel with rising numbers of identified coronavirus cases around the world, scientists have noticed rising numbers of “preprints” – papers that are posted online without first going through the “peer-review process,” whereby an article gets checked over by several respected scientists before getting the okay to publish in a journal. This preprint route is much faster, which has been great for sharing up-to-date research findings on the coronavirus as they emerge, but sometimes the speed comes at the cost of shoddy science getting shared. So, while preprints can be awesome, it’s super important to be extra critical, and the bumbling biochemist is here to help!

It was easy to tell last week when the IUBMB’s Instagram account got hacked because it started posting random spam. (They now have a new account – @the_iubmb.) But it can be hard to tell when science is “bad” – usually this “bad” science is *not* intentional malice like that hacker, but instead involves things like poor methodology and/or taking a bit of weak data and drawing big conclusions from it. Traditionally, when a scientist (or group of scientists) wants to tell people about their work, they write up a paper (frequently referred to as a manuscript) and send it to a journal. Then it gets checked by “independent experts” – scientists who are in the same research field but weren’t involved in the work in any way, so they are (hopefully) impartial. This process can catch some of these problems, and the journal can then either reject the paper or get the authors to fix them.

That peer-reviewing doesn’t happen with preprints, so “bad science” can get through. Which is not good. Especially when news outlets catch wind of some potentially amazing finding and report it as if it’s rock solid. Or take a speculation the authors make as if it’s a conclusion. 

This is far from hypothetical – a powerful example was a preprint that noticed that a genetic variant of the virus that emerged in Europe became more common than another variant that emerged in China. Since that genetic variant changed a protein letter in one of the virus’ most important proteins – the Spike protein the virus uses to get into cells – the authors proposed that this mutation might make this variant of the virus more infectious, allowing it to gain prominence. But there were a number of problems with this… viruses mutate all the time, and most mutations don’t really change how the virus behaves, so “mutation” in and of itself isn’t necessarily scary. And there was no actual evidence that this variant of the virus was any worse. A much simpler explanation for the prevalence of this mutation is something called the “founder effect” – the genetic epidemiological equivalent of “I call shotgun!” If one “version” of a virus gets somewhere first and starts to spread widely before another “version,” that first version is likely going to have a higher prevalence.
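To see how far a head start alone can go, here is a tiny Python sketch. It is a toy illustration, not a real epidemiological model: the function name, the 20% daily growth rate, and the 10-day head start are all made-up assumptions. Both “versions” spread at exactly the same rate, and the only difference is that one arrived earlier.

```python
def first_variant_share(head_start_days, days, daily_growth=1.2):
    """Share of cases caused by the variant that arrived first.

    Toy assumption: both variants grow at the same daily rate, so neither
    one is "more infectious". Variant A simply got there earlier.
    """
    cases_a = daily_growth ** (days + head_start_days)  # arrived first
    cases_b = daily_growth ** days                      # arrived later
    return cases_a / (cases_a + cases_b)


share = first_variant_share(head_start_days=10, days=30)
print(f"Variant A share after 30 days: {share:.0%}")  # prints 86%
```

Even though the two variants are equally infectious in this sketch, the early arrival ends up causing most of the cases, which is why prevalence by itself can’t tell you that a mutation made the virus “worse.”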

The paper was only up for a few days before the authors withdrew it after fellow scientists offered critical peer review on Twitter, etc. But it was too late to prevent overblown headlines about “a deadly second strain” and politicians spreading that narrative even further. And it’s completely understandable that they’d do so.

As a scientist, I love preprints because I have the privilege of the science education needed to critically read the methodology & results and interpret them with many grains of salt. But most people don’t (and that’s not their fault!) I can’t explain all of the intricate details of every paper for you – there’s no way I can even read them all myself! But I can at least explain to you what preprints are & point to some sources like Twitter (honestly!) which hopefully will help. 

Note: I want to start by telling you more about preprints in terms of what *everyone* should know and care about – and then I will get into some more details about some of the technical aspects which only geeks like me will care about, but of course all are welcome to learn :). 

You know how some people periodically Google themselves (no judgement here!)? Well, I periodically Twitter-search for my research institution, Cold Spring Harbor Laboratory (CSHL). It helps me stay up-to-date on what my colleagues are up to (that wasn’t supposed to sound creepy!) and what awesome research they’re publishing. It’s been especially great now that I don’t see them in person as much. But sometimes when I search for CSHL I see them get “credited” for some pretty obscure stuff that they didn’t really have anything to do with – the only connection is that CSHL runs the preprint server those papers were posted on: bioRxiv (pronounced “bio archive”) or its “sister server” medRxiv (which CSHL runs along with BMJ & Yale). bioRxiv handles basic science research (what are the genes, proteins, etc. involved in some process) and pre-clinical stuff, whereas medRxiv hosts work that includes actual patients – this includes clinical trial results, case studies, epidemiological findings, etc. They’re both free – for the authors and for anyone who wants to access their papers.

It’s probably a bad time to compare something you like to Facebook… but I think that thinking of preprint servers as “better,” nerdier Facebooks can help you understand their role. Unlike Facebook, these servers are non-profit. However, they’re in a kind of Facebook-like position: they serve as the platform for broad dissemination of mostly helpful but occasionally potentially-harmful information, and they need to try to avoid harm while at the same time trying not to limit “free speech” or, in this case, judge how good someone’s work is. So, like Facebook, they have some rules and guidelines (though obviously different ones): you can’t publish just *anything* on preprint servers – there are screeners to check for spam and papers promoting harmful medical practices.

This last part has gotten especially challenging during the pandemic, as covid-related bad science being broadly disseminated can be catastrophic. I mean – bad science about some obscure tree frog is one thing (still not good but probably less harmful). But bad science about medicine – especially at a time when people are desperate for *anything* – can be deadly at worst and time- & resource-wasting at best. For instance, some of the papers that spurred on the hydroxychloroquine craze were preprints – including some that were later withdrawn.

But don’t let that scare you off from preprints! At the same time as some bad work got out, a lot of the most important science on the virus was shared through preprints. About 1/4 of the coronavirus-related articles published in early May were published as preprints, and more and more are posted every day. These include everything from the structure of the viral proteins, to protocols for rapid testing, to information on treatments that really do show promise. For example, researchers in the UK recently announced that a steroid called dexamethasone cut the death risk of severely ill patients (those who required oxygen, including through a ventilator). And they published the data to back up their claims on a preprint server. You can read it for yourself here: 

These articles might not go through the traditional peer review *before* they get published, but they do end up getting peer-reviewed *after* posting, through less traditional outlets, like Twitter (even if sometimes this is too late to avoid media hype). And I am *completely* serious about Twitter. Twitter-ing scientists have actually been really helpful – for example, their threads raising problems with a paper claiming the coronavirus was lab-made got the paper taken down 2 days later.  

The rumors it spurred before being withdrawn were really bad. So I don’t want to give you the wrong impression when I switch to an excited tone now, but the Twitter peer-review of this article, and lots of others (most of which are good!), is incredibly valuable – for all of us. One of the cool things about this kind of peer review is that instead of a single editor reading the paper in secret in their office, you have scientists going back and forth, quoting each other’s tweet threads. And we get to watch it all! For scientists in the making it can be a really valuable learning experience – and for non-scientists it can sometimes be a really neat way to see how scientists think about things and approach problems. And scientists often give great explainer threads walking you through various aspects of the preprints.

Here are a few nice ones about the dexamethasone paper (credit to Wired for linking to them in their article!): 

Speaking of linking, when you look up a paper on one of the preprint servers, in addition to the article, information about its revision history (ease of revisions is another benefit of preprints), etc. you’ll find links to all the instances it’s been Tweeted about or used in the media. 

Sometimes, the articles scientists are critiquing on Twitter are published journal articles. And these Tweet-reviews are really valuable too. Especially since simply being published does NOT mean that a paper is “good” or “reliable.” In academia there’s this huge push for scientists to crank out publications or risk losing their jobs and/or being denied promotion – the so-called “publish or perish” situation. But the most prestigious journals can’t take them all, and even the respected but not as “famous” journals can’t take them all. So, enter the predatory journals. These guys see an opportunity to make a profit and take it, basically publishing anything (for a fee) regardless of its scientific worthiness. And this leads to a lot of sketchy science with the official “published” & “peer-reviewed” labels.

This doesn’t mean that you can only trust those super-prestigious ones like “Nature” & “Cell.” Those have a lot of name recognition, but that doesn’t mean that the science in them is necessarily better. Especially because biases at some traditional journals lead to things like more “popular” scientists and labs having an easier time getting papers accepted by the “top” journals. Often super incredible science is published in some super obscure journals. There are a lot of journals, so a lot of them are “obscure” to an outsider, even if they might be better-known in the subfield. It’s just the predatory journals you need to really watch out for. This website keeps track of them to help you out:  

So if you see some headline & you know the name of the journal & it’s on that list, be even more critical-think-y than ever. Preprints kinda automatically get extra scrutiny (or at least they should), but scrutiny should be given to *all* science articles. Problem is, a lot of the time, when the news media reports on some paper they don’t even include a link to the paper. Which I find sooooooo frustrating! 

Here’s a nice guide sheet for journalists covering preprints that I think is valuable for more than just journalists 

Now I’m going to verge into the more technical/geeky aspects of preprints (i.e. mom, you are more than welcome to keep reading, but you probably won’t be interested in the rest…)

Before I get into preprints, I think it would help if I explain “traditional prints.” So, scientists find cool stuff and then the traditional way they share that cool stuff is they write up a paper (manuscript) and send it to a journal. A screener editor person at the journal then takes a look and decides that:

a) it’s not up to snuff (at least in their opinion – this is a subjective decision and should not be taken personally!) or it isn’t right for that particular journal

b) it has potential

If a), the article gets returned to the author(s) without review (i.e. no other people look at it). 

If b), the screener editor person passes the article on to peer reviewers (typically 3 of them). These reviewers are people in the same general field, so they should be able to judge the paper in the context of what’s already known and be able to tell if the methodology is sound, etc. These “peer reviewers” usually do this as a volunteer service. They can decide that the paper is “perfect” as is and recommend publishing it; they can decide that the paper is not (in their opinion) right for the journal and recommend rejection; or they can decide that there are a few loose ends that need to be tied up and recommend that the article be revised and resubmitted. This third outcome is frequent, with editors often asking for additional experiments to be performed to answer some of the outstanding questions and/or validate some of the authors’ results. For example, if someone submits qPCR data showing that a certain gene is over-expressed at the level of mRNA, they might be asked to also perform a Western Blot to show that excess protein is actually being made from all those extra protein recipe copies.

So then, sometimes months later, the editor takes the reviewers’ thoughts into account and decides to reject the paper outright, publish it as is, or require some revisions in order for it to get published. If revisions are required, there’s another wait period, and then if the revised version is accepted, the paper gets published. Note: sometimes it gets published online in advance of being published in print. This is different from a preprint because this is the article post-peer-review.

So, peer review, as we saw above, is really important – but it can also be slow. And sometimes other scientists really just want to know (the findings)! Enter the preprint. It’s actually nothing “new.” Physicists launched a preprint server (called arXiv) in 1991. But the practice didn’t catch on in biosciences until about two decades later. bioRxiv was launched in 2013, with medRxiv getting off the ground just last year, in 2019.

There are numerous reasons for this. One is that traditional bioscience journals would often view preprints as “prior publication,” making an article ineligible for printing in the journal. Kinda like – if you already gave it away for everyone, what benefit do we as a journal gain from publishing it? Especially since that would require us investing time, energy, and resources taking it through a thorough vetting process, formatting and printing it up, etc. In recent years, however, more and more journals aren’t “disqualifying” preprint-ed articles – and many are actually encouraging them. Sometimes preprints are posted well before an author is actually ready to send them to a journal. But other times papers are submitted simultaneously to a preprint server and to a journal. This way the work gets out there sooner (usually within 48 hours) so people can read and get ideas from it, etc. during the typically long time it takes to get to publication (the average time is about 8 months).

About 70% of preprints go on to be formally published in journals (although it’s likely to be lower for covid papers because there are so many that report really specific things that people likely won’t care that much about years later). Usually, the journal-published forms will include some revisions as a result of the peer review process. And these revisions can be important – like having the authors do further experiments to validate a finding using a separate, independent experimental method, or making them be less “bold” in their conclusions. Such “bold” speculations can be one of preprints’ biggest dangers because scientists can draw their own strong conclusions from weak data and then if people only read the conclusions section…

This brings me to another reason for the slow adoption of preprints in biosciences – concern about bad information getting out which could directly impact public health. This is part of the reason that there is a medRxiv as well as a bioRxiv. Things posted to medRxiv are still not “peer-reviewed” or judged for quality, but they do have to pass more checkpoints – like showing they got institutional review board (IRB) approval for their study (to make sure it’s ethical) and got informed consent from participants where applicable. They also have to not “have the potential to cause harm by changing public behavior.”  It’s easy to weed out things like scientists claiming smoking is good for you or telling people to drink bleach, but it’s a lot harder for things like potential treatments with scant evidence. 

Format-wise, preprints are basically just like “normal” papers (e.g. abstract, methods, results, discussion, conclusion) – and in fact a lot of them will go on to be published as “normal papers.” When they do get published, they might have more data and/or fixed grammatical typos. And they’ll be formatted for publishing! One thing that can be really annoying about preprints is that they’re often submitted as PDFs in an “unformatted” format with double-spaced lines and all the figures at the end. The worst part is when all the figure legends are separated from the figures too! So it goes: main text, references, figure legends, figures. And then, if you’re like me, you end up opening up the paper in 4 windows so that you can simultaneously view all 4 of those. 

Normally, when you’re reading a paper, the in-text citations are hyperlinked with pop-up text if you scroll over them, so you can see what’s being cited and actually link to the cited paper without losing your place. But for preprints it’s often just plain text so you have to scroll to the bottom (or look to your other open window) to find it. And, instead of being right below the figure for easy back-forth-ing, the figure legends are often on a whole separate page.

I’m saying “often” a lot because all preprints are different: there aren’t set formatting standards for them like there are for journals. And there aren’t journal editor people laying everything out in an aesthetically pleasing way. Which is *why* most of the preprints are in this bare-bones, unformatted form – it lets authors submit the paper to “any” journal they want more easily (although each journal has its own guidelines about what they want submitted in terms of order of sections, use of abbreviations, etc.).

Like all “normal” papers, preprints have a DOI (Digital Object Identifier). This DOI will stay connected to this article even if the article gets revised (and medRxiv lets you see the revision history). And then, if the article gets published by some journal, the archive will link you to the article. 
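Because a DOI is meant to be a permanent handle, you can turn one into a working link by sticking it onto the doi.org resolver address. Here is a minimal Python sketch of that idea. The function name is made up for illustration, and the DOI in the example is a placeholder rather than a real paper (bioRxiv and medRxiv DOIs do start with 10.1101, but the rest is invented).

```python
def doi_to_url(doi: str) -> str:
    """Turn a DOI string into a doi.org resolver link (hypothetical helper)."""
    doi = doi.strip()
    # Drop common prefixes that people paste along with the DOI itself.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi.strip()


# Placeholder DOI, not a real paper.
print(doi_to_url("doi:10.1101/2020.01.01.000000"))
```

The resolver then sends you on to wherever the article currently lives, which is the whole point of having an identifier that doesn’t change when the paper gets revised.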

Sometimes, preprints can be confused with open access papers, but these terms refer to different things. “Preprint” refers to what (doesn’t) happen before an article is published, whereas “open access” refers to who can view that article once it’s published. Open access articles are free for anyone to read, as opposed to the traditional science journal model whereby institutions or individuals have to subscribe to the journal or pay for individual article access (or request it through their library). Typically, all preprints are open access, but not all open access articles are preprints. 

A couple examples of open access journals are eLife & PLOS. Traditional (paywalled) journals include Science, Nature, and Cell. There’s been a push recently for more journals to move to an open access format, with proponents arguing that much of the science being reported is funded in large part by public tax money, so it’s not fair that the public has to pay to read it. 

This post is part of my weekly “broadcasts from the bench” for The International Union of Biochemistry and Molecular Biology. Be sure to follow the IUBMB if you’re interested in biochemistry (now on Instagram un-hacked @the_iubmb)! They’re a really great international organization for biochemistry.

If you want to learn more about all sorts of things: #365DaysOfScience All (with topics listed) 👉 
