PyrRolysine – NOT pyro-lysine – it’s lysine with a pyrrole group, not lysine on fire! Though its discovery as the 22nd genetically-encoded proteinogenic amino acid (protein letter) set the scientific community aflame in 2002! I got away with not mentioning it before because it isn’t found in humans – or even mice or flies – it’s only found in a few bacteria and archaea, who use it to help break down carbon-y things into methane (which humans can potentially use as an alternative energy source). And they’re able to use this special amino acid by rewriting their genetic code slightly so that this special protein letter replaces one of the protein-making stop signs, UAG (there were 3 already, so they won’t miss one!) This lets them introduce this “unusual” 22nd amino acid into their proteins (and scientists are using this “orthogonal” strategy to introduce unusual amino acids into other proteins in the lab). 

refreshed & video added 12/22/21

Proteins (molecular machines) are made up of chains of building block “letters” called amino acids that are like charm bracelets. Amino acids have a generic backbone (chain link) that allows any amino acid to connect to any other amino acid as well as a unique side chain “charm” that sticks out. Charms have different chemical properties that allow them to interact in different ways w/one another (important for the protein to fold properly) & w/other molecules (important for intracellular interactions). 

Over the past few weeks, we saw the 20 “usual” amino acid charms. And then I introduced you to the 21st, selenocysteine (Sec) which has a “-CH₂-SeH” charm and, as we’ll talk more about later, it sometimes gets cells to bypass the UGA stop sign. Sec was discovered in the 1970s. And scientists were really surprised. And then in 2002, they were hit again – a 22nd amino acid was discovered! And it was found by scientists studying some weird little organisms called archaea.

Archaea are a weird evolutionary branch that have some things in common with bacteria, their fellow prokaryotes (like being single-celled and not having membrane-bound rooms called organelles (like nuclei for DNA storage/protection & mitochondria for energy production)). But they have other things in common with “eukaryotes” – things like plants, people, pigs, and more that do have such organelles. 

There’s a LOT we don’t know about archaea – in part because we probably haven’t found the vast majority of them – many of them are “extremophiles” – they can live in places that are really hot, really cold, really salty, etc – basically extremophiles are organisms that have been able to specialize and carve out their own niche where no one else can (or at least no one else wants to) live – no competition, yay! Some archaea are “methanogenic” – methanogens can degrade organic (carbon-based) matter that’s hard for others to break down, like lipid breakdown products, into methane (CH₄). But they don’t break down the methane, so humans can use it as an alternative energy source. Methanogenesis makes energy for them – and for us! Even better, they tend to live in places other things don’t want to – places with little available oxygen (anaerobic environments) such as marine & freshwater sediments – even in municipal waste digesters. 

So they make attractive study subjects – we’ll discuss more about the discovery later but first I want to tell you about how this amino acid sneaks in there… So scientists were studying some of the protein enzymes that these critters use to mediate & speed up their methane-making – and they found these enzymes had modified amino acids. That part wasn’t that weird. Amino acids are known to be modified – some of them (like serine, threonine, & tyrosine) can get phosphorylated (have negatively-charged phosphate (PO₄) groups added on); others (like asparagine) can have sugar chains attached through “glycosylation,” etc. But those modifications are added *after* the fact – after the protein is made (translated) – thus we call them post-translational modifications.

In this case, the modification was the addition of a pyrroline group onto lysine – that’s a 5-sided ring where one of the “corners” is a nitrogen, the other four are carbons, and there’s one double bond (more on this later). The weird part about these pyrrolysines was that these modifications were happening *before* the letter was even added – the protein-making machinery (a protein/RNA complex called the ribosome) was adding it directly to the growing peptide chain (which would fold up to become a functional protein). And to understand why this is *really* weird you have to have a general idea about how translation works.

Basically, the instructions for proteins are written in DNA form as “genes” which are just stretches of DNA letters (nucleotides) in really really long, coiled up, chains of nucleotides called chromosomes, kinda like recipes in a cookbook. Instead of working with the original copies, when a cell wants to make a protein, it first makes messenger RNA (mRNA) copies of the gene that get handed over to ribosomes. RNA is a LOT like DNA – both have a generic sugar-phosphate backbone part that allows for in-chain neighbor linking (RNA’s sugar just has an extra O) and unique nitrogenous bases that allow for inter-chain (or in-chain) non-neighbor linking through complementary base pairing – C to G and A to T (in DNA) or U (in RNA). 

This complementarity comes in key for DNA/RNA copying since if you know the sequence of one strand you can predict and/or make the other strand – and then make another copy of the first strand from the new strand, etc. But it also plays a key function in protein-making through codon-anticodon pairing, which allows you to translate from the nucleic acid language of RNA to the protein language of amino acids.
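To make the complementarity idea a bit more concrete, here’s a minimal Python sketch. The function names (`reverse_complement`, `anticodon_for`) are just ones I made up for illustration, and it assumes plain one-to-one pairing only (real tRNAs can also “wobble” and carry modified bases, which this ignores).

```python
# Pairing partners: A pairs with T (DNA) or U (RNA), C pairs with G.
DNA_PAIRS = str.maketrans("ACGT", "TGCA")
RNA_PAIRS = str.maketrans("ACGU", "UGCA")


def reverse_complement(dna: str) -> str:
    """Return the partner strand of a DNA sequence, written 5'->3'."""
    return dna.translate(DNA_PAIRS)[::-1]


def anticodon_for(codon: str) -> str:
    """Return the perfectly complementary RNA anticodon, written 5'->3'."""
    return codon.translate(RNA_PAIRS)[::-1]


strand = "ATGCAGTAG"
partner = reverse_complement(strand)

# Copying the copy gives you back the original strand.
assert reverse_complement(partner) == strand

print(partner)               # CTACTGCAT
print(anticodon_for("UAG"))  # CUA
```

Both strands are written 5'->3' here, which is why the partner gets reversed after the letters are swapped.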

Protein letters (amino acids) are spelled as 3-letter RNA letter (nucleotide) words called codons (e.g. CAG “spells” the amino acid glutamine, whereas CGG spells arginine). Another type of RNA, transfer RNA (tRNA), “charged” with the corresponding protein letter (i.e. it has an amino acid stuck to it) brings that letter to the ribosome (with the help of a protein called an elongation factor) when a codon complementing the tRNA’s 3-letter anticodon shows up in the ribosome’s “entry-way.” The ribosome has 3 tRNA binding spots – new amino acids (on their tRNAs) are brought into the A (Aminoacyl) spot and the growing chain, held by a tRNA in the P (Peptidyl) spot, gets passed off to the new guy. Then elongation factors help push the old, now-empty tRNA out through the E (Exit) spot and get the now-chain-holding tRNA into the P spot, freeing up the A spot again, which now has a new codon exposed. 

The ribosome keeps on doing this, adding one amino acid letter for each 3-RNA letter step (codons are read non-overlappingly) until it reaches a stop codon. There are 4 nucleotide letters, and codons are 3-letters-long, so there are 4^3 = 64 codon possibilities – some amino acids have multiple spellings (e.g. CAA also spells glutamine) (though each word only ever spells one thing, so CAA will never spell lysine!), but even with the redundancy, you’re left with a few extra codons, especially since the “start” codon doubles as the codon for the amino acid methionine. 3 of the codons – UAA, UAG, & UGA usually don’t spell any amino acid – instead they serve as “stop codons” that tell the ribosome to “let go.”
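If it helps to see the counting and the codon-by-codon reading spelled out, here’s a rough Python sketch. It builds the standard 64-codon table from a compact string of one-letter amino acid codes (with `*` for stop) and reads a made-up mRNA three letters at a time. The names `CODON_TABLE` and `translate` are my own, and as a simplification it assumes the message already starts at the start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order (UUU, UUC, UUA, UUG, UCU, ...); '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Read non-overlapping codons until a stop codon (or the end) is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the ribosome lets go
            break
        protein.append(aa)
    return "".join(protein)


print(len(CODON_TABLE))                                       # 64
print(CODON_TABLE["CAG"], CODON_TABLE["CAA"])                 # Q Q
print(sorted(c for c, aa in CODON_TABLE.items() if aa == "*"))  # ['UAA', 'UAG', 'UGA']
print(translate("AUGCAGCAACGGUAGAAA"))                        # MQQR
```

Notice that the AAA after the UAG never gets read: once the stop codon shows up, nothing downstream makes it into the protein.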

When a stop codon shows up in the A spot, instead of a tRNA binding, a protein called a release factor acts as a “fake tRNA” – it comes in and helps use water to break the completed peptide off of the tRNA holding it. These release factors don’t have direct complementary base pairing for super-stickiness (the release factor is a protein so it can only “pretend” to be like tRNA – plus it has to be able to bind to more than one stop codon, so it couldn’t have one-to-one pairing even if it were legit RNA!) As a result, the release factors can be “beaten to it” by tricky tRNAs.

Yesterday we looked at how the amino acid selenocysteine (Sec) has a tRNA with an anticodon that complements the “stop codon” UGA, so it’s able to fool the ribosome into adding it when it reaches UGA. But this only happens really really rarely. Almost always, the ribosome stops when it gets to UGA – it’s only in a couple dozen mRNAs that Sec can sneak in – because those mRNAs have a special loopy part after the end of the protein-spelling part (in the 3’ untranslated region (UTR)) called the SECIS (selenocysteine insertion sequence). The SECIS binds a special Sec helper protein called SECIS-binding protein 2 (SBP2) which binds to charged tRNASec and its special elongation factor EFSec and interacts with the ribosome, keeping Sec at the ready to sneak in when the ribosome encounters UGA and stalls (it will only sneak in at UGA, not at other stop codons, because the Sec-tRNA has an anticodon that complements UGA).

A *different* mechanism is used to sneak pyrrolysine in – a less “sneaky” one – instead of Pyl being a “sometimes spelling”, it’s turned into an “almost always” spelling – the few organisms that use it have basically rewired themselves so that UAG is mostly just like any normal codon. Some Pyl-using organisms still use UAG as a stop codon, so the two meanings compete. They tend to use UAG as stop really rarely and when it is meant to mean stop it’s usually followed by another nearby (nonambiguous) stop codon just in case. There’s some controversy about the potential role of a downstream enhancer sequence called PYLIS that was proposed to play a similar role to SECIS, but doesn’t seem as important. UAG is sometimes called an “amber” codon and, since tRNAPyl suppresses the stoppage of the ribosome at UAG, it’s called an “amber suppressor.”
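Here’s a toy Python sketch contrasting the two strategies. It’s a cartoon, not a model of the real machinery: the `has_secis` flag stands in for the whole SECIS/SBP2/EFSec business, the “rewired” table treats UAG as *always* meaning Pyl (ignoring the competition with release factors), and the sequences are made up. The one-letter codes U (Sec) and O (Pyl) are the standard ones; the function and variable names are mine.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Pyl strategy: the codon table itself is changed, so UAG spells Pyl ('O').
PYL_TABLE = {**STANDARD_TABLE, "UAG": "O"}


def translate(mrna: str, table: dict, has_secis: bool = False) -> str:
    """Translate until a stop codon.

    Sec strategy: the table is untouched; UGA is read as Sec ('U') only
    when the message carries a SECIS element (simplified to a flag here).
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        aa = "U" if (codon == "UGA" and has_secis) else table[codon]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


amber_message = "AUGUAGCGGUAA"
print(translate(amber_message, STANDARD_TABLE))  # M   (UAG means stop)
print(translate(amber_message, PYL_TABLE))       # MOR (UAG means Pyl)

opal_message = "AUGUGACGGUAA"
print(translate(opal_message, STANDARD_TABLE))                  # M
print(translate(opal_message, STANDARD_TABLE, has_secis=True))  # MUR
```

The design difference is the point: Sec needs extra information attached to each individual message, whereas Pyl just needs a different lookup table.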

That’s how Pyl gets into the protein, but how does Pyl itself get made? Some people argue that Pyl is actually the 21st amino acid – that Sec doesn’t count because, although the modification happens before adding the amino acid to the protein, it happens *after* it’s loaded onto its tRNA carrier – tRNASec is first charged with serine, and then, through a couple steps, Ser’s -OH gets swapped out for Sec’s -SeH. In contrast, Pyl is made *before* it gets loaded onto its corresponding tRNA.

A few years ago, scientists made significant progress in figuring out how this happens, proposing a reaction mechanism that involves 3 key enzymes, PylB, PylC, & PylD. 

Reaction mechanisms can sometimes seem overwhelming, but the key is to take things stepwise. When looking at a reaction mechanism I ask myself for each step:

  1. What happened?
  2. Why’d it happen?
  3. How does it help us get to our final product?

Nature doesn’t care about #3, which is why it’s really important to ask #2 – molecules don’t act with a long-term end goal in mind – they just do what makes them happy in the moment – which is part of the reason you can use the end products of one reaction as “intermediates” for a bunch of other different ones. So what makes a molecule happy? Reducing formal charges; getting electrons to the electron hogs (electronegative atoms like oxygen & nitrogen); sharing electrons communally through resonance; letting opposite charges come together – these are a few of their fa-vor-ite things!

Of course, in order to reason out the *why* (e.g. does it give more electrons to an O?) you need to figure out the *what* – Sometimes it’s really obvious like a chain ringing up or a chunk breaking off. But it’s often more subtle – like H’s shifting, single bonds doubling, doubles singling, etc. It can be like one of those “spot the differences” games on the back of cereal boxes. But it’s really important you find them! 

While nature doesn’t care what the final product is, we do – so what is the pyrrolysine goal we’re aiming for? Pyrrolysine has a methylated pyrroline carboxylate linked to the end amino group of lysine (the ε-amino group) through an amide bond. Let’s decode that chemical jargon a bit. Note – when chemists & biochemists use weirdo names like this it’s not that we’re trying to make ourselves sound smart, exclude outsiders, etc. or anything (at least not when most of us use it!) – instead, it’s just a descriptive way to describe a molecule’s chemical makeup. Do you ever have trouble remembering people’s names (maybe old classmates, friends-of-friends, etc.)? Well, there are limitless possible molecules, so if we were to try to give a name like Joe or Jane to each of them and then had to remember who was what – well, that would be a nightmare. So, instead we give “common names” like water to ones that show up a lot and describe the other ones with more formal, systematic names that highlight the “functional groups” (the more reactive parts) and where they’re located in the molecule’s more “blah” hydrocarbon skeleton. 

Methyl is just a carbon hooked up to 3 hydrogens (-CH₃). Carboxylate is something with a -(C=O)-O⁻, and pyrroline? That’s a 5-sided ring where one of the “corners” is a nitrogen, the other four are carbons, and there’s one double bond. An amide linkage is the same type of linkage as a peptide bond – but in a peptide bond you’re joining amino acids through their generic backbones (the carboxylate group of one to the amino (-NH₂) of another). Here in pyrrolysine, instead of linking to lysine’s backbone amino group, you’re linking to the “extra” amino group in its side chain, so it won’t interfere with peptide bond forming later. 

3 genes are required for Pyl making:

  • PylB: the mutator (rearrange one lysine’s carbon skeleton so it has a methyl group sticking off it, forming 3-methylornithine)
  • PylC: the condenser (link 3-methylornithine to another lysine)
  • PylD: the oxidative deaminator (oxidize it with the removal of the amine group to form an uncomfortable molecule that spontaneously ringifies)

It’s hard to explain in words, so hopefully you can follow along with the pics as we go. So we start with plain old lysine (and unlike with the Ser-to-Sec conversion, this lysine is *not* hooked up to a tRNA yet). Instead it’s hanging out with PylB.

First, PylB acts as a “mutase” to rearrange lysine’s carbon skeleton to give you 3-methyl-ornithine. It does this through a strange radical reaction (a radical is a molecule with a lone electron and they’re really reactive). With the help of the cofactor SAM, a bond is broken in a way that leaves 2 part-lysine fragments, one with an extra single electron, which, being super reactive, attacks to form back up, but it attacks the carbon one down, so that, in addition to some stereochemical changes (changes in what sticks out which direction) you end up with a methyl group sticking off. Note – we saw SAM before, when it was acting as a methyl donor, but here it’s not donating the methyl, just helping with the rearrangement (it gets reduced and then steals a hydrogen to give you that radical). 

Next, PylC sticks that modified lysine onto a normal lysine “abnormally” – the carboxyl group of the weird one to the end amino group of the normal one through an amide linkage. This weird linking requires some special help – energy money in the form of ATP.

In the final *enzyme-catalyzed* step, the end amine group of the weird one that’s now stuck weirdly to the normal one gets oxidized to an aldehyde with the help of PylD. Oxidation involves losing electrons, in this case, the amino group goes with them (so we call it oxidative deamination) and an oxygen is stuck in its place. This oxygen is double-bonded to a carbon (thus we have a carbonyl) and there’s an H on one side of the carbonyl carbon, so we call it an “aldehyde.” 

That was the last enzyme-catalyzed step, but *not* the last step. We’re not done yet, we just don’t need help with this part. Instead, this molecule spontaneously rings up, the partly-negative amine N attacking the partly-positive carbonyl carbon and ultimately resulting in the O & H’s leaving as water and the “I once was an aldehyde carbonyl” carbon joining to the nitrogen through a double bond to give you a ring  – and not just *any* ring – A pyrroline ring!
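One way to sanity-check the whole pathway is to do some atom bookkeeping. The Python sketch below tracks only the net change in atoms at each step, under a few assumptions of mine: neutral molecular formulas (lysine as C6H14N2O2), the cofactors (SAM, ATP, whatever accepts the electrons in the PylD step) left out entirely, and helper names (`combine`, the step variables) invented for illustration. It says nothing about *how* each step happens, just whether the atoms add up.

```python
LYSINE = {"C": 6, "H": 14, "N": 2, "O": 2}
WATER_LOST = {"H": -2, "O": -1}


def combine(*parts: dict) -> dict:
    """Add up element counts; negative counts represent atoms that leave."""
    total = {}
    for part in parts:
        for element, count in part.items():
            total[element] = total.get(element, 0) + count
    return total


# PylB: a rearrangement, so 3-methylornithine has the same atoms as lysine.
methylornithine = dict(LYSINE)

# PylC: amide bond between 3-methylornithine and a second lysine (water leaves).
linked = combine(methylornithine, LYSINE, WATER_LOST)

# PylD: oxidative deamination, -CH2-NH2 becomes -CH=O (net: lose N and 3 H, gain O).
aldehyde = combine(linked, {"N": -1, "H": -3, "O": 1})

# Spontaneous ring closure: the N attacks the carbonyl and water leaves.
pyrrolysine = combine(aldehyde, WATER_LOST)

print(pyrrolysine)  # {'C': 12, 'H': 21, 'N': 3, 'O': 3}
```

Two lysines in (12 carbons, 4 nitrogens), one pyrrolysine out (12 carbons, 3 nitrogens): every carbon is accounted for, and the one missing nitrogen is the amino group that left in the PylD step.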

In addition to these 3 enzymes needed for Pyl-making, you need 2 genes for Pyl-decoding (getting the ribosome to read UAG as “Pyl” instead of “stop”). pylT is the gene for making tRNAPyl (sometimes called tRNACUA) – our carrier to the ribosome – and pylS encodes pyrrolysyl-tRNA synthetase, which puts Pyl on tRNAPyl *after* it’s made. But that’s it – just 5 genes (sometimes abbreviated pylTSBCD) – so although it’s only been found to occur *naturally* in a few species, it can be introduced and used “non-naturally” to incorporate Pyl into proteins in other organisms – like E. coli. 

These genes are actually arranged in a row like that in the M. barkeri genome, like a little genetic bundle that makes it convenient for “orthogonal” usage – whenever I hear the word “orthogonal” my mind goes to “orthodontist,” but orthogonal just means that one system can be used at the same time as you’re using another system, and each system will act like the other’s not there. So we can use orthogonal tRNA/aminoacyl-tRNA synthetase (aaRS) pairs to introduce “non-canonical” amino acids (NCAAs) into organisms that don’t normally use them. This opens up a lot of potential for designing new proteins and is a hot area of research. (It’s always “March Madness” with biochemists betting on this type of NCAA!)

Speaking of betting – a lot of people were “betting” that Pyl had a lot more in common with Sec than it does… Pyrrolysine was discovered by Joseph A. Krzycki in collaboration with Michael Chan. Krzycki and his lab were looking into how methanogens do their methane making. And they identified the methylamine methyltransferases responsible for getting the process going, by removing methyl groups from methylamines and passing them off to other proteins for further processing. But when they looked at the genes for them, they found they had the stop codon UAG in the protein-coding part. They collaborated with Michael Chan who solved the structure of one of the 3 weird methyltransferases, the MMA methyltransferase, MtmB, and thus they figured out the structure of pyrrolysine (and named it pyrrolysine). 

But they still had the big question of where it came from. They knew that the long chain-y part was coming from lysine, but people originally suspected that the pyrroline part was coming from something other than a second lysine (maybe isoleucine? glutamate? proline? methionine? ornithine?) But when they labeled lysine with heavy versions of carbon and nitrogen (forms of these atoms with extra neutrons but that act the same as normal) they found that all the pyrrolysine carbons and nitrogens ended up labeled – telling them that lysine – just lysine – is used as building blocks for it. 

And they found that, unlike what people suspected by analogy to Sec (i.e. that modification of the lysine would happen after loading), all the modifying magic for Pyl happens before the tRNA is charged. (Also unlike Sec, it doesn’t require a special elongation factor to take the charged tRNA to the ribosome; it makes do with the usual one.) 

how does it measure up?

systematic name: N6-{[(2R,3R)-3-methyl-3,4-dihydro-2H-pyrrol-2-yl]carbonyl}-L-lysine
coded for by: UAG – in a few archaea & bacteria
chemical formula: C₁₂H₂₁N₃O₃
molar mass: 255.313 g/mol
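If you want to check that molar mass against the formula yourself, here’s a small Python sketch. The atomic weights are rounded standard values that I plugged in (an assumption on my part, since different tables round slightly differently, so the last decimal place can wobble), and `molar_mass` is just a hypothetical helper that only handles simple formulas without parentheses.

```python
import re

# Rounded standard atomic weights in g/mol (assumed values; tables differ slightly).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C12H21N3O3'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[element] * (int(count) if count else 1)
    return total


print(round(molar_mass("C12H21N3O3"), 2))  # 255.32
```

That lands within about 0.01 g/mol of the value listed above, with the small gap coming from which atomic weight table you use.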

some key papers:

  • Hao B, Gong W, Ferguson TK, James CM, Krzycki JA, Chan MK. A new UAG-encoded residue in the structure of a methanogen methyltransferase. Science. 2002 May 24;296(5572):1462-6. doi: 10.1126/science.1069556. PMID: 12029132. 
  • Srinivasan G, James CM, Krzycki JA. Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science. 2002 May 24;296(5572):1459-62. doi: 10.1126/science.1069588. PMID: 12029131.
  • Longstaff DG, Larue RC, Faust JE, Mahapatra A, Zhang L, Green-Church KB, Krzycki JA. A natural genetic code expansion cassette enables transmissible biosynthesis and genetic encoding of pyrrolysine. Proc Natl Acad Sci U S A. 2007 Jan 16;104(3):1021-6. doi: 10.1073/pnas.0610294104. Epub 2007 Jan 4. PMID: 17204561; PMCID: PMC1783357. 
  • Gaston, M., Zhang, L., Green-Church, K. et al. The complete biosynthesis of the genetically encoded amino acid pyrrolysine from lysine. Nature 471, 647–650 (2011).

a couple reviews on unnatural amino acid incorporation:

  • Wan W, Tharp JM, Liu WR. Pyrrolysyl-tRNA synthetase: an ordinary enzyme but an outstanding genetic code expansion tool. Biochim Biophys Acta. 2014 Jun;1844(6):1059-70. doi: 10.1016/j.bbapap.2014.03.002. Epub 2014 Mar 12. PMID: 24631543; PMCID: PMC4016821. 
  • Alexander R. Nödling, Luke A. Spear, Thomas L. Williams, Louis Y.P. Luk, Yu-Hsuan Tsai; Using genetically incorporated unnatural amino acids to control protein functions in mammalian cells. Essays Biochem 3 July 2019; 63 (2): 237–266. doi:

