I’m not a big jewelry wearer – a lab coat cape’s as accessory-ish as I normally go… But I LOVE amino acid “charm bracelets” – aka PROTEINS! Just like you can stick together different numbers of different charms in different orders to get endless possible charm bracelets, you can stick together (through peptide bonds linking their generic backbones) different amino acids (protein letters) to form endless possible proteins, whose folded shape and properties stem in large part from the properties of the charms you choose. From structural roles like keeping cells from collapsing and acting as scaffolds to keep other molecules close together – to reaction mediating roles, like helping break down sugar for energy or building up sugar from smaller molecules, proteins are our bodies’ main workers, and today I want to honor & pay tribute to the amino acids that have made life as we know it possible – this past year – and for the past millions of years!
note: refreshed & video added 12/24/21
Over the past few weeks, I’ve gone one by one through all 20 of the common protein letters (and a couple not-so-common ones) & their unique charms, and today, to cap off 2020, I want to recap some of the key takeaways, compare and contrast, and wrap up this year’s #20DaysOfAminoAcids. So here goes! note – I’m going to try to briefly explain terminology as I go, but if you want more information, check out the links.
First off, what is an amino acid anyway? Like many (but not all) things in science, hints are in the name. At the core is the central (alpha) carbon (Cα) – which is hooked up to the unique charm (side chain or “R group”) which gives different amino acids their special “superpowers” and to the generic parts that allow for linking and make it an amino acid. “Amino” refers to them having an “amine” group – a nitrogen (N) hooked up to hydrogen(s) (H) and/or carbon(s) (C). And “acid” refers to them having a carboxylic acid group – a C double-bonded to an oxygen (O) and also bonded to a hydroxyl (-OH) group (so -(C=O)-OH). An “acid” (in one definition) is something that donates a proton (an H⁺), and a carboxylic acid can donate a proton from the hydroxyl group to give you a carboxylate anion (-(C=O)-O⁻). The neutral form of the amine group (-NH₂) can act as a base (proton taker) to become the cationic -NH₃⁺.
Note: an “ion” is just a name for any type of charged particle – we call something an “anion” if it has a negative charge (which comes from having more electrons than protons) and we call something a “cation” if it has a positive charge (comes from having more protons than electrons). Protons and electrons are oppositely-charged “subatomic particles” that make up atoms and we’ll talk more about them later.
Which protonation state these end groups are in depends on the pH (a measure of how many free protons are floating around). At physiological (bodily) pH (~7.4) the “zwitterionic” form is the most common – I love this word and it just means that you have a positively charged group (like -NH₃⁺) and a negatively charged group (-(C=O)-O⁻) in the same molecule, so the charges cancel out to give you a neutral molecule overall. Sorry if this is too technical for your hearts’ desires, but I don’t want you to get thrown off if you see amino acids written and/or drawn out with different protonation states. I will tell you more about the subatomic basis of those charges in a minute, and better explain some of the terminology, but turns out that, once amino acids link up, you only have 2 such ends to worry about…
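If you like seeing numbers behind the words, here’s a minimal Python sketch of how pH shifts the charges on those two generic ends. It uses the Henderson-Hasselbalch relationship, and the two pKa values in it (about 2.3 for the carboxyl end and about 9.6 for the amino end) are ballpark assumptions I’m plugging in for illustration; real values differ a bit from one amino acid to the next and depend on the surroundings.

```python
def fraction_protonated(pH, pKa):
    """Fraction of a group still holding its proton at a given pH
    (Henderson-Hasselbalch, rearranged)."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


# ASSUMED ballpark pKa values for the generic ends of a free amino acid
PKA_CARBOXYL = 2.3  # -(C=O)-OH  <->  -(C=O)-O(-)
PKA_AMINO = 9.6     # -NH3(+)    <->  -NH2


def end_group_charges(pH):
    """Average charge on the carboxyl end and on the amino end."""
    # The carboxyl group is neutral when protonated, -1 when deprotonated
    carboxyl_charge = -(1.0 - fraction_protonated(pH, PKA_CARBOXYL))
    # The amino group is +1 when protonated, neutral when deprotonated
    amino_charge = fraction_protonated(pH, PKA_AMINO)
    return carboxyl_charge, amino_charge


for pH in (1.0, 7.4, 12.0):
    carboxyl, amino = end_group_charges(pH)
    print(f"pH {pH}: carboxyl {carboxyl:+.2f}, amino {amino:+.2f}, "
          f"net {carboxyl + amino:+.2f}")
```

At the bodily pH of 7.4 the two charges come out nearly equal and opposite, which is the zwitterion in number form. Side chain charges are deliberately left out of this sketch.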
“Amino acids” are the “free-floating” forms of protein letters – when they link together, they do so by joining the carbonyl (C=O) carbon of one amino acid to the nitrogen of the amino group of the other amino acid, physically losing the equivalent of water (2 H & 1 O) in the process – and also “functionally” losing the free amino and carboxyl groups that are now merged into a peptide bond. As a result, when it comes to the generic peptide backbone, only the “first” amino acid will have a free amino group (we call this the “N terminus”) and only the “last” amino acid will have a free carboxylic acid group (we call this the “C terminus”) (though as we’ll see, some of the unique side chains have extra ones that aren’t affected by these linkages).
So, instead of calling them “amino acids,” once they’re linked together and no longer “amino acids” we call the “residuals” of what used to be individual amino acids “residues” (sorry if this is too technical, but this confused me for the longest time when I was an undergrad and I was embarrassed to ask! Speaking of which – never be embarrassed to ask questions – not only does it hold you back intellectually, it’s way more embarrassing to find out years later you’ve got it all wrong).
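To make the bookkeeping concrete, here’s a small Python sketch of what gets “lost” and what stays free when amino acids link up. Treat it as an illustration: the function name is made up, and the masses (rounded average molar masses in g/mol for three free amino acids, plus water) are values I’m supplying as assumptions.

```python
# ASSUMED rounded average molar masses (g/mol), for illustration only
WATER = 18.02
FREE_AMINO_ACID_MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09}


def peptide_summary(residues):
    """Tally peptide bonds, waters lost, free ends, and mass for a linear chain.

    `residues` is a list of 3-letter names, in order from first to last.
    """
    if not residues:
        raise ValueError("need at least one amino acid")
    bonds = len(residues) - 1  # one peptide bond between each neighboring pair
    mass = sum(FREE_AMINO_ACID_MASS[r] for r in residues) - bonds * WATER
    return {
        "peptide_bonds": bonds,
        "waters_lost": bonds,          # one water per peptide bond formed
        "n_terminus": residues[0],     # the only free backbone amino group
        "c_terminus": residues[-1],    # the only free backbone carboxyl group
        "mass": round(mass, 2),
    }


print(peptide_summary(["Gly", "Ala"]))
print(peptide_summary(["Gly", "Ala", "Ser"]))
```

However long the chain gets, there’s still just one N terminus and one C terminus; only the number of bonds (and lost waters) grows.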
It’s not just water that’s lost when peptide bonds form – freedom to rotate is also greatly restricted – even more than usual… You always lose some freedom of movement whenever you link up atoms (think of walking alone versus walking bound to a partner in a 3-legged race). But normally, bound atoms can still rotate pretty freely, as long as they don’t physically bump into another atom or invade its personal space (we call such “2 atoms can’t be in the same place at the same time” limitation “steric hindrance” and, as we’ll see when we get into the individual amino acids, the bigger & bulkier the “charm” the more the hindrance and the more restricted the rotation).
However, in a peptide bond, such rotational freedom is limited by more than just steric hindrance because the peptide bond has “partial double bond character” thanks to resonance (aka electron delocalization). Now that I’ve spit out that jargon, let me step back and try to explain what it means…
Each of those C’s and H’s and O’s & N’s I’ve been talking about represents an individual atom. Atoms are the basic units of elements and they’re made up of smaller parts – protons, neutrons, & electrons – collectively called subatomic particles. The number of protons (positively-charged) is fixed for a given element (e.g. carbon has 6 and nitrogen has 7) – and so is their location – they’re stuck in a dense central core of an atom called the “atomic nucleus” (where they hang out with neutral neutrons). But the number and location of electrons can vary because they whizz around the nucleus in diffuse “electron clouds” and atoms can give, take, and share them to meet their desires. It is this sharing that forms the basis of the strong covalent bonds that link together the atoms in molecules (everything from individual waters and single amino acids to giant proteins and DNA).
Atoms can share 1 pair of electrons to form a single bond, 2 pairs for a double, or 3 pairs for a triple. The more they share, the stronger & shorter the bond. Sometimes, groups of atoms have more than enough electrons for single bonds, but not enough for double bonds, so they can share their “extra” electrons in a sort of “electron commune” where extra electrons are shared evenly among multiple (more than 2) atoms, leading to the functional equivalent of something in between a single and a double bond. We call this “electron delocalization” or “resonance” or “conjugation.” It’s really stabilizing (so it makes the atoms happy) but it requires all of the atoms involved to lie in a plane, so it greatly restricts movement.
Peptide bonds have such resonance between the N, the carbonyl C, and the O it’s bound to, so those 3 atoms have to lie in the same plane. As a result, rotation along the backbone is limited to specific locations, giving you a sort of “chain of planes.” The rotation places are on either side of the central carbon (Cα) which is also where the unique side groups stick off of – so the next limit of rotation is that steric hindrance thing – so let’s talk size!
Yep, now that we’ve talked about what makes all the amino acids similar, it’s finally time to get into what makes different amino acids different (and hence what makes different proteins different). There are 20 common genetically-encoded amino acids (this just means that they are “spelled out” in protein recipes (genes and their messenger RNA (mRNA) copies) as 3-nucleotide (DNA/RNA letter) code words called “codons” which tell the protein-making equipment (ribosomes) what amino acids to link together in what order). There are 4 nucleotides, so 4 x 4 x 4 = 64 possible codons – so some amino acids have multiple spellings (there’s some redundancy, which is also called “degeneracy”) but one codon will only ever spell one amino acid (no ambiguity). Though we’ve also looked at a couple of weirdo cases where the amino acids selenocysteine and pyrrolysine can sneak in at a stop sign… And speaking of stop signs, there are 3 stop codons that tell the ribosome to stop, release the finished polypeptide (which can finish folding up) and do it all again.
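Here’s a short Python sketch of that codon counting, in case you want to poke at it yourself. It builds the standard genetic code from a compact 64-letter string (a common packing trick, not something from my posts; “*” marks a stop) and then “reads” a tiny made-up mRNA. The `translate` helper is a hypothetical simplification: it starts at the first letter, stops at the first stop codon, and ignores the selenocysteine and pyrrolysine exceptions.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order (UUU, UUC, UUA, ...)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Every 3-letter combination of the 4 bases, paired with what it spells
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA 3 letters at a time until a stop codon (or the end)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop sign
            break
        protein.append(amino_acid)
    return "".join(protein)


spellings = Counter(CODON_TABLE.values())
print("codons:", len(CODON_TABLE))
print("stop codons:", spellings["*"])
print("spellings for Leu (L):", spellings["L"])
print("spellings for Trp (W):", spellings["W"])
print(translate("AUGGGCUGGUAA"))
```

Because the table is a plain dictionary, each codon can only ever map to one thing, while several codons are free to map to the same amino acid.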
If you want to learn even more about the individual amino acids, through #20DaysOfAminoAcids I’ve done a (probably too-long) post on each of them that tells you more about their superpowers, history, etc.
Each amino acid has a “full name,” a short (3-letter) “nickname,” and a 1-letter “initial.” So, for example, the smallest amino acid (which just has a hydrogen as its “charm” (side chain aka “R group”)) has the full name “Glycine,” the nickname “Gly,” and the initial “G.” And the biggest amino acid (whose charm is a double-ringed thing called an “indole”) has the full name “Tryptophan,” the nickname “Trp” and the initial “W” – yeah, they had to get creative for a few of them that start with the same letter as other ones. Priority in initializing was given to the more common ones – so threonine gets “T” and Tryptophan is stuck with “W” – but you can remember it by saying it like Tweety bird – tWyptophan! Since the one letter abbreviations can be twicky, I’m going to mainly use the 3-letter ones (though the 1 letter ones are convenient when looking at the amino acid sequences of huge proteins).
As you’d expect based on the size of their charms, Gly is really flexible, whereas Trp – well, not so much… Also severely hindered on the steric front are the other amino acids with big bulky rings – Phenylalanine (Phe, F) and Tyrosine (Tyr, Y). These 3 amino acids are classified as “aromatic” because they have rings that are resonance-stabilized (similar to what we saw in the peptide bond except that here the electrons are shared among atoms hooked up in a ring). Histidine (His, H) also has an aromatic ring, but it’s usually categorized with the “positively-charged” or “basic” amino acids (more later). All the others – anything that’s *not* aromatic – are called “aliphatic.”
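If you’d rather let a computer deal with the twicky initials, a lookup table does the job. This is just a sketch: the dictionary holds the standard names, nicknames, and initials for the 20 common amino acids, and `spell_out` is a made-up helper name.

```python
# full name -> (3-letter nickname, 1-letter initial)
ABBREVIATIONS = {
    "Alanine": ("Ala", "A"), "Arginine": ("Arg", "R"),
    "Asparagine": ("Asn", "N"), "Aspartate": ("Asp", "D"),
    "Cysteine": ("Cys", "C"), "Glutamine": ("Gln", "Q"),
    "Glutamate": ("Glu", "E"), "Glycine": ("Gly", "G"),
    "Histidine": ("His", "H"), "Isoleucine": ("Ile", "I"),
    "Leucine": ("Leu", "L"), "Lysine": ("Lys", "K"),
    "Methionine": ("Met", "M"), "Phenylalanine": ("Phe", "F"),
    "Proline": ("Pro", "P"), "Serine": ("Ser", "S"),
    "Threonine": ("Thr", "T"), "Tryptophan": ("Trp", "W"),
    "Tyrosine": ("Tyr", "Y"), "Valine": ("Val", "V"),
}

# Flip it around so we can go from initial to nickname
ONE_TO_THREE = {one: three for three, one in ABBREVIATIONS.values()}


def spell_out(sequence):
    """Turn a string of 1-letter initials into dash-joined 3-letter nicknames."""
    return "-".join(ONE_TO_THREE[letter] for letter in sequence.upper())


print(spell_out("GW"))
print(spell_out("mktay"))
```

An initial that isn’t one of the 20 (say “B” or “Z”) raises a `KeyError` here rather than being guessed at.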
The aliphatic amino acids have side chains based on chains of carbon, with hydrogen as “filler” (so hydrocarbon chains) and some have unique “functional groups” – basically anything other than H attached to a C – these “other things” (carboxylic acid groups, amine groups, hydroxyl groups, etc.) are more reactive so they allow these chains to serve unique “functions.”
The hydrocarbon chains can be straight or branched – and the closer the branch point is to the backbone, the more the steric hindrance. So, although Valine (Val, V), Isoleucine (Ile, I), and Leucine (Leu, L) might not look that big, these Branched-Chain Amino Acids (BCAAs) make rotation difficult.
Speaking of “not that big in size but still seriously sterically stuck,” the most restricted of them all is Proline (Pro, P), whose 3C chain turns back on itself, hooking back up to the backbone N – can you say awkward?!
Such structural limitations play a role in how proteins fold – chains of amino acids are called polypeptides, and these polypeptide chains can fold up into a few common “motifs” including alpha helices and beta strands that optimize backbone to backbone interactions (we call this “secondary structure”). Depending on their flexibility, they’re more or less likely to be found in different motifs. So, for example, Alanine (Ala, A) with the 2nd-smallest side chain (just a methyl (-CH₃) group) is commonly found in alpha helices, whereas Pro is not.
But steric hindrance isn’t the only thing determining where in a protein you’ll find the different amino acid residues. Even if it can be in a helix, is it out on the surface of a protein, near water? Or tucked tight in the protein’s central core? A major determinant of that is “polarity” and “hydrophobicity.”
Earlier we talked about how atoms – like those making up all of these different charms we’re discussing, link together through strong covalent bonds where they share pairs of electrons. The electrons in a covalent bond can be shared “fairly” or “unfairly” – fair sharing occurs when the partners have similar electronegativities (electron-hoginness), such as carbon and hydrogen, and it leads to an even charge distribution (non-polar). “Unfair” sharing happens when one of the sharers is more electron-hogging (electronegative) than the other – it’ll pull more of the shared electrons toward itself, leading to a partial charge imbalance we call polarity.
You often see such “polar covalent bonds” between oxygen or nitrogen (which are both highly electronegative) and carbon or hydrogen – the O or N steals more than its fair share, leading it to be partly negative and leaving its bonding partner partly positive. Opposite charges attract – even partial ones – so the partly positive parts of polar molecules like to hang out with the partly negative parts of other polar molecules (or other fully charged things).
Water is highly polar, so water molecules really like to hang out together. Thus, if you want water to hang out with something other than water you want that thing to be more attractive to a water molecule than another water molecule. If the water likes it (which happens if the thing is highly polar or charged), it’ll “dissolve” (get a full water coat) – we call such water-loving/water-loved things hydrophilic. Otherwise, the water will just “exclude” the thing from its network, leaving the excluded things to group together to make their surface area as small and hidden as possible. We call this the “hydrophobic effect” and it’s the main force behind protein folding – nonpolar amino acid residues are “hydrophobic” because they don’t have even partial charges to offer – so they fold up so that they’re in the protein’s interior, or at least facing away from the water.
The hydrophobic amino acids include those aromatic ones – Phe, Trp, & Tyr (which has some polarity but the nonpolar part dominates) – as well as some aliphatic ones – Ala, Ile, Leu, Val, & Methionine (Met, M) (note – Met has the additional superpower that its codon (AUG) doubles as the start codon, so proteins get made starting with it, but it sometimes gets removed after the fact. If that codon shows up again before a stop codon, Met gets added just like any other amino acid would).
There are also some aliphatic amino acids that are polar but neutral. These include Cysteine (Cys, C), Serine (Ser, S), Threonine (Thr, T), Asparagine (Asn, N), and Glutamine (Gln, Q).
Asn & Gln have side chains that end in an “amide” group – a C double bonded to an O and also attached to an amine group (so -(C=O)-NH₂) – Asn & Gln differ in that Gln’s hydrocarbon linker is one C longer. There are also versions of those chains that end in carboxylate groups – Aspartate (Asp, D) and Glutamate (Glu, E) respectively. Just like we talked about with the carboxyl groups of the generic part of free amino acids, these can give & take protons depending on the pH. It’s all context-dependent, but at bodily pH, they tend to be in the deprotonated, and thus negatively-charged carboxylate state (-(C=O)-O⁻) but they can also exist in the carboxylic acid state -(C=O)-OH. In that state, Asp is called Aspartic Acid and Glu is called Glutamic Acid. And Asp & Glu are sometimes referred to as “acidic amino acids.” Charged or not, they’re highly hydrophilic, so don’t be surprised to find them on the surface (or in the active site of enzymes where they can help catalyze (speed-up) reactions).
So Asp & Glu can be negatively-charged. But there are also amino acid residues that can be positively-charged, and we call these “basic” because, in their neutral form, they act as bases (proton-takers). There are 3 of these “basic” amino acids – Arginine (Arg, R), Histidine (His, H), and Lysine (Lys, K). As we talked about before, His has an aromatic ring (in this case a 5-membered (2 being Ns) “imidazole” ring), but Arg & Lys are aliphatic. And they get their positive charge from nitrogen-containing groups that act as bases and take protons. His is the least basic of the 3 – it is less willing to take a proton, so you’ll often find it in the neutral form as well. But Arg & Lys are more basic, so they’ll typically be positively charged in your body (all context-dependent though as it’s the “local pH” that matters).
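To pull the groupings so far into one place, here’s a rough Python sketch that sorts the 1-letter initials into the categories used above and tallies them up for a sequence. Two things in it are my own assumptions for illustration: Gly and Pro, which weren’t put in a polarity group above, get their own “special” bin, and the test sequence is made up.

```python
GROUPS = {
    "hydrophobic": set("FWYAILVM"),   # Phe, Trp, Tyr, Ala, Ile, Leu, Val, Met
    "polar neutral": set("CSTNQ"),    # Cys, Ser, Thr, Asn, Gln
    "acidic (-)": set("DE"),          # Asp, Glu
    "basic (+)": set("RHK"),          # Arg, His, Lys
    "special": set("GP"),             # ASSUMPTION: Gly & Pro get their own bin
}


def composition(sequence):
    """Count how many residues of a sequence fall into each group."""
    counts = {name: 0 for name in GROUPS}
    for letter in sequence.upper():
        for name, members in GROUPS.items():
            if letter in members:
                counts[name] += 1
                break
        else:  # no group claimed this letter
            raise ValueError(f"not one of the 20 common amino acids: {letter!r}")
    return counts


print(composition("MKDLG"))  # made-up sequence
```

Real classification schemes differ on the borderline cases (Tyr, Cys, Gly, and Pro get shuffled around a lot), so treat the bins as one reasonable choice rather than the only one.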
Arg is more basic than Lys, but Lys is cool for another reason. It has the superpower of Schiff base forming. If you want to learn more about this, check out the Lys post, but basically Lys can use that N in its side chain to form covalent linkages to other molecules – this allows it to hold onto reaction intermediates in enzymes and latch onto the little protein ubiquitin to target a protein for degradation. http://bit.ly/lysineanalysis
The other side chain capable of forming covalent linkages (as opposed to just charge-based attractions) is Cys. Its side chain is -CH₂-SH. That “-SH” is called a thiol – it’s the sulfur version of an alcohol (something with an -OH) and it can form “disulfide bridges” or “crosslinks” with other Cys residues, so protein-SH + protein-SH -> protein-S-S-protein. This can be used to reinforce a protein’s fold, hold individual monomers (single chains) of a protein together, or act as an antioxidant, giving up electrons to neutralize reactive oxygen species before they can cause damage. Check out the Cys post for more. http://bit.ly/cysteinecrosslinks
The only other amino acid to contain sulfur is Met, but it has that S in its chain (-CH₂-CH₂-S-CH₃), not at the end of it, so it can’t form crosslinks. And neither can the alcoholic version of Cys (hard Cys?) – Serine (Ser, S) (-CH₂-OH). But Ser, and the other 2 alcoholic amino acids, Threonine (Thr, T) & Tyrosine (Tyr, Y) have a different superpower – they can get phosphorylated.
Phosphorylation is a post-translational modification (happens after a protein is made (translated)) that involves adding on a phosphate group (a phosphorus surrounded by 4 oxygens) – these groups are negatively-charged and bulky so they can affect how a protein acts and interacts with other molecules. So phosphorylation is often used to activate/inactivate enzymes such as the kinases (phosphate-adders) in signaling pathways to give you a chain reaction. More here: http://bit.ly/kinases
Other post-translational modifications include hydroxylation (addition of -OH) (such as in the hydroxyproline in the protein collagen); methylation and acetylation (such as happens to lysine residues in the tails of the histone proteins that DNA wraps up around – making wrapped regions more or less accessible); and glycosylation (addition of sugar chains, which frequently happens to asparagine).
Unlike those post-translational modifications, which happen after the amino acids are added by the ribosome to a growing chain and thus aren’t actually genetically encoded, there are 2 other “uncommon” amino acids that *are* coded for – Selenocysteine (Sec, U) and Pyrrolysine (Pyl, O). Sec is the selenium-containing version of Cys (so -CH₂-SeH) and pyrrolysine is a weird thing you get by joining up a couple lysines (look at it). Pyl was only discovered in 2002, and it’s only found in some bacteria and archaea. More here: http://bit.ly/pyrrolysine But Sec is found in us – and it plays important roles in several proteins. More here: http://bit.ly/selenocysteinetrickery Instead of having their own codons, they sneak in at (specific) stop codons – in the case of Sec, it uses helper proteins that bind to a loopy region in the mRNA recipes for the few proteins that use it – this keeps it nearby and able to sneak in at the stop codon UGA before a termination factor does.
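As a quick illustration, here’s a tiny Python sketch that flags where the 3 alcoholic residues (Ser, Thr, Tyr) sit in a sequence of 1-letter initials. The function name and the example sequence are made up, and this only finds *candidate* spots: real kinases are picky about the neighborhood around a residue, which this sketch completely ignores.

```python
PHOSPHORYLATABLE = {"S": "Ser", "T": "Thr", "Y": "Tyr"}  # the -OH carriers


def candidate_phosphosites(sequence):
    """List (position, nickname) for each Ser/Thr/Tyr, counting from 1."""
    return [
        (position, PHOSPHORYLATABLE[letter])
        for position, letter in enumerate(sequence.upper(), start=1)
        if letter in PHOSPHORYLATABLE
    ]


print(candidate_phosphosites("MSTAYK"))  # made-up sequence
```

Positions count from 1 at the N terminus, which is how residue numbers are usually written (like “Ser2”).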
So we’ve looked at big/small; aliphatic/aromatic; hydrophobic/hydrophilic; neutral/charged; modified/unmodified; common/uncommon… And there are *still* more ways we can classify these guys! 2 more, and then I’m stopping!
One you might have heard in the context of diet fads is glucogenic/ketogenic. This refers to whether an amino acid can be “recycled” into parts that can be used to make glucose (blood sugar) through gluconeogenesis (new sugar making) and/or ketone bodies. “Ketone bodies” include acetoacetate, beta-hydroxybutyrate, & acetone. They’re better known as products of fat breakdown (and in the reverse process they can be used to make fats). But they can also come about from the breakdown of amino acids – but not all of them!
Some amino acids are one or the other, some are both – it all depends on the original structure of the amino acids (our starting material) and what enzymes our bodies have for processing them.
ketogenic only: leucine, lysine
glucogenic only: alanine, cysteine, glycine, threonine, serine, asparagine, aspartate, methionine, valine, glutamate, glutamine, proline, histidine, arginine
both: tryptophan, isoleucine, phenylalanine, tyrosine
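Those 3 lists are easy to double-check with a few Python sets. This sketch just confirms that every one of the 20 common amino acids lands in exactly one list and gives you a lookup; the set contents are copied straight from the lists above, and `fate` is a made-up helper name.

```python
KETOGENIC_ONLY = {"leucine", "lysine"}
GLUCOGENIC_ONLY = {
    "alanine", "cysteine", "glycine", "threonine", "serine", "asparagine",
    "aspartate", "methionine", "valine", "glutamate", "glutamine", "proline",
    "histidine", "arginine",
}
BOTH = {"tryptophan", "isoleucine", "phenylalanine", "tyrosine"}


def fate(amino_acid):
    """Say which list an amino acid (full name, any capitalization) is in."""
    name = amino_acid.lower()
    if name in KETOGENIC_ONLY:
        return "ketogenic only"
    if name in GLUCOGENIC_ONLY:
        return "glucogenic only"
    if name in BOTH:
        return "both"
    raise ValueError(f"not in any of the 3 lists: {amino_acid!r}")


everyone = KETOGENIC_ONLY | GLUCOGENIC_ONLY | BOTH
print(len(everyone), "amino acids accounted for")
print("leucine:", fate("leucine"))
print("Tyrosine:", fate("Tyrosine"))
```

Some textbooks place a few amino acids differently (threonine is a common one to see moved), so swap names between sets if your source disagrees.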
Another classification scheme is essential/non-essential. *All* amino acids are essential in the sense that our bodies need them to make proteins and function. But some amino acids are also essential in the dietary sense, meaning that we can’t make them ourselves – we need to get them “premade” in our diet. These 9 “essential amino acids” include the branched-chain amino acids (BCAAs) Ile, Leu, & Val – as well as Lys, Met, Phe, Thr, Trp, & His (considered semi-essential – essential for infants).
Each amino acid has a much bigger story to tell, but here I don’t have room or time to tell them well, so check out https://bit.ly/aminoacidsposts for links to more.