Huntington’s disease (HD) is a devastating and ultimately fatal neurodegenerative disease that causes progressive motor, cognitive, and psychological deterioration. It is a rare disease (~30,000 Americans known to be afflicted) but it is better known than some other rare diseases for a couple of reasons – one is that the folk singer Woody Guthrie suffered from it (leading his wife Marjorie to found what is today the Huntington’s Disease Society of America (HDSA)). Another is that it is often taught in biology classes as a key example of an autosomal dominant genetic disease. It is also devastating proof of the power of today’s amino acid, glutamine (Gln, Q), when proteins use it in excess – in HD a “trinucleotide repeat expansion” causes too many Gln’s to be put in the huntingtin protein, leading to a toxic protein being made. So today I want to tell you a bit about genetic diseases and mutation types and then more about glutamine in health and disease.
It’s Day 18 of #20DaysOfAminoAcids – the bumbling biochemist’s version of an advent calendar. Amino acids are the building blocks of proteins. There are 20 (common) ones, each with a generic backbone to allow for linking up through peptide bonds to form chains (polypeptides) that fold up into functional proteins, as well as unique side chains (aka “R groups”) that stick off like charms from a charm bracelet. Each day I’m going to bring you the story of one of these “charms” – what we know about it and how we know about it, where it comes from, where it goes, and outstanding questions nobody knows the answers to.
The “recipes” for making proteins (and functional RNAs) are written in stretches of DNA called genes, which are part of much longer strands of DNA called chromosomes (the “cookbooks”). Humans have 23 cookbook volumes, with 2 copies of each volume: 22 “autosomal” chromosomes, where you get one copy each from biological mom & dad, and the sex chromosomes, where you get either an X or a Y from dad and an X from mom. So, except for genes on the sex chromosomes, you get 2 copies of each recipe.
A lot of the time, this means you get 2 chances to get things right – even if there’s a problem with one copy of a gene recipe (one allele), the other copy is able to compensate, so you don’t notice anything’s amiss (in such cases, we say people with one disease copy & no symptoms are “carriers”). It’s only when your backup fails – i.e. you have 2 faulty copies – that problems arise. We call diseases like this autosomal recessive, and a couple of classic examples are cystic fibrosis and Tay-Sachs disease. Barring spontaneous mutations, in order for someone to have symptomatic disease, both parents have to at least carry the faulty gene – a carrier has a 50/50 chance of passing down the faulty copy, and a person with the disease can only pass down a faulty copy.
Sometimes, however, one good copy isn’t good enough – when a single copy of a non-sex-chromosome gene causes a disease we call the disease autosomal dominant, and a biological child of an afflicted person has a 50/50 chance of inheriting the disease. A “classic” example of autosomal dominant diseases is Huntington’s disease.
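If you want to see where those odds come from, here is a minimal sketch that just lists every way two parents can each hand down one of their two copies. The genotype labels ("A"/"a", "H"/"h") and the function name are made up for this illustration, and it ignores spontaneous mutations.

```python
from collections import Counter
from itertools import product


def offspring_odds(parent1, parent2):
    """Return {genotype: probability} for one gene.

    Each parent is a 2-letter string, one letter per gene copy
    (the letters are arbitrary labels chosen for this sketch).
    """
    # Each parent passes down one of their two copies with equal chance,
    # so all four pairings are equally likely.
    combos = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(combos.values())
    return {genotype: count / total for genotype, count in combos.items()}


# Autosomal recessive: two symptom-free carriers ("a" = faulty copy).
print(offspring_odds("Aa", "Aa"))  # {'AA': 0.25, 'Aa': 0.5, 'aa': 0.25}

# Autosomal dominant: one parent with a single disease copy ("H").
print(offspring_odds("Hh", "hh"))  # {'Hh': 0.5, 'hh': 0.5}
```

In the recessive case only the 'aa' children (1 in 4) are affected, while in the dominant case every child who inherits the single 'H' copy (1 in 2) is.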
There are a few reasons a single copy can cause a disease. One is “haploinsufficiency” – basically the single good copy can’t keep up with demand. Another is a “dominant negative” effect – the bad copy gets in the way of the good copy, like by hogging up the things the good copy needs without actually being able to use them. And a third is a toxic “gain of function” – the bad copy isn’t just nonfunctional, it does things it wouldn’t normally do – like bind different proteins, “distracting” them from their own jobs. Net result in those last two cases – having a bad backup is worse than not having a backup at all. It is this last reason – the toxic gain of function – that is believed to be mostly behind HD.
HD may be a “classical” textbook example of autosomal dominance, but the actual mutation that causes it isn’t what you’d normally think of with regard to a genetic mutation.
When a cell wants to make a protein, it first makes a messenger RNA (mRNA) copy of the DNA recipe (and edits it to remove non-coding regions (introns), etc.) in a process called transcription. Then, in a process called translation, a protein-making complex called the ribosome travels along the mRNA (or the mRNA travels through it) and joins together amino acids based on the sequence of RNA letters it encounters. It reads in non-overlapping words of 3 letters called codons, and it knows what to add because transfer RNAs (tRNAs) with a complementary 3-letter anticodon on one part and the corresponding amino acid hooked onto another part come and pass it off to the growing chain while the ribosome holds everything in place and facilitates the transfer. Each amino acid has at least 1 codon that spells it (there’s some redundancy – the technical term for this is “degeneracy”) but a single codon will only spell one thing (there’s no ambiguity). For example, AAA & AAG both spell lysine. But AAA and AAG will always only ever spell lysine. More on this here: http://bit.ly/2lT8jma
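To make the codon-reading idea concrete, here is a small sketch of translation as a lookup. It builds the standard genetic code from the usual compact 64-letter layout (bases ordered U, C, A, G; "*" marks a stop codon). The function names are just for this example, and real translation also involves finding the start codon, which is skipped here.

```python
from itertools import product

BASES = "UCAG"
# One-letter amino acid codes for all 64 codons, in UUU, UUC, UUA, UUG, UCU... order.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA string 3 letters at a time until a stop codon (or the end)."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def codons_for(amino_acid):
    """All codons that spell a given one-letter amino acid code."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


print(translate("AUGAAAAAGCAGUAA"))  # MKKQ
print(codons_for("K"))               # ['AAA', 'AAG']  (lysine)
print(codons_for("Q"))               # ['CAA', 'CAG']  (glutamine)
```

Going from amino acid to codon can give several answers (redundancy), but going from codon to amino acid is a plain dictionary lookup with exactly one answer (no ambiguity).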
Some mutations involve a single letter change – we call these point mutations. We saw an example of this when we discussed sickle cell anemia, caused by a single DNA letter swap in the gene for the oxygen-carrying protein hemoglobin leading to a single protein letter swap that sticks a hydrophobic (water-avoiding) amino acid out near water, causing it to seek refuge in a hydrophobic patch on a neighboring hemoglobin molecule, leading to chains of hemoglobin forming and clogging up blood vessels. http://bit.ly/33foda8 Not every point mutation changes the protein, though – some are “silent mutations,” meaning they don’t change the amino acid that’s spelled (like if the G in AAG got swapped to an A, you’d get AAA, which still spells lysine).
Other times, however, a mutation changes the amino acid letter, not just the DNA/RNA letter (e.g. if the first A in AAG got swapped to a C – you’d get CAG, which spells glutamine). We call such amino-acid-swapping mutations “missense mutations.” A third option for a point mutation is a “nonsense” mutation – in this case, a mutation turns an amino-acid-spelling codon into a stop codon (UAG, UAA, or UGA), which causes the ribosome to stop making the protein before it reaches the true end of the recipe, leading to truncated partial proteins that might not work well.
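Here is a rough sketch of how you could sort a single-letter codon change into those three bins. To keep it short, the lookup table only contains the codons mentioned in this post (a real version would use all 64), and the function name is made up for the example.

```python
# Partial codon table: only the codons used as examples in this post.
MINI_TABLE = {
    "AAA": "Lys", "AAG": "Lys",
    "CAA": "Gln", "CAG": "Gln",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def classify_point_mutation(before, after):
    """Label a one-letter codon change as silent, missense, or nonsense.

    Assumes `before` is an amino-acid-spelling codon (not a stop codon).
    """
    differences = sum(a != b for a, b in zip(before, after))
    if len(before) != 3 or len(after) != 3 or differences != 1:
        raise ValueError("expected two codons differing by exactly one letter")
    old, new = MINI_TABLE[before], MINI_TABLE[after]
    if new == old:
        return "silent"
    if new == "STOP":
        return "nonsense"
    return "missense"


print(classify_point_mutation("AAG", "AAA"))  # silent   (Lys -> Lys)
print(classify_point_mutation("AAG", "CAG"))  # missense (Lys -> Gln)
print(classify_point_mutation("AAG", "UAG"))  # nonsense (Lys -> STOP)
```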
Those cases involved single letter swaps – they might cause problems, but they don’t change the “reading frame.” Codon words are non-overlapping, so where you start determines your reading frame, and there are 3 options (e.g. REA DME LIK ETH ISO RTH AT? or R EAD MEL IKE THI SOR THA T? or RE ADM ELI KET HIS ORT HAT ?). Even if you start in the right frame, if you stick in or remove 1 or 2 letters you can get out of frame.
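The three reading frames are easy to generate yourself. This little sketch (function name invented for the example) chops the same sentence into 3-letter words starting 0, 1, or 2 letters in:

```python
def split_into_words(letters, frame=0):
    """Chop a string into 3-letter words, starting `frame` letters in (0, 1, or 2)."""
    leftover = [letters[:frame]] if frame else []  # letters skipped before the frame starts
    words = [letters[i:i + 3] for i in range(frame, len(letters), 3)]
    return " ".join(leftover + words)


sentence = "READMELIKETHISORTHAT?"
for frame in range(3):
    print(split_into_words(sentence, frame))
# REA DME LIK ETH ISO RTH AT?
# R EAD MEL IKE THI SOR THA T?
# RE ADM ELI KET HIS ORT HAT ?
```

Same letters, same order, but only one of the three frames gives the "words" the writer intended.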
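We call such mutations where you insert or delete DNA letters “indels” and they can cause “frameshifts” that scramble every word downstream of the change:
REA DME LIK ETH ISO RTH AT? (original)
REA D@M ELI KET HIS ORT HAT ? (1 letter inserted)
REA D@! MEL IKE THI SOR THA T? (2 letters inserted)
REA MEL IKE THI SOR THA T? (1 letter deleted)
REA ELI KET HIS ORT HAT ? (2 letters deleted)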
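The scrambled sentences above can be reproduced with a few lines of string slicing. This is only a toy model using text instead of DNA, and the helper names are made up for the example:

```python
def chunk(letters):
    """Split a string into 3-letter words, reading from the very first letter."""
    return " ".join(letters[i:i + 3] for i in range(0, len(letters), 3))


def insert_letters(letters, position, extra):
    """Return a copy with `extra` stuck in before index `position`."""
    return letters[:position] + extra + letters[position:]


def delete_letters(letters, position, count):
    """Return a copy with `count` letters removed starting at index `position`."""
    return letters[:position] + letters[position + count:]


original = "READMELIKETHISORTHAT?"
print(chunk(original))                           # REA DME LIK ETH ISO RTH AT?
print(chunk(insert_letters(original, 4, "@")))   # REA D@M ELI KET HIS ORT HAT ?
print(chunk(insert_letters(original, 4, "@!")))  # REA D@! MEL IKE THI SOR THA T?
print(chunk(delete_letters(original, 3, 1)))     # REA MEL IKE THI SOR THA T?
print(chunk(delete_letters(original, 3, 2)))     # REA ELI KET HIS ORT HAT ?
```

The reader (`chunk`) never changes. It always counts off 3 letters at a time from the start, which is why a 1- or 2-letter change early on garbles everything after it.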
Gaining or losing multiples of 3 DNA/RNA letters can cause you to gain or lose protein letter(s) but you remain in frame. For example, the type of mutation that causes HD is a “trinucleotide repeat expansion” – “tri” for 3, “nucleotide” for DNA/RNA letters, so this means that in these mutations, multiple copies of a 3-letter sequence get added in, so you don’t change your reading frame but, if these letters are part of a protein-coding region of a gene (a part that “spells” for amino acids as opposed to one of the regulatory regions) you’ll end up with extra protein letters. Those protein letters will all be the same letter because you’re repeating the same codon, and the protein parts on either side of this irregular stretch will be okay because you didn’t change the reading frame. But the protein as a whole may be compromised.
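As a quick illustration of an in-frame expansion, here is a sketch with a made-up toy transcript (this is NOT the real HTT sequence): a start codon, one alanine codon, a stretch of CAGs, two proline codons, and a stop. The lookup table only holds the five codons this toy uses.

```python
# Only the codons that appear in the toy transcript below.
MINI_TABLE = {"AUG": "M", "GCG": "A", "CAG": "Q", "CCG": "P", "UAA": "*"}


def translate(mrna):
    """Read codons 3 letters at a time until a stop codon ("*")."""
    protein = ""
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = MINI_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein += amino_acid
    return protein


def toy_transcript(n_repeats):
    """Hypothetical mini-recipe with a CAG repeat of adjustable length."""
    return "AUGGCG" + "CAG" * n_repeats + "CCGCCG" + "UAA"


print(translate(toy_transcript(4)))  # MAQQQQPP
print(translate(toy_transcript(8)))  # MAQQQQQQQQPP
```

The "MA" at the front and the "PP" at the back come out the same either way. Only the run of Qs in the middle grows, because every added CAG is a whole codon.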
In HD, the affected gene is the HTT gene encoding the huntingtin protein (Htt), which is a big (~3000 amino-acid long, 348 kDa) protein with lots of functions including helping transport things within cells, regulating gene expression, etc., and the codon that gets repeated is “CAG,” which spells glutamine (Gln, Q). So you end up with a ton of Qs in a row – a so-called “expanded polyQ” tract. This type of mutation is thought to occur due to slip-ups during DNA replication and/or repair – basically, during copying, the copier slips off and then loses track of where it was because it’s surrounded by a sea of CAG on either side of it, so it adds more. Or when it goes to stitch together broken DNA in that region it ends up adding in even more. If this happens in germline cells (ones that get passed down to children) it can cause children to inherit longer expansions than their parents, and thus tend to develop symptoms earlier (this earlier-with-each-generation pattern is called genetic anticipation). And if it happens in somatic (non-germline) cells, even though it can’t get passed down to future people, it can get passed down to future cells within the person’s body, so it can cause the repeat to expand in certain cell lineages and potentially contribute to disease progression – as I’ll tell you more about later, this was something I researched during a summer fellowship as an undergrad.
Why am I focusing so much on repeat length? Having a polyQ tract isn’t “abnormal” – in fact, the normal huntingtin protein has one – as do lots of other proteins. It’s only when it gets too long that problems arise – people normally have CAG tract lengths of 6-35 repeats in their HTT gene. But 40 or more repeats leads to symptom development if the person lives long enough for symptoms to occur (36-39 repeats is a gray zone of “reduced penetrance,” where some people develop symptoms and others don’t). The age of onset is inversely related to the repeat length (i.e. people with longer repeat lengths tend to develop symptoms earlier in life). Symptoms typically begin between the ages of 30 & 50, with uncontrollable muscle movements (chorea) followed by other neurological symptoms and personality changes.
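Just to pin down the numbers, here is a tiny sketch that sorts a CAG repeat count into bins using the cutoffs mentioned above. The labels and function name are invented for this example, treating exactly 40 as "expanded" is an assumption of the sketch, and it is an illustration of the thresholds only, not a diagnostic tool.

```python
def classify_repeat_count(n_repeats):
    """Bin an HTT CAG repeat count using the cutoffs from this post."""
    if n_repeats <= 35:
        return "typical"
    if n_repeats < 40:
        # 36-39 falls between the two cutoffs.
        return "in-between"
    return "expanded"  # assumption: the cutoff includes 40 itself


for n in (17, 35, 38, 40, 60):
    print(n, classify_repeat_count(n))
# 17 typical
# 35 typical
# 38 in-between
# 40 expanded
# 60 expanded
```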
Before I get too far in, let me start by saying – there is still a LOT that is not known about HD and the more scientists find out, the more complex it seems to be – there are lots of pathways involved, and different ways the faulty protein produces problems. This has led scientists to try to tackle the problem at the front end – using things like antisense oligonucleotides (ASOs) or RNA interference (RNAi) to intercept the huntingtin messenger RNA (mRNA) copies of the recipe and get them destroyed before they can be read into protein. These strategies aim to prevent the problematic protein from being produced, so let’s look a bit at the protein consequences they’re trying to avoid.
As I said before, lots of proteins – “normal ones” – have polyQ tracts. Why do proteins have them if they’re so prone to problems? When they’re a reasonable length, they can actually be really useful because of glutamine’s “stickiness” (though it’s this same stickiness that, when in excess, causes problems – you know what they say about too much of a good thing).
Glutamine (Gln, Q) looks a lot like the amino acid we looked at yesterday, asparagine (Asn, N). Both have amide (-(C=O)-NH₂) groups capping off their side chain, but Gln has a longer linker (2 methylene (CH₂) groups versus Asn’s single one). Whereas Asn is the amide version of aspartate (Asp, D), Gln is the amide version of glutamate (Glu, E) – we haven’t formally covered glutamate yet, but we’ve seen it show up a lot. Speaking of seeing things a lot, Gln is the most abundant free amino acid in blood (it makes up ~20% of the amino acids you see floating around in there). You have so much of it because it’s used for energy, moving ammonia around non-toxically, etc.
The longer linker (compared to Asn) gives Gln more flexibility, in both the side chain & the backbone, which now has less bulkiness near it so it doesn’t have to worry as much about steric hindrance (molecules competing for space). This flexibility, combined w/amide’s properties has important consequences because it makes Gln “sticky.”
The particular amide properties in play here are the ability to form hydrogen bonds (H-bonds), which are a special type of bond between an H attached to an electronegative atom (often N or O) that acts as a DONOR & a lone pair of e⁻ on another electronegative atom (again, often N or O) which acts as an ACCEPTOR. Atoms bond through strong covalent bonds by sharing pairs of electrons and electronegative atoms hog e⁻ in the share, making them partially ➖ & the thing they’re attached to partially ➕. Opposites attract and voila you’ve got yourself an H bond.
Gln is great at forming H-bonds because it has both DONORS (the H’s on its N’s) & ACCEPTORS (carbonyl O’s) in both its backbone & its side chain and can thus form sidechain-sidechain, backbone-sidechain, & backbone-backbone interactions as well as H-bonds to water & other molecules. Asn has those too, but with its shorter linker it’s harder for the molecular lovers to meet up – Gln’s flexibility helps donors & acceptors find each other.
Poly-Q tracts are usually flexible & don’t take a defined “shape” until they bind something else. This allows them to bind lots of different things & act as molecular “scaffolds.” So, by having a *reasonable-length* poly-Q tract, proteins like huntingtin are able to bring together and regulate lots of different things (and if problems arise they can thus have wide-reaching consequences). These problems can arise when the tract gets too long – instead of staying limber, the region “collapses” & sticks to itself instead of other things. And it can stick to polyQ tracts of another molecule of the protein, causing that one to “misfold” too & then another binds that one etc., leading to AGGREGATION.
It is unclear what role aggregates play (clog things, bind to & sequester other important molecules, cause the rest of the protein to misfold, prevent important interactions, etc.). Even if the protein is recognized as faulty and targeted for the shredder (the proteasome), it can tie up the proteasome so that other proteins “waiting in line” can build up and cause problems. On the other hand, these aggregates might also play protective roles by decreasing the amount of more toxic forms. There is evidence that the monomeric (non-aggregated) and oligomeric (a few grouped together) but still soluble forms of huntingtin, and smaller fragments that can come from proteolytic cleavage, are especially toxic to cells, while larger, insoluble “inclusion bodies” might be more protective.
There are also recent findings that something at the DNA or RNA level contributes to the disease, not just the protein itself – it seems that the uninterrupted CAG length is what matters most – if you have CAA interrupting it (e.g. CAG CAA CAG) the age of onset tends to be later even though CAA spells Gln too. And it might have something to do with somatic expansion, which I’ll talk more about below. Here’s a great HDSA webinar on these findings from James Gusella: http://bit.ly/2sCzaGm
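To see how the DNA-level count and the protein-level count can disagree, here is a sketch comparing the longest uninterrupted CAG run with the longest glutamine-spelling run (CAG or CAA). The two example sequences are made up and far shorter than real repeats, and the function name is just for this illustration.

```python
import re


def longest_run(dna, pattern):
    """Length (in codons) of the longest stretch of `dna` matching the repeated pattern."""
    return max((len(match.group(0)) // 3 for match in re.finditer(pattern, dna)), default=0)


PURE_CAG = r"(?:CAG)+"        # uninterrupted CAG only
ANY_GLN = r"(?:CAG|CAA)+"     # any glutamine codon

pure = "CAG" * 6
interrupted = "CAGCAG" + "CAA" + "CAGCAGCAG"

print(longest_run(pure, PURE_CAG), longest_run(pure, ANY_GLN))                # 6 6
print(longest_run(interrupted, PURE_CAG), longest_run(interrupted, ANY_GLN))  # 3 6
```

Both toy sequences spell 6 glutamines in a row, but the interrupted one has a longest pure-CAG run of only 3, which is the number the findings above suggest matters most.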
When I was in undergrad, I thought I wanted to research neurodegenerative diseases and, long story short, I ended up spending the summer between my junior and senior years researching HD in the lab of Steve Finkbeiner at UCSF/Gladstone Institutes funded by a King fellowship from the Huntington’s Disease Society of America (HDSA). It was an amazing experience, one of the highlights of which was getting to present a poster at the 2016 HDSA national convention, where I met patients and families with the disease, as well as some of the doctors working to find a cure and saw the power of patient advocacy in action. Although I decided to go towards more basic research, I am incredibly grateful for that opportunity and continue to follow HD research. Speaking of which, I highly encourage you to check out HDBuzz – a website run by Drs Ed Wild & Jeff Carroll (whom I had the great pleasure of meeting at the convention) – they break down complex topics into easy-to-understand terms and cut through hype to give you the facts. https://en.hdbuzz.net/
What I was looking into was the reason I’ll probably never forget the codon for glutamine (or at least one of the codons) – CAG (the other one’s CAA). I was trying to count the number of CAG repeats in different cells, looking for evidence of somatic expansion (CAG repeat length increasing over time – which is known to occur in patients) and whether that correlated with cell health. In science jargon-y terms, the title of my project was “Using nanobiopsy and RNA analysis to investigate somatic instability of the CAG repeat in Huntington’s disease induced neurons.” One reason we were interested in this was that, although in general the longer the inherited CAG tract, the earlier the individual is likely to start experiencing symptoms, there is a wide range of onset ages for patients with a moderate repeat number, and one possible reason for this variability is variability in somatic expansion causing different cells to develop longer tracts and form more toxic protein.
I couldn’t test cells in patients – instead I was doing it all with cell culture using something called induced pluripotent stem cells (iPSCs). Basically they’re cell lines derived from patient cells that have been reprogrammed to act like the type of cell you’re interested in – in this case the specific type of neuron most affected in HD. I let them grow in culture (which is a very high-maintenance task because they’re really sensitive and you have to keep changing what growth factors you put in the food you (gently) squirt on them after suctioning off the old stuff), and the hope was that over time I’d be able to take samples from the same cells and count the # of CAG repeats to see if it was expanding.
One challenge was to take enough to measure but not enough to kill the cells, so I had to do a lot of troubleshooting using “nanobiopsy” with this cool “nanopipet” that was developed by a team of researchers at UC Santa Cruz led by Nader Pourmand. https://news.ucsc.edu/2014/01/nanobiopsy.html With mentorship from postdoc Gabriela Novak, I took tiny tiny samples and then used RT-qPCR to make lots of copies of the mRNA I sucked out. http://bit.ly/2P1EXy9 And then I sent those samples to the lab of our collaborator, Vanessa Wheeler, for sequencing to count the # of CAGs. Turns out my part was a lot harder than I thought it would be and, while I wasn’t able to track the same cells over time, all my troubleshooting was not for naught – I helped develop a protocol for doing it and can’t wait to see what more comes from the technique!
HD isn’t the only polyQ repeat disorder – other examples include spinocerebellar ataxias and spinobulbar muscular atrophy, and other diseases are caused by other trinucleotide repeat expansions.
Some final notes:
Glutamine is “non-essential” in the sense that we can make it ourselves – it can be made from glutamate via glutamine synthetase, similarly to how asparagine is made from aspartate with the help of asparagine synthetase. But whereas asparagine synthetase takes the nitrogen it needs from glutamine, glutamine synthetase gets the nitrogen for the amide group directly from ammonia (allowing glutamine to serve an ammonia-scavenging role).
Glutamine was first isolated in 1883, from sugarbeet juice, by German chemists Schulze and Bosshard. But it proved tricky to show that it came from proteins because it can spontaneously deamidate (lose its amide) to give glutamate in vitro (outside the body), so it wasn’t clear if it was a legit protein letter. In 1935, Krebs found that guinea pig & rat kidneys could enzymatically synthesize glutamine from glutamate and ammonia, and people started to buy that it really is a naturally occurring molecule. As for the naming – it comes from adding an amine to glutamate, which was found in wheat gluten.
how does it measure up?
systematic name: 2-Amino-4-carbamoylbutanoic acid
coded for by: CAG, CAA
chemical formula: C5H10N2O3
molar mass: 146.146 g·mol−1
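If you want to double-check that molar mass, it follows directly from the chemical formula. Here is a small sketch using standard atomic masses rounded to three decimals. The parser only handles simple formulas without parentheses, and the function name is made up for the example.

```python
import re

# Standard atomic masses (g/mol), rounded to 3 decimals.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Sum atomic masses for a simple formula like 'C5H10N2O3' (no parentheses)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_MASS[element] * int(count or 1)  # a missing number means 1 atom
    return total


print(round(molar_mass("C5H10N2O3"), 3))  # 146.146
```

That is 5 carbons (60.055) + 10 hydrogens (10.080) + 2 nitrogens (28.014) + 3 oxygens (47.997) = 146.146 g/mol.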