One of the things I’ve been feeling down about lately is that I’ve been waiting for an HDX-MS dataset – and it came! And, after a couple server disconnections and error messages, it even downloaded and unzipped! So now I’m trying to figure out what the heck to do with it in order to figure out what it all means. How do these raw data files and spectral plots tell me about the dynamics, solvent accessibility, and floppy, unstructuredness vs fixed-in-strong-structure-ness of various regions of the protein samples I sent? Don’t worry – I have a helpful and patient expert cross-country Zooming with me to help! So as I learn and work on my data (sorry I can’t give details yet) I thought I’d update and expand a post from last year about the basics of Hydrogen-Deuterium eXchange Mass Spectrometry (HDX-MS).
Hydrogen-deuterium exchange mass spectrometry (HDX-MS) is kinda like giving wetsuit-wearing proteins a bath and seeing where they get wet. You bathe them in heavy water and exposed unstructured regions of the protein get heavier, whereas the structured regions are hidden under the wetsuit so they stay dry. First an overview then some more detail. So here’s the gist…⠀
Proteins are long, folded-up chains of letters called amino acids, which are themselves made up of atoms (individual units of carbon, oxygen, hydrogen, etc.). There are 20 (common, genetically-encoded) amino acids and they have different properties (size, charge, etc.). Different proteins have different combinations of amino acids, so, in order to satisfy these amino acids’ desires, they fold and act differently from one another. Ideal folding usually involves grouping together water-excluded (hydrophobic) regions in the center of the protein, away from water (although water really runs throughout the protein through channels so they can never really escape!), and letting water-loving (hydrophilic) regions stay on the outskirts where they can hang out with the watery solvent (liquid they’re dissolved in). For reasons I’ll get into more later, hydrogens tend to come and go a lot easier than other atoms. Thus, when molecules hang out with water, if they’re not too tied up interacting with other things, they can sometimes swap hydrogens with the water. And if that water is “labeled” this can label “unprotected” regions of the protein.
To understand how, we need to look even smaller than amino acids, and even smaller than the units those amino acids are made up of – atoms. We need to go subatomic!
Atoms are made up of smaller parts called subatomic particles: protons (+ charged), electrons (- charged), & neutrons. Different elements are defined by how many protons they have (e.g. carbon always has 6, oxygen always has 8, and hydrogen always has 1), but the number of electrons & neutrons is more flexible and we call versions of an element with different numbers of neutrons nuclear isotopes. Deuterium (D) is a version of hydrogen (a hydrogen isotope) which has 1 more neutron than “normal hydrogen” so it’s heavier -> when you bathe your protein in D₂O the protein can swap out H for D and this makes the protein heavier. If you then cut the protein up into pieces and weigh those pieces individually you can see where swapping occurred and didn’t occur, telling you about how accessible &/or structured those regions are. It’s often used to see if regions become more or less swappable under different conditions or after adding a binding partner.⠀
If H is held tightly, it won’t get swapped, so it will stay “light.” But if H is in a solvent accessible region (in contact with the liquid it’s dissolved in) and it’s not tied up with bonds to other things, it will get swapped out with the heavier version, deuterium. So when you then cut the protein up into pieces, the piece that was accessible will be heavier.
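To put a rough number on “heavier,” here’s a tiny Python sketch. The two masses are standard monoisotopic values for protium and deuterium; the function name and the swap counts are just made up for illustration.

```python
# Monoisotopic masses in daltons (standard reference values).
MASS_H = 1.007825  # protium: 1 proton, 0 neutrons
MASS_D = 2.014102  # deuterium: 1 proton, 1 neutron


def mass_gain(n_swapped: int) -> float:
    """Extra mass (Da) a protein piece picks up when n_swapped H's become D's."""
    return n_swapped * (MASS_D - MASS_H)


# Each swap adds roughly 1 Da, so more swapping means a heavier piece.
for n in (0, 1, 5):
    print(n, "swaps add", round(mass_gain(n), 4), "Da")
```

So each H-for-D swap adds about 1 Da, which is why counting the extra daltons on a piece tells you roughly how many swaps happened there.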
The “weighing” is done by mass spectrometry or “mass spec.” It’s a technique that can be used to identify proteins and identify modifications to proteins – all based on how heavy and charged they (or at least parts of them) are. Not going into the technical stuff, the principle is that you use endoproteases (protein scissors) to cut up proteins into little pieces, then you charge those pieces – turn them into ions using electrospray – measure the weight of those pieces and, because different protein letters weigh different amounts, you can figure out what letters are in each piece and then match that up to the letters in protein sequences in a big database.
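Here’s a minimal Python sketch of the “different letters weigh different amounts” idea. The residue masses are standard monoisotopic values; the peptide “PEPTIDE” is just a toy sequence (not from my data), and I’m assuming the simplest case where each charge comes from one added H⁺.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # one H2O accounts for the free N- and C-terminal ends
PROTON = 1.007276  # mass of each H+ picked up during ionization


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def mz(seq: str, z: int) -> float:
    """m/z of the peptide carrying z extra protons (assumed charging mode)."""
    return (peptide_mass(seq) + z * PROTON) / z


print(round(peptide_mass("PEPTIDE"), 4))  # neutral mass
print(round(mz("PEPTIDE", 1), 4), round(mz("PEPTIDE", 2), 4))
```

Note that leucine (L) and isoleucine (I) weigh the same, so mass alone can’t tell those two letters apart.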
It’s kinda like “reverse-redacting” – you know the full text and you’re trying to see what parts of that text are covered (and covered in the sense that those letters were detected – not blacked out :P). As I’m quickly getting all-too-familiar-with (though still sometimes confused by), mass-spec results come out as a series of peaks on an m/z graph, where m is mass (heaviness) and z is charge. Increased accessibility leads to increased deuteration leads to increased mass leads to rightward shift. Decreased accessibility leads to decreased deuteration leads to decreased mass leads to leftward shift.
Different protein letters have different masses because they’re made up of different combinations of elements. All protein letters have a generic backbone (although proline’s is slightly different since its side chain kinda curves back to hog the N). But they have different side chains, which have different numbers and arrangements of atoms of carbon, oxygen, nitrogen, and/or sulfur. One thing they all have – hydrogen.
Hydrogen’s often “ignored” – sometimes it’s not even drawn in, its presence is just implied, because (at least when it’s attached to carbon) it’s not very reactive. And speaking of activeness – hydrogen has a heavy form that is NOT radioactive.
As I mentioned briefly, atoms are made up of protons (+ charged), electrons (- charged), & neutrons. The electrons get a lot more attention in biochemistry because they’re negatively charged and charge makes molecules want to do things like go towards or flee from other molecules. When oppositely-charged (even partly-charged) regions of molecules come together, they can form weak, non-covalent, bonds including sometimes a type of bond called a hydrogen bond.
The number of electrons can vary and this can lead to an imbalance with the number of protons, causing a molecule to be charged (ionic). For example, a neutral molecule that gains an electron becomes negatively charged (anionic) and if it loses an electron it becomes positively charged (cationic). So, water, if it picks up an H⁺ it will become a hydronium ion (H₃O⁺) and if it gives up a proton it will become a hydroxide ion (OH⁻).
note: we often refer to H⁺ as a proton since hydrogen only has one proton, so if it loses an electron you’re just left with a proton (and neutron(s)). It can get somewhat confusing…
Electrons also get more attention because they’re the part of atoms that atoms share to form covalent bonds (the strong bonds that link together adjacent atoms in molecules). ⠀
Neutrons, on the other hand, are neutral, and they’re in the atom’s central nucleus, too far away to interact with other atoms. So normally we don’t think about them much – they’re just kinda there in the background. ⠀
But the number of neutrons can vary without changing the identity of the atom, and we call these different versions “isotopes.” So, an atom with 1 proton is always hydrogen no matter how many electrons or neutrons it has. Though, of course, atoms can only hold a certain number of these. When atoms have more neutrons than they can handle, they’re radioactive & can decay to a less neutron-y state, letting off radiation in the process. And we can take advantage of this to radiolabel things like RNA to track it. more here: http://bit.ly/2VtYSG7
But not all heavier atoms are radioactive. Hydrogen can hold up to 2 neutrons, and only the 2-neutron form is radioactive. The “normal hydrogen” actually doesn’t have any neutrons – just a proton and an electron. We call this form protium. Starting from there:
- add an electron and you get a hydride ion (H⁻)
- remove an electron and you get a proton (H⁺), which normally hangs out with water as a hydronium ion (H₃O⁺)⠀
- add a neutron and you get deuterium⠀
- add two neutrons & you get tritium, which IS radioactive⠀
Unlike radiolabeling, where we use radioactive isotopes, deuterium isn’t radioactive – it’s stable, just “different” from normal H. So deuterated water is heavy but not “hot” (slang for radioactive)⠀
I remember in biochemistry & chemistry classes H’s would just seem to come & go out of nowhere in equations & mechanisms and it drove me crazy. But turns out hydrogen really does come and go quite readily – and frequently – if it’s attached to the “right things” – hydrogen is constantly being exchanged and we can take advantage of this to see where exchange is occurring and more significantly where it is NOT occurring.⠀
Water can exist as H₂O or H⁺ and OH⁻ and that H⁺ usually grabs on to another H₂O to give you H₃O⁺ (hydronium ion). So you have an OH⁻ able to take an H & H₂O & H₃O⁺ willing to give an H. They can give and take from each other (other water molecules) or they can give and take H’s from other things. ⠀
Same goes for deuterated water – it acts the same as normal water because the other molecules “ignore the neutrons” as well. So, D₂O gives you D⁺ and DO⁻. And that DO⁻ can pull off the normal H, allowing it to get swapped out. But the DO⁻ has to find that H to pull off, so it has to be solvent-accessible, and “unoccupied.” And, in order for us to be able to detect it, it can’t be sooo swappable that it swaps back when we do the post-labeling stuff, which uses normal water.
Proteins have a lot of hydrogens, and there are several places you’ll find them. Most of them are attached to carbons, and these H don’t like to leave without a really good reason to – those are unlikely to just swap out for a hydrogen from the water. So the exchange rates for H in C-H bonds are too small to measure. ⠀
The H’s in side chain functional groups, like those in hydroxyl (-OH) and carboxyl (-COOH) groups have the opposite problem. They swap out so rapidly that when you quench the reaction in a normal water-based solution, they swap back to the light form, leaving no evidence that any change occurred in between. It’s like when you take them out of the bath, some regions dry off before you even knew they were wet. ⠀
But all hope’s not lost – there’s another place that you find H’s in proteins – in the amide (-(C=O)-NH-) functional groups in the generic backbone (aka backbone hydrogens). All the letters have it except for proline, whose side chain “loops back” to bind the N so the N doesn’t have electrons to share with the H. Some letters also have exchangeable H’s in N-H’s in their side chains as well (e.g. lysine and arginine).⠀
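A rough way to count how many of these backbone amide H’s a peptide has is sketched below in Python. The function name and the `skip_first` parameter are my own inventions for illustration. The first letter of a peptide gets skipped because its N is a free end rather than part of an amide, and prolines get skipped because they have no amide H.

```python
def max_exchangeable_amides(seq: str, skip_first: int = 1) -> int:
    """Count backbone amide H's that could, in principle, swap for D.

    skip_first=1 drops the N-terminal residue (free amine, no amide).
    Prolines are skipped because their backbone N carries no H.
    """
    return sum(1 for aa in seq[skip_first:] if aa != "P")


# Toy sequence, not from real data.
print(max_exchangeable_amides("PEPTIDE"))                # E, T, I, D, E
print(max_exchangeable_amides("PEPTIDE", skip_first=2))  # stricter count
```

Some analyses skip more than one residue at the start because those positions swap back quickly, which is why I left `skip_first` adjustable, but treat that as an assumption to check against whatever your software does.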
The H’s in the N-H backbone bonds are exchangeable at a measurable rate – if they’re accessible that is. A lot of the time these H’s are tied up in hydrogen bonds with other atoms. In fact, a lot of protein structure comes from these H’s H-bonding to the carbonyl (C=O) oxygens of the backbones of other letters in other parts of the proteins. Such backbone-backbone interactions give the protein its “secondary structure” – things like alpha helices and beta strands. ⠀
What are hydrogen bonds? Basically, when atoms share electrons in covalent bonds, they don’t always do so fairly. One of them may hog the shared electrons and we say the hogger is electronegative. Oxygen and nitrogen are two of those really hoggy ones, so when they bond to hydrogen, they hog the electron pair they’re supposed to be sharing with the hydrogen, making them slightly negative (δ-) and leaving the hydrogen slightly positive (δ+). O & N also have “lone pairs” of electrons that attract such hydrogens. When an electronegative atom with a lone pair (like the O in a carbonyl, which is an O double-bonded to a carbon) is attracted to an H attached to an electronegative thing, you get a hydrogen bond (H-bond).
H-bonds are not covalent (no actual electron sharing, just attractions) so they’re not as strong as the covalent bonds that actually give the protein its primary structure (connect the letters in linear fashion). But they can add up to really glue the protein together. There’s structural strength in numbers – and there are lots of H-bonds in proteins!
In highly structured regions of the protein, those H’s won’t be available for swapping. Though if you wait long enough those bonds can break and reform as the protein “breathes” and this offers a chance to sneak in.⠀
Speaking of time, what’s normally done is you deuterate for several different lengths of time – the longer it takes for an H to get swapped, the harder it is to find and/or the more tied-up it is. To stop it you “take away the DO⁻” by adding acid, which neutralizes the DO⁻, and you lower the temperature, depriving molecules of the energy needed to do all that swapping.
If you do this quenching at different timepoints (e.g. 30s, 1 min, 2min, 5 min) you can get a sense as to how dynamic various regions are. The less protected a region is, the faster it will get heavier, and the heavier it will get.
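To get a feel for what a time course might look like, here’s a toy Python model. It assumes each exchangeable H swaps with its own simple first-order rate constant, which is a big simplification, and the rate constants are completely made up (they’re not measurements from anything).

```python
import math


def uptake(t_seconds: float, rate_constants: list[float]) -> float:
    """Expected number of D's taken up by time t (toy first-order model)."""
    return sum(1.0 - math.exp(-k * t_seconds) for k in rate_constants)


timepoints = [30, 60, 120, 300]         # seconds: 30 s, 1 min, 2 min, 5 min
floppy = [1.0, 0.5, 0.2]                # made-up fast rates (per second)
structured = [0.001, 0.0005, 0.0002]    # made-up slow rates (per second)

for t in timepoints:
    print(t, round(uptake(t, floppy), 2), round(uptake(t, structured), 2))
```

In this toy version the floppy region is basically maxed out by the first timepoint, while the structured region creeps up slowly, which is the kind of difference the multiple timepoints are there to catch.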
Some things you can do with HDX-MS:⠀
at the large scale – global HDX measures mass of the whole protein (no cutting it up first). you can do things like compare w/& without binding partner (ligand) -> tells you about overall binding (does it bind or not) under different conditions (e.g. at low pH, high pH, low salt, high salt, etc.) and/or w/different introduced protein mutations (e.g. if you think a residue is important for binding & you change that residue to a different letter, will you still get binding)⠀
at the finer scale – “local HDX-MS” (with cutting*) – look at changes in specific regions of the protein. This is what I’m doing.
The protein is deuterated and then cut by a pretty promiscuous protease called pepsin which is immobilized on beads (resin) in a tiny column. The peptides are captured on a pre-column before they can escape too far. This pre-column concentrates things and lets you wash off salts and other non-peptide stuff. Then you let them go, sending them into another tiny column which will separate them based on how much they want to interact with the resin in the column (the stationary phase) versus the liquid they’re dissolved in (the mobile phase). It’s similar in concept to the preparative-scale protein chromatography I do, but on a much smaller scale (these analytical-scale columns are only like 1mm internal diameter and 5cm long!), at much higher pressures, and with different buffers.
The columns used for mass spec are typically “reverse-phase” so they’re nonpolar, hydrophobic. The peptides glob onto them and then you elute from them gradually with a gradient of an organic solvent like acetonitrile – the more hydrophobic the stuck-on peptides are, the more nonpolar you need to make the solvent in order to convince them that the solvent is better than the resin, so the further into the gradient they’ll come off.
The column helps make the data less overlappy so you can better tell signals apart, and it also provides a level of extra information we can use to tell what signals correspond to the same peptide under different conditions (once you introduce deuteration, you’re “messing up” the ability to ID the peptides based just on their m/z so you need this extra retention time info).
The peptides come off from (elute from) the column and then get ionized by that electrospray thing and then they pass through a gas-filled ion mobility thing to further separate the ions based on their size and then they get filtered so that only peptides with specific m/z ratios can reach the detector. The detector generates an electric signal that gets drawn on a graph and stuff, and that’s about all the detail I’m gonna attempt to give as to the actual process!
*Instead of cutting with enzymes (the bottom-up approach) you can break it in the gas phase in the “top-down” approach – instead of cutting the protein in lots of places, each time it just cuts it in 1 place giving you 2 big pieces. But “each” of those pieces is cut somewhere different so you get different sized pieces and then you can compare their weights⠀
I can’t give details yet, but I’m analyzing a protein with some different binding partners and looking to see what changes. For each condition, the sample was quenched at 4 timepoints and it was run in triplicate so I have a lot of data to comb through. In “pre-runs” they ran undeuterated, but still cut up, samples of the protein in order to identify the various peptides (protein pieces) in a step called protein mapping. This allows them to check that “all” of the protein is covered and determine the retention times (how long it takes the peptide to come off the column) for each peptide in order to tell the software what to keep an eye out for. Now my job is to help the software identify the actual peptide signals from the background noise and compare them.
The software identified the peptides based on their mobility in an ion phase, their retention on the column, and their falling within a possible m/z range (calculated based on the undeuterated sample’s m/z signal and the possible deuterium uptake). But then it needs help to figure out the m/z signals corresponding to those peptides, because there can be interfering signals from other things that happen to fall within that range of parameters you told the software to look for. Each “peptide” has spectral plots with this data, but you have to kinda help pick out the right sticks – those that actually correspond to the peptide.
One of the things that really confused me about peptide mass spec plots is that instead of each peptide having a single peak/stick, like you might see with a small molecule, you see a sort of asymmetric curve called a mass envelope. The reason for this is that peptides have lots of atoms and it’s not only hydrogen that has natural isotopes. Carbon, for example, also has low levels of naturally-occurring isotopes (mainly some C13 scattered in with the “normal” C12) and any molecule contains a random mix of them (based on the isotopes’ natural abundances, which for carbon are ≈ 98.9% C12 and ≈ 1.1% C13). Thus, the “same” peptide sequence-wise can have slightly different masses and you get an envelope representing the isotopic distribution. The “centroid” is the center of the data under that distribution.
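Here’s a small Python sketch of where that envelope shape comes from. It’s a simplification that only considers carbon (ignoring the heavy isotopes of N, O, S, and H) and treats each carbon as an independent coin flip with a 1.1% chance of being C13. The carbon counts are made up.

```python
from math import comb

P_C13 = 0.011  # natural abundance of C13 (about 1.1%)


def carbon_envelope(n_carbons: int, max_extra: int = 6) -> list[float]:
    """Relative stick heights for 0, 1, 2, ... C13 atoms (binomial model)."""
    return [
        comb(n_carbons, k) * P_C13**k * (1 - P_C13) ** (n_carbons - k)
        for k in range(max_extra + 1)
    ]


def centroid(positions: list[float], intensities: list[float]) -> float:
    """Intensity-weighted average position of the sticks."""
    total = sum(intensities)
    return sum(p * i for p, i in zip(positions, intensities)) / total


for n in (50, 100):  # hypothetical carbon counts for two peptides
    sticks = carbon_envelope(n)
    print(n, [round(s, 3) for s in sticks])
    print("  centroid offset:", round(centroid(list(range(len(sticks))), sticks), 3))
```

With more carbons, the tallest stick stops being the all-C12 one and the envelope spreads out, which is why bigger peptides have wider, lumpier envelopes.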
Because the x-axis is m/z (mass over charge), the distance between the sticks in the envelope will depend on the charge state. If you’re looking at a +1 charged peptide, the sticks will be “1” apart, +2 they’ll be 1/2 = 0.5 apart, +4 and they’ll be 1/4 = 0.25 apart, etc. The bigger the peptide, the more opportunities there are to get charged, so you can get high charge states which make the sticks really close together and harder to accurately pick out, but the spacing relationship can help you identify which sticks are the ones you’re interested in (thankfully for each spectrum, you only have to pick out one of the sticks and it will automatically pick the others in the envelope based on the spacing).
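That spacing rule is easy to sketch in Python. This treats the gap between neighboring isotope sticks as exactly 1 mass unit (the real gap is very slightly more), and the function names and the starting m/z are hypothetical. I have no idea how the actual software does its stick-picking internally.

```python
def stick_spacing(z: int) -> float:
    """Approximate m/z gap between neighboring sticks for charge state z."""
    return 1.0 / z


def charge_from_spacing(spacing: float) -> int:
    """Guess the charge state from an observed stick spacing."""
    return round(1.0 / spacing)


def expected_sticks(first_mz: float, z: int, n: int = 5) -> list[float]:
    """Where the first n sticks of an envelope should fall, given one stick."""
    return [first_mz + k / z for k in range(n)]


print(stick_spacing(1), stick_spacing(2), stick_spacing(4))
print(charge_from_spacing(0.334))         # a slightly noisy spacing
print(expected_sticks(400.0, 2, n=3))     # made-up starting m/z
```

So once you know one stick and the charge state, the rest of the envelope’s positions follow, and sticks that don’t land on that grid probably belong to something else.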
That relationship between the sticks won’t change when the sample gets deuterated, but the whole mass envelope will shift to the right. The more deuterium is taken up, the more it will shift, and the difference between the centroids of the undeuterated and the deuterated is the deuterium uptake. In addition to the interesting stuff (H-bonding, etc.) that uptake will depend on how big the peptide is. The bigger the peptide, the bigger the theoretical maximum uptake, which can leave smaller more exposed peptides looking wimpier. So it can be helpful to convert the y axis to relative uptake, which will tell you what fraction of the theoretically-exchangeable H’s were exchanged. This evens the playing field between peptides of different sizes so you can better compare.
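As a back-of-the-envelope Python sketch of those two calculations: the centroid values and the sequence below are made up, the multiply-by-charge step assumes the centroids are read off the m/z axis, and the “theoretical maximum” here is just non-proline residues after the first one. Real software may count sites differently and may correct for back-exchange, which this ignores.

```python
def deuterium_uptake(centroid_undeut_mz: float, centroid_deut_mz: float, z: int) -> float:
    """Mass gained (Da) = centroid shift on the m/z axis times the charge."""
    return (centroid_deut_mz - centroid_undeut_mz) * z


def relative_uptake(uptake_da: float, seq: str) -> float:
    """Fraction of theoretically exchangeable backbone amide H's that swapped.

    Assumed maximum: every residue after the first that is not proline.
    Treats each swap as adding about 1 Da.
    """
    max_sites = sum(1 for aa in seq[1:] if aa != "P")
    return uptake_da / max_sites


# Hypothetical +2 peptide whose centroid moved from 401.20 to 402.45 m/z.
up = deuterium_uptake(401.20, 402.45, z=2)
print(round(up, 2), "Da taken up")
print(round(relative_uptake(up, "PEPTIDE"), 2), "of the max")
```

Dividing by the maximum is what lets a small peptide that swapped most of its H’s stand out next to a big peptide that swapped only a few more in absolute terms.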
I found this really good paper walking you through the whole process: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3713502/
this sentence describes what’s gonna be consuming my time for a while… “Whereas a human is much slower at making all the manipulations required to do the actual deuterium incorporation determination, a human can very rapidly provide quality control for thousands of software-determined deuterium incorporation determinations per hour, ascertaining if the software has performed well or not.”
But I’m currently still waiting for the software to do its part
link to a good review paper: https://www.nature.com/articles/s41592-019-0459-y ⠀
more on H-bonds: http://bit.ly/frizzandmolecularattractions ⠀
Hope I got that all right – I’m definitely not a mass spec expert! But I think I’m starting to better understand what I’m doing. And it really helps me learn for myself by trying to teach others.
P.S. Huge thanks to my e-friend Nefeli Boni-Kazantzidou who’s been helping teach me! You might remember Nefeli from her Greek post translations. Well, she’s also a really great friend and a mass spec-er in training. And she’s super patient!
more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0⠀