Solving the structure of insulin (figuring out the 3D arrangement of its atoms) was the crowning achievement for the pioneering crystallographer Dorothy Crowfoot Hodgkin – it took her 34 years, but she did it – and lives of countless diabetics have been saved because of it, as the structure has enabled pharma companies to develop optimized versions for treating diabetes.
Insulin is a hormone (chemical messenger) your body uses to help control blood sugar levels. When blood sugar (glucose) levels are high, insulin is released by the pancreas and travels throughout the body where it binds to insulin receptor proteins embedded in the membranes surrounding cells and passes along the message to let glucose in. This lowers the blood glucose levels and lets the cells put that glucose to use making ATP for energy storage (through glycolysis and cellular respiration) and/or breaking glucose down partway for pieces that it can restitch together to make different molecules (like other sugars, proteins, or fats).
I’m grateful that I don’t have diabetes, but, for people who do, their body has a hard time making and/or “hearing” insulin’s call to “let glucose in!” As a result, they have trouble controlling blood sugar levels, so they often use devices called glucometers to measure blood glucose levels & then, if the levels are too high, they can take insulin. More on those here: http://bit.ly/glucometers
Different versions of insulin have been created to be fast-acting (big effect quickly, but all at once – like doing a tequila shot (I expect, but I don’t drink alcohol so don’t really know…)) or slow-acting (effect spread out over time, like sipping a glass of wine over a day?) and their design was made possible by figuring out its “natural design” – the 3D structure it adopts.
Diabetes is a disease where the body either doesn’t make (type 1 diabetes (T1D)) or can’t use effectively (type 2 diabetes (T2D)) insulin. As I mentioned briefly, insulin is a hormone, and hormones are chemical messengers that can relay messages throughout & between cells in your body. Different hormones are made of different building blocks & relay different messages. http://bit.ly/adrenalinehormonesetc
There are different types of hormones classified by their different molecular arrangements and origins. So, for example, there are steroid hormones (made of ring-y things that are lipid-soluble so they can get into cells); amine hormones (made from individual amino acids (protein letters)); and peptide/protein hormones (made from chains of amino acids – like tiny proteins or protein pieces).
Insulin is in this latter class – it is a peptide hormone, so it’s made up of amino acids linked together through peptide bonds. These amino acids are the same “letters” as proteins use, and the peptide bond linkages are the same, but “peptide” is typically used to describe shorter chains of amino acids (folded or unfolded), whereas “protein” refers to longer chains that fold up into functional (and beautiful) 3D shapes.
In the context of signaling, proteins are more like novels while peptide hormones are like “text messages”. The text sender in the insulin story is an organ called the pancreas, & the message it relays is – there’s a lot of sugar in the blood – let’s take some into the cells, use some, & store some, why don’t we?
The pancreas basically sends out a “mass text” to cells throughout the body in the form of insulin. The cells respond by opening their doors to glucose (the main monosaccharide (single sugar unit)), importing it from the bloodstream into the cells thus lowering the amount of glucose in the blood. And it tells cells in the liver to “turn on” an enzyme called glycogen synthase, which strings excess glucose into chains (polysaccharides) called glycogen for storage. More here: http://bit.ly/carbscience
Too much sugar is hypERglycemia (remember hyper-ovER). Too little sugar is hyPOglycemia (remember hypo-below). Both are bad. Which is why having a properly-functioning “early alert” system through insulin signaling is so important. It gives your cells time to prepare & avert a crisis.
But in Type 1 Diabetes (T1D), it’s like the text message is never sent and T2D is like the text gets sent but the phones have the caller on their block list so they never hear the message. If cells are phones, it’s not that they’re completely turned off – they can still get other messages fine, but their bodies have become “desensitized” to it.
Insulin has been a cornerstone of diabetes treatment, especially for patients with T1D, who can respond normally to insulin but just can’t make enough of it on their own. However, it’s really important that the administration of insulin is tightly controlled so your body’s not just “taking hits” of the stuff. Sometimes you want something that acts really fast, but other times you want something that gives a more gradual, prolonged effect. So pharmaceutical companies have created “designer insulins” that are fast-acting or slow-acting.
One of the main differences between these forms is their tendency to “oligomerize” – individual insulin units (monomers) can pair up to form dimers and then 3 of those dimers can stick together to form hexamers. It’s the monomer form that’s biologically active, so encouraging this form makes for faster-acting insulin, whereas encouraging the hexamer form slows things down.
Note: “oligo-” means “a few” or “several” and here means that you have several protein chains stuck together to make the “oligomeric” form. “Multimer/multimeric” is another term used to describe proteins with more than one chain. As far as I can tell, multimer is more inclusive as it can be used to describe situations where you just have a few chains to situations where you have a lot. In practice, I (and others in the field) tend to use the terms interchangeably, to the frustration of some as I discovered when trying to discover the difference… https://bit.ly/3k7QV5q
Terminology aside, the key thing to appreciate is that insulin is made up of chains of amino acids that can hang out alone (as monomers) and act as a messenger or hang out with other insulin chains in an inactive form. So, how do you encourage insulin molecules to hang out or not hang out? It helps if you know how the hanging out comes about! To grasp this there are a few key concepts to keep in mind.
First, and at the core of basically everything in biochemistry, is the concept that opposite charges attract & like charges repel. And these charges don’t even have to be “full” charges in order to see the effect.
Molecules are groups of atoms held together by strong, “covalent” bonds – the type of bond that doesn’t break easily. Those atoms (individual hydrogens, carbons, oxygens, etc.) are made up of even smaller parts called “subatomic particles” which include protons (which are positively-charged), neutrons (which are neutral), and electrons (which are negatively-charged).
The charge of molecules comes from an uneven number and/or distribution of electrons. Electrons are teeny weeny & they whizz around the dense atomic nuclei where an atom’s protons & neutrons hang out. The outermost electrons (called valence electrons) are furthest from the pull of the positive protons and thus are the most energetic & reactive. These valence electrons can pair up with valence electrons from other atoms to form strong covalent bonds.
Sometimes they can get pulled all the way away from their parent atom, leaving the parent with one fewer electron than protons, so a full charge imbalance (and similarly, but oppositely with the one that gains an extra electron). And we call such fully-charged particles “ions” – “cation” refers to a positively-charged one and “anion” to one that’s negatively-charged. Metals often lose electrons and hang out as cations – so, for example, zinc (Zn) likes to hang out as a divalent (2 charged) cation (+ charged thing), which we can write as Zn²⁺. Keep this in mind because it’s going to come into play in the story I’m telling today!
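The bookkeeping behind "full" charges can be sketched in a few lines of Python. This is just an illustration, not anything official: the function names are made up for this post, and the only outside fact used is that zinc has 30 protons (its atomic number).

```python
def net_charge(protons: int, electrons: int) -> int:
    """Net charge in elementary-charge units: each proton is +1, each electron is -1."""
    return protons - electrons


def classify(charge: int) -> str:
    """Name the particle by the sign of its charge."""
    if charge > 0:
        return "cation"
    if charge < 0:
        return "anion"
    return "neutral"


ZINC_PROTONS = 30  # atomic number of zinc

neutral_zinc = net_charge(ZINC_PROTONS, electrons=30)
zinc_ion = net_charge(ZINC_PROTONS, electrons=28)  # two electrons pulled away

print(neutral_zinc, classify(neutral_zinc))
print(zinc_ion, classify(zinc_ion))
```

Losing 2 electrons leaves 2 more protons than electrons, which is where the "2+" in Zn²⁺ comes from.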
So, ions are atoms or molecules which have a full, “formal” charge. But sometimes, even when electrons are shared and not stolen (i.e. they’re still “owned” by their original atom) they’re shared unevenly; as a result, even though the molecule is neutral overall, parts of it are partially charged. And partly charged regions can get attracted to & repelled by other partly or fully charged regions of things. You get this uneven sharing (a polar covalent bond) when one of the sharing partners is much more electron-hogging (electronegative) than the other.
Oxygen (O) & nitrogen (N), for example, have much more pulling power than hydrogen – so when they form covalent bonds with H, they convince the H’s lone electron to hang out near them more. This makes the H partly positive (δ+) and the O or N partly negative (δ-). The partly positive H then frequently gets attracted to a lone pair of electrons on a different O or N, leading to a type of attraction called a hydrogen bond (H-bond).
note: “δ” is used to mean partly. And it’s the lowercase Greek letter for delta. Don’t confuse it with uppercase delta, Δ, which means “change in” (and is often also used in biochemistry as shorthand for specifying deleted regions – for example, ProteinXΔ34-55 would refer to some “Protein X” which is missing the 34th through 55th amino acids).
H-bonding is really important in biochemistry and it’s responsible for everything from water being “sticky” (surface tension, etc.) to DNA strands sticking together. They’re so important because they’re weak enough for individual bonds to easily come apart and back together (and apart and back together….) but they can be super strong in numbers. So, for example, to “unzip” the strands of double-stranded DNA you have to heat them up or use molecular “helpers” – protein enzymes called helicases spend energy money to break them apart. But the individual strands stay “strand-y” because the DNA letters (nucleotides) are connected by strong covalent bonds that involve actual electron sharing instead of just charge-based attractions.
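If the Δ deletion shorthand feels abstract, here is a small Python sketch of what "Δ34-55" does to a sequence. The 60-letter "Protein X" below is completely made up for illustration, and the helper name is hypothetical. The one real convention is that amino acid numbering starts at 1 and the range includes both ends.

```python
def delete_region(sequence: str, first: int, last: int) -> str:
    """Remove residues first..last (1-indexed, inclusive), as in the Δfirst-last shorthand."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("deletion range must lie within the sequence")
    # Python slices are 0-indexed and end-exclusive, hence the first - 1.
    return sequence[: first - 1] + sequence[last:]


# Made-up protein: residues 1-33 are A, residues 34-55 are G, residues 56-60 are K.
protein_x = "A" * 33 + "G" * 22 + "K" * 5

protein_x_delta = delete_region(protein_x, 34, 55)
print(len(protein_x), len(protein_x_delta))
print(protein_x_delta)
```

The deletion takes out 22 residues (55 - 34 + 1), so the made-up 60-residue protein ends up 38 residues long.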
H-bonding also plays crucial roles in determining the structure of proteins (which has multiple levels). Proteins are made up of letters called amino acids, which have generic backbones for linking any of them to any other & unique side chains (R groups) that stick off like charms in a charm bracelet. Amino acids are linked into chains through peptide bonds, and the order of amino acids in the chain (i.e. its sequence) is referred to as the “primary structure.” It’s of primary importance for determining how the protein ultimately folds because different side chains have different preferences for where in a protein they end up, who they like to hang out near, and what angles they are and are not willing to adopt.
But don’t count out the generic part! The peptide backbone of the protein is crucial to a protein’s structure, in particular its “secondary structure.” The peptide bonds linking amino acids are covalent bonds that form when the carboxyl (-(C=O)-O⁻) of one amino acid joins with the amino (NH₂/-NH₃⁺) group of another amino acid. This leaves you with alternating potential H-bond donors (the H on the NH) and acceptors (the O on the C=O) along the peptide backbone. H-bond forming between these parts of the backbone can lead to common “secondary structure” motifs like alpha helices (those spring-y looking things) & beta strands. Beta strands are often depicted as flat arrows and strands can H-bond with one another to form “beta pleated sheets.” When neighboring strands are running in opposite directions (arrows pointing different ways) we call them antiparallel and when they’re running the same way we call them parallel.
Further H-bonding opportunities are provided by some of the unique side chains (R-groups) of the different letters. These letters can interact with other letters (either their side chains or backbones) within their chain to form “tertiary structure.” And they can interact with atoms from other protein chains to form “quaternary structure.”
As a result, peptide chains can interact with one another and you can get multiple individual chains or subunits (monomers) grouping together to form “oligomers” – oligomers can be multiple copies of the same chain (in which case we call them homooligomers) or different chains (heterooligomers).
Sometimes oligomers are the functional “active” form of a protein, but other times, oligomers are inactive forms. It all depends on the protein.
This latter situation is the case with insulin – it can form dimers that can form hexamers, but those hexamers are *Not* the active form. Instead the active form is the monomer. The reason the hexamers form starts with the reason dimers form, which involves H-bonding between the B chains. And here’s where the nomenclature starts to get a bit tricky. I’ve been using the term “chain” to refer to the polypeptide chains encoded for in a protein’s genetic instructions, which the ribosome faithfully links together. Each protein is synthesized as a chain like this. And usually this chain just folds up into a functional protein. Sometimes grouping up with other chains, but that chain stays that chain.
However, in the case of insulin, it gets made as a longer chain called preproinsulin that gets cleaved in 2 places to give you mature insulin which is made up of 2 stuck together smaller chains in each “monomer” of insulin. The first 24 amino acids form a “signal peptide” – as the protein gets made, these come out of the ribosomal tunnel first and signal to the cell that this protein is destined for secretion (getting shipped out). For such secreted proteins, processing usually takes place in a special compartment in the cell called the endoplasmic reticulum (ER). So the ribosome sends the finished chain in there. And then, since it’s no longer needed, the signal peptide gets cleaved off, leaving you with proinsulin. This proinsulin then folds up, and gets cleaved again to give you 2 chains, A (21 amino acids) & B (30 amino acids) (both from that original chain).
These chains stay stuck to one another because they have 2 key disulfide crosslinks. Unlike most side chain interactions, disulfide bonds, which can form between cysteine residues (e.g. protein-SH + HS-protein -> protein-S-S-protein) are covalent bonds. So they’re strong. And they keep the strands stuck together even though their backbone’s broken. So each mature insulin “monomer” is 51 amino acids in 2 chains from 1 original chain. This is *not* the dimer.
Your pancreas has to stock up on a lot of insulin so it has it ready to ship out when needed. But you don’t want it taking up a bunch of space and you don’t want it “breaking loose” so it makes biological sense to store it in a compact, inactive form. The hexameric form is great for this purpose. When insulin gets secreted from the pancreas into the bloodstream, the hexamers fall apart into the active monomers because the zinc concentration is way lower in the blood.
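To make the counting concrete, here is a minimal Python sketch that tallies the two chains of a mature insulin "monomer" and finds the cysteines available for disulfide bonds. The one-letter sequences are the human A and B chains as I know them from standard references (e.g. UniProt P01308), so treat them as an assumption and double-check against the database before relying on them. The helper name is made up.

```python
# Assumed human insulin sequences in one-letter code (check against UniProt P01308).
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"


def cysteine_positions(chain: str) -> list[int]:
    """Return the 1-indexed positions of every cysteine (C) in a chain."""
    return [i for i, residue in enumerate(chain, start=1) if residue == "C"]


total_residues = len(A_CHAIN) + len(B_CHAIN)
a_cys = cysteine_positions(A_CHAIN)
b_cys = cysteine_positions(B_CHAIN)

print("A chain:", len(A_CHAIN), "residues, cysteines at", a_cys)
print("B chain:", len(B_CHAIN), "residues, cysteines at", b_cys)
print("monomer total:", total_residues)

# Each disulfide uses 2 cysteines, so this is the maximum number possible.
print("possible disulfides:", (len(a_cys) + len(b_cys)) // 2)
```

With these sequences, 6 cysteines allow for 3 disulfides. As far as I know, 2 of them are the A-to-B crosslinks that hold the chains together, and the third sits within the A chain.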
But turns out that the hexamer-izing can be a problem if you want to use insulin as a drug to treat diabetes. Because, while the monomers and dimers can easily diffuse into the bloodstream, those big ole hexamers have a harder time getting in there if you just inject them under the skin (subcutaneously). To get around this problem, scientists make “designer insulin” thanks to recombinant protein expression technology (being able to stick protein instructions into cells to have them make it for you). Pharma companies can change insulin’s amino acid spelling (primary structure) in a way that doesn’t affect its receptor binding but does prevent dimerization and hexamerization, so that it absorbs better and acts faster. Alternatively, they can change the spelling to promote linking up to make it act slower and last longer so you don’t have to inject it as often.
But in order to make those changes, you need to know where to make them, and this is where Dorothy Crowfoot Hodgkin becomes the superhero of our story. Hodgkin was a true pioneer in the technique of x-ray crystallography, which allows you to harness the power of x-rays to figure out what molecules look like at the “atomic scale” (how do all the carbons, etc. link up).
You can learn more about Hodgkin here: http://bit.ly/dorothycrowfoothodgkin
And more about crystallography here: http://bit.ly/xraycrystallography2
But here’s the basic gist of the technique: you get molecules to crystallize (arrange themselves into an orderly 3D lattice) -> beam x-rays at them -> x-rays get scattered by the molecules -> scattered x-rays interfere with one another, some “cancelling out” while others strengthen one another depending on their relative phases (where in their peak(high point)-trough(low point)-peak-trough… cycle they are) -> these “diffracted” x-rays hit a detector, leaving a pattern of spots called a diffraction pattern -> you work backwards from those spots to figure out where they scattered from.
Doesn’t sound so bad, right? Wrong. Firstly, getting a protein to crystallize into nicely-diffracting crystals is often no easy feat (think trays and trays of screening for the optimal crystallization conditions, which vary from protein to protein). Dorothy was so excited to get her first diffracting crystals, in 1935, that she panicked herself into thinking the diffraction pattern was coming from a salt or some contaminant or something instead of insulin – so she rushed in the following morning to check! These weren’t even very useful crystals in terms of data – but they showed that it could be done! There was hope – and by optimizing and optimizing she was able to grow larger, better-diffracting crystals.
But, even once you have a diffraction pattern, you still have to do the hard math of figuring out what those spots correspond to. Even with powerful computers, which Dorothy didn’t have to help her, you’re left with the “phase problem”:
When x-rays hit crystals, they get scattered from the electrons of the atoms because they kinda “ring” the electron clouds, getting the electrons to give off their own waves of the same wavelength (kinda like billiard balls getting tossed in a pool and generating rippling waves rather than billiard balls bouncing off pool table walls). The waves add together through wave interference, sometimes canceling out, other times strengthening each other, leading to kinda “megawaves.” You can capture the signals from these megawaves on the detector and deconstruct the megawave to the miniwaves using a math thing called a Fourier transform. But it requires you to know the “phases” – was a wave peaking, troughing, or somewhere in between when it hit? And you lose this information in crystallography – you only measure the amplitudes (how strong was the wave when it hit). This leads to the “phase problem.”
Carbon and oxygen and hydrogen and all those “usual players” in biochemical molecules don’t vary very much in terms of their electron stock (hydrogen has 1, carbon 6, and your “big guy” oxygen has 8). But so-called “heavy atoms” like lead? In its neutral form it has 82. So it’s kinda like having 82 hydrogens in one spot – the lead atoms will scatter much more strongly than the other atoms in the structure. And this kinda “puts a pin in the map” – if you know what to look for, it “sticks out” in the diffraction pattern, allowing scientists to orient themselves in the signal.
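Here is a toy Python sketch of interference and the phase problem. It is not how real crystallography software works. It uses the common trick of writing each wave as a complex number (amplitude plus phase), and all the function names and numbers are made up for illustration.

```python
import cmath
import math


def wave(amplitude: float, phase: float) -> complex:
    """Represent a scattered wave as a complex number (phase in radians)."""
    return cmath.rect(amplitude, phase)


def combined_amplitude(waves: list[complex]) -> float:
    """Add waves together (interference) and return the amplitude of the sum."""
    return abs(sum(waves))


def measured_intensity(combined: complex) -> float:
    """What the detector records: amplitude squared. The phase is not recorded."""
    return abs(combined) ** 2


# Two equal waves, peak-on-peak vs. peak-on-trough.
in_step = combined_amplitude([wave(1.0, 0.0), wave(1.0, 0.0)])
out_of_step = combined_amplitude([wave(1.0, 0.0), wave(1.0, math.pi)])
print("in step:", round(in_step, 6), "out of step:", round(out_of_step, 6))

# Two "megawaves" with the same strength but different phases.
spot_a = wave(2.0, 0.3)
spot_b = wave(2.0, 2.0)
same_intensity = math.isclose(measured_intensity(spot_a), measured_intensity(spot_b))
same_phase = math.isclose(cmath.phase(spot_a), cmath.phase(spot_b))
print("same spot intensity:", same_intensity, "| same phase:", same_phase)
```

The last 2 "megawaves" would leave identical spots on the detector even though their phases differ, which is exactly the information you need to get back somehow.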
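To get a feel for how much a heavy atom "sticks out," here is a rough Python sketch. It assumes the simplest textbook approximation: an atom’s x-ray scattering strength scales with its number of electrons, and the measured intensity scales with the square of that. Real scattering factors also depend on angle and wavelength, so take the numbers as ballpark only. The electron counts are just the atomic numbers of the neutral atoms.

```python
# Electrons in the neutral atom (equal to the atomic number).
ELECTRONS = {"H": 1, "C": 6, "N": 7, "O": 8, "Zn": 30, "Hg": 80, "Pb": 82, "U": 92}


def relative_intensity(element: str, reference: str = "C") -> float:
    """Rough intensity ratio vs. a reference atom, assuming intensity ~ (electron count)^2."""
    return (ELECTRONS[element] / ELECTRONS[reference]) ** 2


for element in ("O", "Zn", "Hg", "Pb", "U"):
    print(f"{element}: ~{relative_intensity(element):.0f}x the intensity of a carbon")
```

Under this crude approximation, a single lead atom contributes on the order of a couple hundred times the intensity of a carbon, which is why it works as a map pin.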
So, to solve the structure of insulin, Dorothy used heavy atom soaking – she used EDTA, a metal chelator, to remove the zinc (a chelator is a molecule that can bite metal in multiple places, so it can, for example, steal zinc from the insulin). Then she added lead to replace the zinc the EDTA stole. In addition to replacing the zinc, the lead bound a few other places too. So this gave her multiple pins in the map. (note: while useful here, this same zinc-replacing ability is one of the reasons lead is dangerous in your body). http://bit.ly/leadheme
Dorothy also made other heavy atom derivatives to try to “map pin” other sites. She made derivatives with zinc and lead as well as derivatives with a couple other heavy atoms – uranium (from uranyl fluoride & uranyl acetate) and mercury (from mercuribenzaldehyde). After a lot of hard work, in 1969, 34 years after taking those initial photos, she reported the structure of 2 Zn insulin at a resolution of 2.8 Å. (Resolution refers to how close together 2 things can be and you can still tell them apart – so for example, a bright light in the sky might look like a single spot, but at higher resolution you might be able to tell it’s actually 2 stars – or a whole galaxy! But in the case of crystallography, we’re “looking” at much shorter distances. Å stands for “angstrom” and it’s a unit of distance equal to 10⁻¹⁰ meters or 0.1 nanometers (0.1 nm).)
Dorothy’s structure showed that the hexamer assembles around 2 zinc atoms in the “2 Zn form”, with the Zn²⁺ held there through “coordination” to a histidine (His, H) from each monomer, His10 of the B chain (this nomenclature means it’s the 10th amino acid in the chain). Coordinate bonds are a special kind of covalent bond in which one partner donates both electrons of the shared pair instead of each partner donating a single one. Here it’s a nitrogen in the histidine’s side chain that donates a lone pair to the metal (so it’s kinda like the histidine is footing the tab for the whole bond, and the electron-poor Zn²⁺ is happy to accept). One Zn²⁺ coordinates with 3 of the His10’s and the other Zn²⁺ coordinates with the other 3.
Glutamate (Glu, E) 13 in the B chain is at the center of the hexamer and, unlike what you’d expect, the Glu13’s from the different chains are clustered together. Why wouldn’t you expect this? Glutamate is negatively-charged normally. And as we talked about, like charges repel. So you’d think they’d push the insulins apart. But here’s part of where the Zn²⁺ comes in. With its positive charge, it’s able to help neutralize the situation and stabilize the hexamer. So when there’s a lot of Zn²⁺ you get the hexamer form, good for storage (and diffraction) but when Zn levels are lower, like in the bloodstream, the glutamates can repel. Also helping things out, the pH in the blood is higher than in the storage vesicles in the beta cells. Higher pH (lower acidity) means there are fewer H⁺ around to kinda buffer things, so the glutamates really want to get away from each other, therefore helping promote monomerization.
The structure showed that the dimers were forming because of hydrogen-bonding between the C-termini (ending ends) of the B chains of the monomers. Knowing this, once recombinant protein expression was possible, pharma companies could change insulin’s spelling in a way that didn’t affect its receptor binding but did prevent dimerization & hexamerization. For example, insulin lispro (Humalog) swaps 2 C-terminal residues in the B chain, Pro28 & Lys29 to Lys28 & Pro29 (check out PDB entries 1lph or 2kjj). Another one, NovoRapid, mutates a proline to an aspartate. As you might remember from #20DaysOfAminoAcids, Proline (Pro, P) is the least flexible amino acid because its side chain loops in on itself, rebinding to the backbone. So changing this to aspartate (Asp, D) gives greater flexibility and favors monomerization (PDB entries 1zeg, 1zeh or 1kei).
There are also forms to make insulin slower-acting so you don’t have to inject it as frequently. For example, Levemir is insulin covalently bound to a fatty acid – this makes it stick to albumin (a protein that’s at really high concentrations in the blood). The albumin competes with the receptors, leading to longer action. Degludec is a similar version but with stabler hexamers too.
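For a sense of scale, here is a tiny Python unit conversion for that 2.8 Å figure. The comparison value of roughly 1.5 Å for a carbon-carbon single bond is a typical textbook number I’m adding for context. It is not from Dorothy’s paper.

```python
ANGSTROM_IN_METERS = 1e-10
ANGSTROM_IN_NANOMETERS = 0.1


def angstrom_to_meters(value: float) -> float:
    """Convert a length in angstroms to meters."""
    return value * ANGSTROM_IN_METERS


def angstrom_to_nanometers(value: float) -> float:
    """Convert a length in angstroms to nanometers."""
    return value * ANGSTROM_IN_NANOMETERS


resolution = 2.8          # Å, the reported resolution
typical_cc_bond = 1.5     # Å, assumed textbook value for a C-C single bond

print(f"{resolution} Å = {angstrom_to_nanometers(resolution):.2f} nm")
print(f"{resolution} Å = {angstrom_to_meters(resolution):.1e} m")
print("bond lengths that fit in the resolution:", round(resolution / typical_cc_bond, 1))
```

So at 2.8 Å you are resolving features nearly 2 bond lengths apart, which is enough to trace how a chain winds around but not to see every atom as its own separate blob.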
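The pH part can be sketched with 2 textbook formulas: [H⁺] = 10^(-pH), and the Henderson-Hasselbalch relationship for what fraction of an acidic group has lost its H⁺ and is carrying a negative charge. The specific numbers below are assumptions for illustration: blood pH around 7.4, a storage vesicle pH around 5.5, and a glutamate side chain pKa around 4.2 (the value for the free amino acid). Inside a crowded cluster of glutamates the real pKa is likely shifted, so the pKa is left as a parameter you can play with.

```python
def h_concentration(ph: float) -> float:
    """Hydrogen ion concentration in mol/L from pH (pH is -log10 of [H+])."""
    return 10.0 ** (-ph)


def fraction_deprotonated(ph: float, pka: float) -> float:
    """Henderson-Hasselbalch: fraction of an acidic group in its negatively charged form."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


BLOOD_PH = 7.4        # assumed typical value
VESICLE_PH = 5.5      # assumed, illustrative value for the storage vesicles
GLU_PKA = 4.2         # assumed free-amino-acid value; likely shifted in the hexamer

fold_more_h = h_concentration(VESICLE_PH) / h_concentration(BLOOD_PH)
print(f"H+ is ~{fold_more_h:.0f}x more concentrated at pH {VESICLE_PH} than at pH {BLOOD_PH}")

for label, ph in (("vesicle", VESICLE_PH), ("blood", BLOOD_PH)):
    print(f"{label}: {fraction_deprotonated(ph, GLU_PKA):.4f} of glutamates charged")
```

Because pH is a log scale, a difference of about 2 pH units means roughly 100 times fewer H⁺ floating around in the blood to sit on those glutamates and mask their charge.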
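These "spelling changes" are easy to mimic in Python. This is only a sketch: the helper names are made up, and the B chain sequence is the human one as I know it from standard references (e.g. UniProt P01308), so verify it before using it for anything real.

```python
# Assumed human insulin B chain in one-letter code (check against UniProt P01308).
HUMAN_B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"


def swap_residues(chain: str, pos1: int, pos2: int) -> str:
    """Swap the residues at two 1-indexed positions."""
    letters = list(chain)
    letters[pos1 - 1], letters[pos2 - 1] = letters[pos2 - 1], letters[pos1 - 1]
    return "".join(letters)


def substitute(chain: str, position: int, new_residue: str) -> str:
    """Replace the residue at a 1-indexed position with a different amino acid."""
    return chain[: position - 1] + new_residue + chain[position:]


lispro_b = swap_residues(HUMAN_B_CHAIN, 28, 29)   # Pro28/Lys29 -> Lys28/Pro29
aspart_b = substitute(HUMAN_B_CHAIN, 28, "D")     # Pro28 -> Asp

for name, chain in (("human", HUMAN_B_CHAIN), ("lispro", lispro_b), ("aspart", aspart_b)):
    print(f"{name:>6}: ...{chain[-5:]}")
```

Only the last few letters of the B chain change, right where the dimer-forming H-bonds are, while the rest of the spelling (including the parts the receptor reads) is left alone.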
Dorothy solved the structure of the pig (porcine) version of insulin. It only differs from the human version by a single amino acid at the C-terminus of the B chain. Human receptors still recognize it, so, even before recombinant expression technology, animal versions of insulin could be purified and used to treat diabetics.
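Here is a quick Python sketch of that single-letter difference. I’m assuming the pig B chain ends in alanine (A) where the human one ends in threonine (T). That matches what I know of the sequences, but check a database like UniProt to be sure. The helper name is made up.

```python
# Assumed B chain sequences in one-letter code; they differ only at the last position.
HUMAN_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
PORCINE_B = "FVNQHLCGSHLVEALYLVCGERGFFYTPKA"


def differences(seq1: str, seq2: str) -> list[tuple[int, str, str]]:
    """List (1-indexed position, residue in seq1, residue in seq2) wherever two equal-length chains differ."""
    if len(seq1) != len(seq2):
        raise ValueError("this simple comparison needs equal-length sequences")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq1, seq2), start=1) if a != b]


diffs = differences(HUMAN_B, PORCINE_B)
print(diffs)
```

One difference out of 51 total amino acids, and it sits at the very end of the B chain, which fits with human receptors still recognizing the pig version.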