Experimental design: what you see depends on what you seek and how you seek it

In this piece, I hope to take you inside the mind of a scientist as they plan out an experiment. With any experiment, you have to make compromises to get the type of information that is most important for you. Many of these “compromises” involve controlling variables, a crucial aspect of the scientific process.

A Thought Experiment

To illustrate some points, let’s start with a “simple” experiment, one you might have done for a science fair. Say you want to test the effects of water and light exposure on plant growth. Seems pretty straightforward, right? Water and light are the “independent variables” you want to test and growth is the “dependent variable” that is “dependent” on your independent variables. So you take some seeds, give them different amounts of water and light, and measure their growth. But wait, what do you mean by growth? Increases in height, leaf size, circumference, total mass? In this simple experiment, you could collect them all, but this is often not practical. Once you choose your experimental “read out,” you need to determine when to take the measurements – this depends on what you want to know. Are you interested in changes in the rate of growth? If so, you’d want to take a series of measurements at fixed timepoints. If you don’t care about rate, just overall growth, you could take a single measurement at a single timepoint.

For simplicity’s sake for this thought exercise, let’s say you decide to measure plant height after two weeks. Now you have to decide how you want to change your variables. If you change the amounts of water and light at the same time, any changes in growth you see are a combination of effects of changing water and changing light and you won’t know how much of the change you see is due to which factor. If you want to get information about the individual contribution of one variable, you need to hold the other variable constant – so you run two parallel sets of experiments. In one, you give each plant the same amount of water but different amounts of light and vice versa for the other set.
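The two parallel sets described above can be written out as a list of treatment groups. This is only a sketch: the water and light levels, the baseline values, and the function name are made-up placeholders, not values from a real protocol.

```python
# Hypothetical treatment levels; the numbers are placeholders for illustration.
WATER_ML_PER_DAY = [50, 100, 150]
LIGHT_HOURS_PER_DAY = [4, 8, 12]

# Value each variable is held at while the other one is varied (assumed).
BASELINE_WATER = 100
BASELINE_LIGHT = 8


def one_variable_at_a_time():
    """Return two sets of (water, light) treatment groups.

    Set 1 holds water constant and varies light.
    Set 2 holds light constant and varies water.
    """
    vary_light = [(BASELINE_WATER, light) for light in LIGHT_HOURS_PER_DAY]
    vary_water = [(water, BASELINE_LIGHT) for water in WATER_ML_PER_DAY]
    return vary_light, vary_water


if __name__ == "__main__":
    vary_light, vary_water = one_variable_at_a_time()
    print("Same water, different light:", vary_light)
    print("Same light, different water:", vary_water)
```

Within each set only one number changes from group to group, which is what lets you attribute a difference in height to that variable.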

If all variables but the one you’re interested in are kept constant, then any differences in the dependent variable (growth in this case) are taken to be due to the independent variable you changed. If every other variable really were kept constant, this would be true, but this theoretical perfectly controlled system doesn’t exist. There is always some variability in your variables! For example, there could be genetic differences in the seeds, slight differences in soil composition, differences in distance to the light, etc. The difficulty of controlling experimental variables is especially pronounced in biology because living organisms are incredibly complex.


It is impossible to control for every variable – to account for this, scientists include replicates. With replicates, you hope that although each replicate will differ slightly, these differences will “buffer” each other, similar to how all the colors in the rainbow “cancel each other out” to make white. There are two main types of replicates that are both important:

Technical replicates are when you test the same sample multiple times to buffer out inconsistencies in measuring. In our plant case, this would mean measuring each plant several times – Were you measuring from exactly the same starting point? Did you correctly count the number of lines on the ruler?

Biological replicates are when you test different samples that are “identical” in all aspects but their source. In our plant case, this would mean including multiple seeds in each treatment group. Since it’s just a theoretical experiment, we could include as many seeds as we want, but in real experiments there are practical limitations (e.g. availability and cost of samples, amount of time and energy needed to collect the data).
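To make the distinction between the two kinds of replicates concrete, here is a minimal sketch of how the measurements for one treatment group might be summarized. The plant names and heights are invented, and averaging the technical replicates first is one common convention, not the only way to handle them.

```python
from statistics import mean

# Hypothetical heights in cm. Each plant is a biological replicate; the three
# numbers per plant are technical replicates (repeat measurements of that plant).
measurements = {
    "plant_1": [10.1, 10.3, 10.2],
    "plant_2": [11.8, 12.0, 12.2],
    "plant_3": [9.4, 9.6, 9.5],
}


def summarize(measurements_by_plant):
    """Collapse technical replicates to one value per plant, then average plants."""
    per_plant = {
        plant: mean(values) for plant, values in measurements_by_plant.items()
    }
    group_mean = mean(list(per_plant.values()))
    n_biological = len(per_plant)
    return per_plant, group_mean, n_biological


if __name__ == "__main__":
    per_plant, group_mean, n = summarize(measurements)
    print("Per-plant means:", per_plant)
    print("Group mean:", group_mean, "from", n, "biological replicates")
```

Note that the sample size here is three plants, not nine measurements: measuring the same plant again makes your ruler reading more reliable, but it tells you nothing new about plant-to-plant variation.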

In order to detect effects of your treatment, you need to make sure that differences between treatment groups are bigger than differences between individual samples within those treatment groups. Scientists use statistical tests to estimate how likely it is that differences of the size they observed would show up from sample-to-sample variability alone, without any real treatment effect.
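One common way to put a number on “between-group versus within-group” differences is the one-way ANOVA F statistic. The sketch below computes it by hand for made-up plant heights. It is meant to show the idea, not to suggest this is the test any particular study should use, and the group names and data are hypothetical.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic: between-group variance / within-group variance.

    `groups` maps a treatment name to a list of measurements
    (one value per biological replicate).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    k = len(groups)        # number of treatment groups
    n = len(all_values)    # total number of samples

    # How far each group's mean sits from the overall mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # How far individual samples sit from their own group's mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


if __name__ == "__main__":
    # Invented heights in cm after two weeks.
    heights = {
        "low_light": [10.0, 11.0, 12.0],
        "high_light": [14.0, 15.0, 16.0],
    }
    print("F =", f_statistic(heights))
```

A large F means the groups differ by much more than the samples within each group do. Turning F into a probability requires the F distribution; if SciPy is available, `scipy.stats.f_oneway` does the whole calculation and returns a p-value.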


Controlling variables is crucial, but even if you could perfectly control every variable but the one you are interested in, you would lose important information in doing so. In science we talk of “non-additive effects” – where the combined effect of several variables differs from the sum of their individual effects because the variables themselves are interdependent. Say you wanted to determine the optimal amount of light and water for plant growth – you change these variables independently as we outlined above, and determine that the optimal amount of light is some value, A, and the optimal amount of water is B. This doesn’t necessarily mean that the optimal growth conditions are A and B together. It could be that light has a bigger effect at a certain water level, but you wouldn’t see that effect if you only tested at a different water level. It also could be that one of the “controlled variables” such as temperature has a similar effect, with the effects of light or water being more pronounced at certain temperatures. Testing every combination of variables is rarely practical, because the number of combinations grows so quickly, so scientists must make compromises when designing their experiments.
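A small worked example may help here. The sketch below uses invented mean heights for a 2 x 2 design (two water levels, two light levels) and checks whether the effect of light is the same at both water levels. The function names and all numbers are hypothetical.

```python
# Hypothetical mean plant heights (cm) for each water/light combination.
mean_height = {
    ("low_water", "low_light"): 5.0,
    ("low_water", "high_light"): 6.0,
    ("high_water", "low_light"): 7.0,
    ("high_water", "high_light"): 12.0,
}


def light_effect(heights, water):
    """Change in height from low to high light at a fixed water level."""
    return heights[(water, "high_light")] - heights[(water, "low_light")]


def interaction(heights):
    """Difference between the light effect at high water and at low water.

    Zero means the effects simply add; anything else is non-additive.
    """
    return light_effect(heights, "high_water") - light_effect(heights, "low_water")


def additive_prediction(heights):
    """Height expected at high water + high light if the effects just added up."""
    base = heights[("low_water", "low_light")]
    water_alone = heights[("high_water", "low_light")] - base
    light_alone = heights[("low_water", "high_light")] - base
    return base + water_alone + light_alone


if __name__ == "__main__":
    print("Light effect at low water: ", light_effect(mean_height, "low_water"))
    print("Light effect at high water:", light_effect(mean_height, "high_water"))
    print("Interaction:", interaction(mean_height))
    print("Additive prediction:", additive_prediction(mean_height))
    print("Observed:", mean_height[("high_water", "high_light")])
```

With these made-up numbers, light adds 1 cm when water is low but 5 cm when water is high, so an experiment run only at low water would badly underestimate what light can do. The catch is cost: with m levels of each of k variables, testing every combination takes m**k treatment groups, so 5 levels of 6 variables would already need 15,625 groups before adding any replicates.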

A more “real-world” example

To show how these concepts play out in a more realistic scenario, let’s consider pharmaceutical drug development. Many early experiments are performed on cells in a dish (cell culture), which allows for moderate control over variables while still working in a cellular context. If a scientist wants to test the effects of a drug on human cells, they could take cells and plate them in 2 dishes – add the drug to one dish and only the delivery vehicle (the liquid the drug is dissolved in) to the second dish as a negative control. As we saw above, technical and biological variability could affect the results so the scientist would actually want to set up a number of dishes, not just one of each.

Say the scientist sees that the drug has a desired effect – it’s not quite time to celebrate. To make sure that the observed results weren’t specific to that cell preparation, they would also want to repeat the experiment on a different date with “new” cells. Next, they will likely test the drug on a different cell line (the initial source of the immortalized cells is different, not just the “batch” of those cells) to make sure that the effects aren’t cell-line specific.

If the drug has the same effect on multiple cell lines, it is more likely to have that effect in the body (in vivo), but this is far from guaranteed because the life of a cell in a dish is much different from the life of a cell in the body, where there are complex dynamics between cells and their surrounding environment, not to mention potential “off-target” effects that could cause dangerous complications. This is why further testing of the drug is required to determine 1) is it safe and 2) does it work?

When it comes to testing drugs in people, controlling (and over-controlling) variables is often a point of contention. If you thought cells in a dish were inherently variable, complete human beings are all the more so! In order to control for some of this variability, there are often strict requirements for participation in drug trials. As we saw above, there are legitimate reasons for such control – for example, if you test a drug in a patient who has an additional medical problem and that patient has a complication, you don’t know if it’s because of the drug alone or the preexisting condition, or the combination of the two. However, a problem often arises with regard to over-controlling variables. Tight control can lead a drug to be tested and approved on a population that isn’t representative of the true patient population. The drug therefore might not be effective in most patients (and can even have adverse effects). As you can see, scientists must make difficult and careful decisions when designing their experiments.

How did we get here? – A personal example

I didn’t plan on writing this piece – I initially started writing a piece on the cool filter-binding assay I’m using to test for protein-RNA interaction (don’t worry, it’s coming), but I realized that it was important to first explain why I chose to use this experiment, why other scientists might have taken different approaches to studying the same overall interaction depending on what data they cared most about, and why none of these approaches would be “wrong.”

If I wanted to know if this interaction occurs in cells, I could use a common experiment called co-immunoprecipitation (co-IP); this technique is often referred to as a “pull-down” assay because you “pull” a specific molecule out of cells and see what’s bound. I could pull out the protein and then see if the RNA came with it. This can tell me if they interact in cells, but it doesn’t tell me if the interaction is direct or indirect (e.g. the RNA is actually binding a different protein that is binding the protein I’m interested in).

Because I’m interested in testing for direct protein-RNA interaction, I am doing an experiment using purified protein and RNA. In these experiments, I can tightly control variables and I know that any interactions I detect are direct. If I see evidence of interactions, I can say that the molecules can interact directly, but this doesn’t mean that they do interact in biologically-relevant contexts. And if I don’t see interactions, this doesn’t mean that they don’t interact directly in vivo – it could be that they only interact at certain temperatures, with certain salt concentrations, etc.

Many Approaches

These are just two experimental approaches and there are many more. One isn’t inherently “better” or “worse”; they just give you different information. Each experiment, in all areas of science, has its strengths and weaknesses. It is important that scientists explore their options, think critically, and choose the experiment that will answer the question they’re asking. Ideally, scientific conclusions should be drawn from multiple lines of evidence and multiple experiment types. Just as a large number of biological replicates helps “buffer” variation, using multiple experiment types allows the strengths of one technique to compensate for the weaknesses of another.

In addition to careful experimental planning, it is important that scientists recognize the weaknesses and limitations of the methods they choose to use and convey these caveats to their audience. If you are the audience, some things to look for are: replication (technical but especially biological) and multiple lines of evidence (different types of experiments used).

This piece isn’t meant to dampen your enthusiasm for science, but rather to help you think like a scientist and understand why we do the things we do. There are many ways to answer similar scientific questions and the particular experiment you choose depends on many factors (both practical and theoretical). Like in everything, there is variability among scientists and the techniques we choose, but this variability doesn’t make the pursuit of science less valid.
