To err is human, to not report the error is inhumane! Say you have a really competitive game of pin the tail on the donkey – judges and all. Contestants each get a certain number of tails to pin & the winner’s chosen based on their precision (how close together they pin the tails) & accuracy (how close the tails are to the desired location, the x on the donkey’s butt).

Determining how accurate they are requires measuring the distance between the x and the tail pin. And this is where the judges come in. Yes – judgeS, plural! Multiple judges all measuring how close the tail is to the x marking where it should be. Each takes a turn measuring the distance and then the game-master averages those distances and announces the official reported distance. How many digits should he report (how many sig figs)? And how does he report how much agreement there was among the judges (what’s the variance or deviation)?

What do I mean by sig figs? If you plug 1 divided by 3 into your calculator you’ll see something like 0.333333333. Depending on how big your calculator screen is, you might see 0.33 or 0.333, etc. But the value hasn’t changed. 0.333333333 may sound way more fancy-dancy, but it’s not. And it’s “dangerously” fancy because it gives the false impression that you’re fancier than you really are!

When the game-master goes to report an average like that, if he didn’t round it somewhere he’d be standing there rattling off 3s to infinity. But he can’t just round it off anywhere. He has to do so at the decimal place that includes all the digits that are “certain” plus 1 that’s the best estimate. These are what we call significant figures (“sig figs”), and the # of sig figs tells the audience how confident they should be in the value.
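To make “round it at the right place” concrete, here’s a minimal Python sketch. It uses the standard `decimal` module so the computer doesn’t sneak in binary-float noise of its own. The helper name `round_sig` is made up for illustration, and it assumes round-half-up, the rule most of us learned in school.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_sig(value: Decimal, sig_figs: int) -> Decimal:
    """Round value to the given number of significant figures (half-up).

    Hypothetical helper for illustration, not a standard library function.
    """
    if value == 0:
        return Decimal(0)
    # adjusted() is the place of the leading digit: 0.333... -> -1, 7.05 -> 0
    leading_place = value.adjusted()
    # The last kept digit sits (sig_figs - 1) places to the right of the leading one
    step = Decimal(1).scaleb(leading_place - sig_figs + 1)
    return value.quantize(step, rounding=ROUND_HALF_UP)


third = Decimal(1) / Decimal(3)  # 28 digits of 3s with the default context
for n in (1, 2, 3):
    print(n, round_sig(third, n))  # 0.3, then 0.33, then 0.333
```

The value is the same every time; only how many digits we’re willing to vouch for changes.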

But I think it helps to look at an example, so let’s get back to that pin the tail tournament. After each pin attempt, the judges each get a turn measuring it with the same ruler. Assuming they know how to use a ruler and aren’t out to get any of the contestants, each judge will get close to the same number. But not the *exact* same number.

The judges are limited by their measuring tool, which in this case is a ruler. It’s easy to see if something’s above or below a ruler line, so the judges can all tell that the tail is above one line but below the next one up. But it’s harder to tell just how far in-between it is. So each judge has to make their best estimate of the tail’s location between those 2 lines. They only get to write one digit for this estimate (if it looks ~1/2 way between the lines, write down a “5” not a “50”, and if it looks ~1/4 of the way between the lines, you’ve gotta choose between a 2 and a 3).

If they’re using a standard ruler, they probably see lines every mm. But if they’re trying to measure it with a meter stick, the lines are probably cm apart. So with the ruler, they get to estimate the distance between mm (0.1cm) lines. But with the meter stick the lines they’re estimating between are cm lines.

With the mm-marked ruler you can see that the distance is between 2.5 & 2.6 cm. But with the cm-marked meter-stick you can only confidently say that it’s between 2 and 3 cm.

So the judges in the 1st case can write down 2.50, 2.51… 2.60, but they can’t report 2.513 or anything like it, because they only get to guess on 1 digit, and they already spent that guess on the hundredths place.

In the meter-stick case, the judges can write down 2.0, 2.1, 2.2… 3.0, but they can’t report 2.51, and they certainly can’t write 2.513! They can’t estimate that well! If they wrote down that many digits, it would be like saying that they were using a ruler when they were actually using a meter stick.
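Here’s a rough sketch of the “one guessed digit” rule. It assumes the tool’s smallest marking is passed in cm as a string with no trailing zeros, like "0.1" (mm lines) or "1" (cm lines). `allowed_decimals` and `record_reading` are made-up names for illustration, and `guess_cm` stands in for whatever the judge’s eyeballs “want” to say.

```python
from decimal import Decimal, ROUND_HALF_UP


def allowed_decimals(graduation_cm: str) -> int:
    """Decimal places a judge may write: the marking's own places plus 1 guessed digit."""
    exponent = Decimal(graduation_cm).as_tuple().exponent  # "0.1" -> -1, "1" -> 0
    return -exponent + 1


def record_reading(guess_cm: str, graduation_cm: str) -> Decimal:
    """Trim a judge's gut feeling down to the single estimated digit the tool allows."""
    places = allowed_decimals(graduation_cm)
    step = Decimal(1).scaleb(-places)  # e.g. 0.01 for the mm-marked ruler
    return Decimal(guess_cm).quantize(step, rounding=ROUND_HALF_UP)


print(record_reading("2.513", "0.1"))  # mm-marked ruler -> 2.51
print(record_reading("2.513", "1"))    # cm-marked meter stick -> 2.5
```

Same tail, same gut feeling; the tool decides how many digits make it onto the scorecard.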

Even if all the judges play by the rules, when those values get averaged together, you’re going from measurements, where the sig figs are determined by the measuring device, to calculations, where the sig figs are determined by the number of sig figs in your initial measurements.

When the game-master plugs those numbers into his calculator to get the average, the calculator doesn’t know how many digits are significant, so it just keeps spitting out digits. It’s up to the game-master to do his job right and report only the real ones.

So say judge 1 says 2.52, judge 2 says 2.53 and judge 3 says 2.56.

The average is (2.52 + 2.53 + 2.56)/3. If you put this into your calculator you get 2.53666666667, and a calculator with a smaller screen might say 2.536667. The value is actually a repeating decimal, so there’s an infinite tail of 6’s, and your calculator just rounds off where the screen ends.

But the game-master knows the truth: he knows where the rounding *should* be. Keep all the digits the judges agree on, plus the average of the guessed digit, so the number to report is 2.54. This tells us that all the judges agreed on the 2.5 but differed on the last digit, and 4 is their average guess for it.

What if the judges were measuring with the meter stick instead? Here they’re limited to guessing the tenths-place digit, so something between 2 and 3 cm. Maybe Judge 1 says 2.3, Judge 2 goes for 2.5 and Judge 3 says 2.8.

Now the average is (2.3 + 2.5 + 2.8)/3 = 2.5333333….

What should the game-master report? He needs to report a number that reflects the limitations of the measuring tool: the reported average shouldn’t go further right than any of the individual measurements. So he needs to round the value to 2.5.
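The averaging rule from both tournaments can be sketched like this. `reported_average` is a hypothetical helper, and it assumes round-half-up and readings written as strings so their decimal places survive: average everything, then round to the coarsest place any single reading used.

```python
from decimal import Decimal, ROUND_HALF_UP


def reported_average(readings: list[str]) -> Decimal:
    """Average the readings, rounded no further right than the coarsest reading."""
    values = [Decimal(r) for r in readings]
    # The least precise reading (fewest decimal places) limits the average
    coarsest_exponent = max(v.as_tuple().exponent for v in values)
    mean = sum(values) / len(values)  # the long "calculator" value
    return mean.quantize(Decimal(1).scaleb(coarsest_exponent), rounding=ROUND_HALF_UP)


print(reported_average(["2.52", "2.53", "2.56"]))  # ruler -> 2.54
print(reported_average(["2.3", "2.5", "2.8"]))     # meter stick -> 2.5
```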

The middle judge is happy with this, but the other 2 judges feel “ignored” and they want the audience to know that they didn’t quite agree, so they ask the game-master to also report the error. “Error” is how much the measured values differ from the “real” value, and it can be reported a few different ways.

Deviation is how far an individual measured value is from the true value. We don’t know the “true” value because that’s what we’re trying to measure, but the average measured value is our best guess of the true value, so we’ll take that as “truth.”

Now, for each judge’s measurement, we see how far away from the truth it was:

Judge 1: 2.5-2.3 = 0.2

Judge 2: 2.5-2.5 = 0.0

Judge 3: 2.8-2.5 = 0.3
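Here are the same deviations in Python, as a small sketch that takes the rounded average 2.5 as the stand-in for truth, like the text does. Using `Decimal` keeps the subtraction exact; plain floats can leave a tiny binary rounding residue in the last digits.

```python
from decimal import Decimal

readings = {"Judge 1": Decimal("2.3"), "Judge 2": Decimal("2.5"), "Judge 3": Decimal("2.8")}
best_guess = Decimal("2.5")  # rounded average, standing in for the unknown true value

# Deviation = distance from the best guess, ignoring whether it's above or below
deviations = {judge: abs(value - best_guess) for judge, value in readings.items()}
for judge, dev in deviations.items():
    print(f"{judge}: {dev}")
```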

So now Judge 2 gloats cuz they got it “perfect,” Judge 1 at least feels that they did better than Judge 3, and Judge 3 just shuts up.

But the game-master doesn’t report these individual values to the audience, and now, when that contestant learns that he lost, he gets suspicious. He suspects that there’s some debate among the judges and he wants to know how far apart their reported values were.

Instead of reporting the individual deviations, the game-master reports an average error. He takes the individual deviations (ignoring whether each reading was above or below the average), adds them, and divides by the number of judges (3) to get the average deviation, which is a measure of the *judges’* precision (ah, how the tables have turned…)

(0.2 + 0.0 + 0.3)/3 = 0.16666666…..

Where do we round here? At the first nonzero digit, so the game-master can report the value as 2.5 +/- 0.2. The average deviation marks out a range that roughly half of the values should fall within (a bit more, closer to 58%, if the data are normally distributed), so basically this is saying that roughly half of the judges’ estimates fall between 2.3 and 2.7.
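A sketch of the average-deviation step, rounding the result to one significant digit (assuming round-half-up). `adjusted()` is the `decimal` method that gives the place of the leading digit, so it tells us where that first nonzero digit lives.

```python
from decimal import Decimal, ROUND_HALF_UP

deviations = [Decimal("0.2"), Decimal("0.0"), Decimal("0.3")]
average_deviation = sum(deviations) / len(deviations)  # 0.1666..., the calculator value

# Keep one significant digit: round at the place of the leading nonzero digit
step = Decimal(1).scaleb(average_deviation.adjusted())
reported = average_deviation.quantize(step, rounding=ROUND_HALF_UP)
print(f"2.5 +/- {reported}")  # 2.5 +/- 0.2
```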

More commonly than average deviation, you see the standard deviation (SD). Its usual interpretation assumes a normal distribution (bell-curve shaped spread), and it’s never smaller than the average deviation, which is why it covers 68% rather than roughly half -> the 68-95-99.7 rule says that 68% of the data are w/in 1 SD of the mean, 95% are w/in 2 SD, & 99.7% are w/in 3 SD (when your data is normally distributed, i.e. shaped like a symmetrical bell curve).
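If you want to check where 68-95-99.7 comes from: the share of a normal distribution within k SDs of the mean is erf(k/√2), which Python’s `math.erf` can compute. The last lines also check the average-deviation claim: for normal data the average deviation is √(2/π) ≈ 0.8 SD, so ±1 average deviation covers a little more than half. This assumes perfectly bell-shaped data, which 3 judges can’t really promise.

```python
import math


def fraction_within(k: float) -> float:
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")

# For normal data, the average (absolute) deviation is sqrt(2/pi) SDs wide
mad_in_sd = math.sqrt(2 / math.pi)
print(f"within 1 average deviation: {fraction_within(mad_in_sd):.1%}")
```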

The game-master gets the SD by squaring each deviation, adding up those squares & dividing by the number of judges (3) to get the variance, and then taking the square root of that variance to get the standard deviation.

sqrt((0.2^2 + 0.0^2 + 0.3^2)/3) = 0.20816659994. That’s what the calculator says but not what the game-master should say -> he should say 0.2, because he only gets to report a single significant digit for the error (after rounding, this is the same reported error as with the average deviation).
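Here’s that SD calculation as a small sketch, dividing by the number of judges like the text does. For comparison (not used below): Python’s `statistics.pstdev(data, mu=...)` also divides by n, while `statistics.stdev` divides by n − 1 (the “sample” standard deviation), which gives a somewhat larger number for only 3 judges.

```python
from decimal import Decimal, ROUND_HALF_UP

deviations = [Decimal("0.2"), Decimal("0.0"), Decimal("0.3")]
# Variance: square each deviation, add them up, divide by the number of judges
variance = sum(d ** 2 for d in deviations) / len(deviations)
standard_deviation = variance.sqrt()  # the long "calculator" value

# Report a single significant digit, at the place of the leading digit
step = Decimal(1).scaleb(standard_deviation.adjusted())
reported = standard_deviation.quantize(step, rounding=ROUND_HALF_UP)
print(standard_deviation)
print(f"2.5 +/- {reported}")
```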

So if the game-master reports 2.5 +/- 0.2 cm, he’s saying that about 68% of the judges’ measurements should fall between 2.3 and 2.7 cm.

He canNOT say 2.5 +/- 0.21. This gives the false impression that the judges were measuring with the ruler, not the meter stick. And, furthermore, there’s no possible way a judge could have reported a value of 2.71. The judges can only report to the tenths place, so even if they were “sure” that it was 2.71, they’d have reported 2.7. So we don’t gain any information by adding that 1 to the deviation.

Now what if there was 1 really bad judge? He was holding the ruler upside down and reported a value of 17.5 cm instead of 2.5.

So the 3 reported values were 2.3, 17.5, and 2.8. Now they don’t even agree on the 2! The average measurement was (2.3 + 17.5 + 2.8)/3 = 7.53333333333 (as per calculator). Notice that the average is now higher, and since we don’t know the real true value, our best estimate of the true value increases. And since we only have 3 judges, it increases a lot!

If we round off based on the limit imposed by the meter stick (tenths place), we get 7.5. But this only reflects the judges’ confidence in their measurements, not whether that confidence was deserved. The judge with the backwards ruler is confidently very wrong:

Judge 1: 7.5-2.3 = 5.2

Judge 2: 17.5-7.5 = 10.0

Judge 3: 7.5-2.8 = 4.7

Average deviation (5.2 + 10.0 + 4.7)/3 = 6.63333333333… (so sayeth the calculator)

And standard deviation: sqrt((5.2^2 + 10.0^2 + 4.7^2)/3) = “7.05053189483”
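Here’s the backwards-ruler case worked through the same way, as a sketch that (like the text) first rounds the average to the meter stick’s tenths place and uses that as the best guess:

```python
from decimal import Decimal, ROUND_HALF_UP

readings = [Decimal("2.3"), Decimal("17.5"), Decimal("2.8")]
mean = sum(readings) / len(readings)  # 7.5333...
# Round to the meter stick's tenths place to get the best guess (7.5)
best_guess = mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

deviations = [abs(r - best_guess) for r in readings]  # 5.2, 10.0, 4.7
average_deviation = sum(deviations) / len(deviations)  # 6.6333...
variance = sum(d ** 2 for d in deviations) / len(deviations)
standard_deviation = variance.sqrt()  # 7.0505...

print(f"best guess: {best_guess}")
print("deviations:", ", ".join(str(d) for d in deviations))
print(f"average deviation: {average_deviation}")
print(f"standard deviation: {standard_deviation}")
```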

What to report here? We only get a single significant digit for reporting error, so we have to round our error at the first digit, BUT our SD also has to sit on the last place of the reported value -> when you report a number you’re implicitly saying that you’re confident about all the digits except the last, which is where there’s “room for error” (though the second-to-last digit doesn’t have to be absolutely agreed upon: something like 2.0 +/- 0.2 spans 1.8 to 2.2, which encompasses both a 1 & a 2 in the ones place).

Think back to the judges squinting to estimate how far the tail is between those 2 lines. If they can’t agree on the tenths place, they certainly can’t agree on the hundredths place. So you don’t need to tell us that. We just want to know where you start disagreeing, or at least where most of you start disagreeing.

So, when you report uncertainty, you round the error to its first significant digit and then round the reported value to that same place, even if your measuring device is “better than that.”

With the 1st SD, we could put that single digit in the tenths place and still encompass the values, but here there’s so much disagreement in the ones digit that the error spills over into it, so we have to round our average value to match.

We have to round our error at the first digit, and we have to round our average to where we rounded the error. So the SD rounds to 7, but we can’t say 7.5 +/- 7. Our SD has to be on the last place, so we have to round the average to 8 and report 8 +/- 7. Eek!
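Putting the whole recipe together, here’s a hypothetical `report` function. It’s a sketch under two assumptions worth flagging: it uses the SD computed by dividing by the number of judges, and when the SD would round finer than the measuring tool allows, it falls back to the tool’s place, following the 2.5 +/- 0.21 argument above.

```python
from decimal import Decimal, ROUND_HALF_UP


def report(readings: list[str]) -> str:
    """Format 'average +/- SD' using the rounding rules from the text (a sketch)."""
    values = [Decimal(r) for r in readings]
    # Coarsest decimal place among the readings: the measuring tool's limit
    coarsest = max(v.as_tuple().exponent for v in values)
    mean = sum(values) / len(values)
    best_guess = mean.quantize(Decimal(1).scaleb(coarsest), rounding=ROUND_HALF_UP)

    # SD of the deviations from the best guess (divide by the number of judges)
    deviations = [abs(v - best_guess) for v in values]
    sd = (sum(d ** 2 for d in deviations) / len(deviations)).sqrt()

    # Error keeps one significant digit, but never finer than the tool allows
    place = coarsest if sd == 0 else max(sd.adjusted(), coarsest)
    step = Decimal(1).scaleb(place)
    error = sd.quantize(step, rounding=ROUND_HALF_UP)
    # The average is rounded to the same place as the error
    average = mean.quantize(step, rounding=ROUND_HALF_UP)
    return f"{average} +/- {error}"


for judges in (["2.52", "2.53", "2.56"], ["2.3", "2.5", "2.8"], ["2.3", "17.5", "2.8"]):
    print(report(judges))
```

For the three tournaments above, this gives 2.54 +/- 0.02 for the ruler, 2.5 +/- 0.2 for the meter stick, and 8 +/- 7 once the backwards-ruler judge shows up.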

more on topics mentioned (& others) #365DaysOfScience All (with topics listed) 👉 http://bit.ly/2OllAB0