Heads up! To view this whole video, sign in with your Courses account or enroll in your free 7-day trial. Sign In Enroll
Preview
Start a free Courses trial
to watch this video
It's not always obvious how you should analyze your data. In this video we'll analyze whether some ages are over-represented at the Boston Marathon.
Related Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign upRelated Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign up
All right, we've got our lines in.
0:00
Now it's time to find out if any
of our data crosses the line.
0:02
Let's start by creating
a table of ages and
0:06
how many runners of each age ran the race.
0:08
Let's leave a space below Max Difference
and then in row 6 let's add two labels,
0:12
Age and Count.
0:18
And let's go ahead and bold our labels.
0:22
Then, below our age label,
let's add 18, and then 19.
0:25
Then let's highlight both of those cells,
and drag down until we have an entry for
0:31
each age between the min at 18 and
the max at 84.
0:38
Next, under the count label,
0:51
we need to count how many runners
were the age reported in column A.
0:54
To do this,
we'll use the Count if function,
1:00
which counts all the values in a range,
but only if they meet certain criteria.
1:03
So, under our Count label,
let's type =CountIf and hit Enter.
1:07
And then, we need to specify
the range we are interested in,
1:14
which in this case is
the ages of all the runners.
1:17
We already have that range and
the formulas up here.
1:21
So I'll take a second to copy that.
1:24
And then down here,
looks like I lost my CountIf,
1:32
but we can bring it back,
and paste in the range.
1:36
Then, for the second parameter,
1:40
we need to check that the value is
equal to the number in column A.
1:42
And remember,
this needs to be represented as a string.
1:46
So let's type, in quotation marks,
an equal sign, and then
1:50
let's concatenate that equal sign with the
cell in column A by typing an ampersand.
1:55
And then clicking over here
to select the cell and
2:02
hit Enter to finish our account statement.
2:05
Finally let's the drag list formula
down to page 84 to give us the counts.
2:07
And notice that if we
click on this last count,
2:25
the range of ages is locked
in as C2 to C26,411.
2:30
You can get some pretty hard to
find errors if you forget to lock
2:36
in these rangers by
using the dollar signs.
2:39
So, if you want to be using the exact
same range in a bunch of different cells,
2:42
make sure you're specifying your
range with those dollar signs.
2:47
Now that we've got all our accounts,
lets go back to the top and
2:52
add another column to keep track of
the difference between the two ages.
2:55
Let's call it difference and
3:00
then quickly bold it, and
then let's skip row 7 and
3:04
then row 8, let's type =B8-B7.
3:10
Then let's drag from there all the way
to the bottom to give us all of our
3:14
differences.
3:18
We can already tell that some
of these values are over 200.
3:21
But let's try and make them standout
by using some conditional formatting.
3:26
Conditional formatting is
formatting that only happens
3:30
once a certain condition is met.
3:33
For example, if you wanted to highlight
all the positive cells in green and
3:35
all the negative cells in red,
you could use conditional formatting.
3:40
To add conditional formatting
to a Difference column,
3:43
let's select all of the differences And
3:46
then choose Format,
Conditional formatting.
3:52
On the right we have options for
our formatting.
3:57
Let's change the condition from if
Cell is not empty to if Greater than
4:00
And add a value of 200.
4:10
Finally, let's change the formatting
style to have a red background.
4:13
And there we go.
4:18
It looks like we've got one, two, three,
4:19
four instances where
the difference is more than 200.
4:24
However, we really shouldn't
be hard coding this 200 value.
4:30
It should be set to the value
of our max difference.
4:33
So, instead of 200, let's type =,
and this is cell B4 and
4:38
we're not allowed click it,
which means we have to type it in.
4:44
Well that doesn't look quite right.
4:51
What's going on here?
4:53
Well, it turns out that when we type
B4 like this, it starts with C8 and
4:54
it compares it to B4, but then it moves
on to C9 and it compares it to B5 and
5:01
C10 with B6, and so on.
5:07
If we want this to compare
each cell with B4,
5:12
we need to use dollar signs to
make this an absolute reference.
5:15
That's better.
5:23
Then let's click done, and
5:24
by our criteria, we can now answer
the question we set out to solve.
5:27
Some ages do have an easier
time qualifying than others.
5:32
Though before we move on,
5:36
we should briefly talk about
statistical significance.
5:38
Imagine you just created a new drug.
5:41
Well, before you'd be able to sell that
drug, you'd need to make sure it was safe.
5:44
You'd want to test it on
a small subset of people
5:49
before releasing it to
the whole population.
5:51
But how many people would
you need to test it on?
5:54
If you tried it on five people and
they all turned out fine,
5:57
would you say that it's safe for
the whole population?
6:00
No way, but then,
how many people do you need?
6:03
1,000?
6:07
10,000?
6:08
Either way,
at some point your sample is big enough,
6:10
that you can start to drag conclusions
about the whole population.
6:12
This is what's statistical
significance is all about.
6:17
It tells us if our findings can be safely
extrapolated to the whole population.
6:20
If you'd like to read more
about statistical significance,
6:26
check out the teacher's notes below.
6:28
Coming up we'll look at how we can
present our findings to management.
6:30
You need to sign up for Treehouse in order to download course files.
Sign upYou need to sign up for Treehouse in order to set up Workspace
Sign up