Heads up! To view this whole video, sign in with your Courses account or enroll in your free 7-day trial. Sign In Enroll
Preview
Start a free Courses trial
to watch this video
Before you start an analysis, you'll first want to define the question you're trying to answer.
This video doesn't have any notes.
Related Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign upRelated Discussions
Have questions about this video? Start a discussion with the community and Treehouse staff.
Sign up
Management typically starts
the data analysis process
0:00
when they need to know something.
0:03
Let's work through this process by
pretending the boss of an athletic
0:05
association has received complaints that
some ages have an easier time qualifying
0:08
than others and they've tasked us
with getting to the bottom of it.
0:12
The first we'll need to do
is define the question.
0:16
Let's go with, do some ages have an easier
time qualifying for the Boston Marathon?
0:19
Awesome.
0:25
Next, we need to turn that
question into something concrete,
0:26
something we'll be able
to answer with our data.
0:29
One way to find out if some ages
have an easier time qualifying
0:32
is to compare the number of
participants for each age.
0:36
If we find a big difference
between two consecutive ages,
0:39
then we'll know something is up.
0:43
But this provides us with another issue.
0:44
How do we define a big difference?
0:47
Remember, we're trying to answer a yes or
no question.
0:50
So at some point,
we need to draw a line in the sand and
0:53
say, this difference is too much.
0:56
Analyzing data is all
about asking questions.
0:58
You wanna approach each step and decision
along the way with an inquisitive mind.
1:02
Always asking if things
need clarification, or
1:06
could be better in anyway.
1:09
So, in this example, we're asking
what's an appropriate difference.
1:11
To figure out the answer,
let's go back to the spreadsheet.
1:15
And let's start off by creating a new tab
at the bottom and naming it Age Breakdown.
1:19
Then, to figure out where
we should draw that line,
1:29
let's first find out how many
ages took part in the race.
1:32
In Column A, let's add labels for
youngest and oldest.
1:35
Then, in Column B1,
let's set it equal to men and
1:42
then let's head over to the 2017 tab and
select all the age data.
1:47
By clicking in cell C2 and
using control, shift, down or
1:59
command, shift, down, then hit enter.
2:03
And there we go.
2:07
Now let's clean up that formula by using
F4 to make those references absolute.
2:09
Then we can drag that down to oldest and
replace min with max.
2:19
Perfect.
2:28
Next, to give us some idea of how
big is too big, let's figure out
2:30
how many runners of each age there would
be if ages were uniformly distributed.
2:35
[SOUND] So, if each age have
the same number of runners,
2:40
how many runners would that be?
2:43
Now this is almost certainly not the case,
2:46
but it's easy to calculate and
gives us a good jumping off point.
2:48
Below oldest, let's add a new
label called runners per age.
2:52
And let's make this column
just a little bit wider.
2:58
And let's say that equal
to the total number
3:03
of runners from the summary
tab divided by and
3:08
parenthesis oldest which I'll
just write in B2 minus youngest
3:14
End the parentheses and hit Enter, great.
3:24
So in a uniform distribution, each age
would account for about 400 runners.
3:27
From here, we just need to use this
figure to decide how much of a difference
3:33
is acceptable between
two consecutive ages.
3:37
I think 400 is probably too high and
100 is probably too low.
3:40
But between those two, it's difficult
to say where we should end up.
3:48
We can only do so
much in trying to figure things out.
3:51
At some point we just
have to pick something.
3:55
So, let's go with 200, or
about half of our runners per age.
3:57
Let's add a new label below runners
per age called max difference.
4:02
And let's set it equal to
runners per age divided by two.
4:11
Let's also format these two cells to have
less decimal points by clicking on this
4:18
button up here.
4:23
And let's also bold our labels
to make them easier to read.
4:28
In the next video, we'll dive into
the aged data and see what we find.
4:34
You need to sign up for Treehouse in order to download course files.
Sign upYou need to sign up for Treehouse in order to set up Workspace
Sign up