Data isn't always distributed the way you want. In this video we'll talk about a few of the different ways we can measure the spread of our data.
We've got the extremes of our data and
we've got the middle.
0:00
But how is our data distributed?
0:03
One common way to describe the spread of
our data is to use the standard deviation,
0:05
which is commonly represented
by the Greek letter sigma (σ).
0:10
The standard deviation aims to tell us how
far away our data is from the average.
0:13
To calculate it,
0:18
we start by taking the difference
between each value and the average.
0:19
Then we square each of those differences,
add them up, and
0:23
divide by the total number of values.
0:26
This gives us the standard deviation
squared, which is also called the variance.
0:29
So to get the standard deviation, we just
take the square root of the variance, and there we go.
0:34
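The by-hand recipe above can be sketched in Python. The scores below are made-up values chosen for illustration, not the bowling data from the video, and population_std_dev is a hypothetical helper name. The result is cross-checked against the standard library's statistics.pstdev, which also divides by the total number of values.

```python
import math
import statistics


def population_std_dev(values):
    """Standard deviation computed step by step, dividing by the count."""
    n = len(values)
    mean = sum(values) / n
    # Difference between each value and the average, squared.
    squared_diffs = [(v - mean) ** 2 for v in values]
    # Sum the squares and divide by the number of values: the variance.
    variance = sum(squared_diffs) / n
    # The standard deviation is the square root of the variance.
    return math.sqrt(variance)


# Hypothetical scores, made up for illustration.
scores = [110, 150, 150, 150, 170, 170, 210, 250]

sigma = population_std_dev(scores)
print(sigma)  # 40.0
print(math.isclose(sigma, statistics.pstdev(scores)))  # True
```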
We've got a standard deviation of 64.29,
so if we were to put this on a graph,
0:39
we'd put the average in the middle and
then go 64.29 above and below the average.
0:44
Then we can say that any data in this
range is within one standard deviation
0:50
of the average.
0:55
So that's a pretty big range.
0:56
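To make "within one standard deviation" concrete, this sketch uses the same made-up scores and a hypothetical helper, within_one_std_dev, to compute the range from the average minus sigma to the average plus sigma and list which values land inside it.

```python
import statistics


def within_one_std_dev(values):
    """Return the bounds mean - sigma and mean + sigma, plus the values inside them."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    low, high = mean - sigma, mean + sigma
    inside = [v for v in values if low <= v <= high]
    return (low, high), inside


# Hypothetical scores, made up for illustration.
scores = [110, 150, 150, 150, 170, 170, 210, 250]
(low, high), inside = within_one_std_dev(scores)
print(low, high)                       # 130.0 210.0
print(len(inside), "of", len(scores))  # 6 of 8
```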
Let's see what happens if, instead of
a perfect game, our first bowler
0:58
bowls a 135.
1:03
Now, instead of an average of 134.5,
we've got an average of about 114 and
1:04
our standard deviation is
all the way down to just 17.
1:10
So if we make a plot of this new standard
deviation, we can see that this data
1:14
is much more clustered together than
when it included a perfect game.
1:19
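The same effect is easy to reproduce: swapping one extreme score for an ordinary one pulls the average down and shrinks the standard deviation sharply. The scores here are invented for illustration and only loosely mirror the video's example.

```python
import statistics

# Hypothetical scores, made up for illustration; the first bowler rolls a 300.
with_perfect_game = [300, 110, 120, 95, 130, 105, 115, 100]
# Same scores, except the first bowler rolls a 135 instead.
without_perfect_game = [135] + with_perfect_game[1:]

for label, scores in [("with 300", with_perfect_game),
                      ("with 135", without_perfect_game)]:
    mean = statistics.mean(scores)
    sigma = statistics.pstdev(scores)  # population standard deviation
    print(f"{label}: mean={mean:.1f}, std dev={sigma:.1f}")
```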
Let's calculate the standard deviation for
the finishing times.
1:23
First, let's add a new label for
Standard Deviation in row nine.
1:27
And let's make it bold and
1:36
then double-click right here to
automatically set the width of the column.
1:38
Then, in the cell next to it, let's type
=STDEV and hit Enter to select a function.
1:45
Then let's paste in the range and
hit Enter again and
1:54
it looks like we've got
a standard deviation of about 42 minutes.
1:58
Also, if you're not seeing 42 minutes
here, you can come over here and
2:02
change the cell's format to Duration, and
that should fix your issue.
2:07
So a typical racer finished within about 42
minutes of the average finish time.
2:12
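For readers working outside a spreadsheet, here is a rough Python equivalent of the =STDEV step. One detail worth knowing: spreadsheet STDEV (in Excel, Google Sheets, and Numbers) computes the sample standard deviation, dividing by one less than the number of values, whereas the by-hand recipe earlier divides by the full count. Python's statistics.stdev and statistics.pstdev correspond to those two versions. The finish times below are hypothetical, and they are converted to minutes because the statistics functions expect plain numbers rather than timedelta objects.

```python
import statistics
from datetime import timedelta

# Hypothetical race finish times, made up for illustration.
finish_times = [
    timedelta(hours=3, minutes=50),
    timedelta(hours=4, minutes=5),
    timedelta(hours=4, minutes=20),
    timedelta(hours=4, minutes=45),
    timedelta(hours=5, minutes=10),
    timedelta(hours=5, minutes=40),
]

# The statistics functions need numbers, so convert each duration to minutes.
minutes = [t.total_seconds() / 60 for t in finish_times]

sample_sd = statistics.stdev(minutes)       # divides by n - 1, like spreadsheet STDEV
population_sd = statistics.pstdev(minutes)  # divides by n, like the by-hand recipe

# Convert back to durations for display, similar to the Duration cell format.
print("sample:", timedelta(minutes=sample_sd))
print("population:", timedelta(minutes=population_sd))
```

With only a handful of values the two results differ noticeably; with many values they converge.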
But standard deviation
doesn't tell the whole story;
2:17
it only tells us how compact or
spread out our data is.
2:21
To get the rest of the picture,
we need to talk about skew.
2:25
Skew is when your data seems to
favor one side over the other.
2:29
Most of the data is either to the right or
left of the middle.
2:34
And depending on which
side has the long tail,
2:37
you would say that this data is skewed either
negatively (long tail on the left) or positively (long tail on the right).
An easy way to remember skew
directions is to start at the peak and
2:45
draw an arrow towards the long tail.
2:49
The direction that arrow points
is how the data is skewed.
2:52
So this data has a negative skew.
2:56
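One common way to put a number on skew is the moment coefficient of skewness: the average cubed difference from the mean, divided by the standard deviation cubed. Cubing keeps the sign, so a long tail on the left drives the result negative and a long tail on the right drives it positive, matching the arrow rule above. This is a sketch of that one formula only; other variants exist (Excel's SKEW function, for example, applies a small-sample adjustment). The helper name and both data sets are hypothetical.

```python
import statistics


def moment_skewness(values):
    """Mean cubed deviation divided by the population standard deviation cubed."""
    n = len(values)
    mean = sum(values) / n
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return 0.0  # all values identical: no spread, so no skew
    third_moment = sum((v - mean) ** 3 for v in values) / n
    return third_moment / sigma ** 3


# Hypothetical data sets, made up for illustration.
long_left_tail = [10, 40, 45, 48, 50, 50, 52]
long_right_tail = [1, 2, 2, 3, 3, 3, 20]

print(moment_skewness(long_left_tail))   # negative: negative skew
print(moment_skewness(long_right_tail))  # positive: positive skew
```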
On the other hand, if your data has
no skew and its mean, median, and
2:59
mode are all right in the middle,
then your data is symmetric, and the best-known shape like this is
3:04
the normal distribution, which is
frequently referred to as a bell curve.
3:08
Normal distributions have many
convenient properties, and
3:13
they show up, at least approximately, fairly often in real life.
3:16
People's heights, test scores, and
3:19
even blood pressures are all
approximately normally distributed.
3:21
One property of normal distributions
is how many values fall within a given
3:25
number of standard deviations of the mean.
3:29
About 68% of the data should be contained
within 1 standard deviation,
3:30
and about 95% should be contained within 2.
3:35
And if you go out to 3
standard deviations, you cover about 99.7%,
3:39
which is pretty
much all of the data.
3:44
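The 68-95-99.7 figures can be checked against the normal distribution itself using the standard library's statistics.NormalDist (Python 3.8 or later). The second helper, fraction_within, is a hypothetical name for the kind of empirical check described next: it reports what share of a data set falls within k standard deviations of its mean.

```python
import statistics


def normal_fraction_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    z = statistics.NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


def fraction_within(values, k):
    """Share of the given values lying within k population SDs of their mean."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)
    inside = [v for v in values if abs(v - mean) <= k * sigma]
    return len(inside) / len(values)


for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# Hypothetical scores, made up for illustration.
print(fraction_within([110, 150, 150, 150, 170, 170, 210, 250], 1))  # 0.75
```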
Let's see if our data is normally
distributed by seeing how
3:46
close we come to these
numbers in the next video.
3:49