Before we continue, we should formally define some of the terms I've been using to describe machine learning, and then break them down further with more examples.
Vocabulary and Definitions
- Example: A single element in a dataset
- Feature: One characteristic of an example
[MUSIC]
0:00
Toward the end of these lessons,
we're going to use Python and
0:05
the scikit-learn project to
write our own classifier.
0:08
But before we continue, we should formally
define some of the terms I've been using
0:12
to describe machine learning and
0:17
then break them down
further with more examples.
0:19
Speaking of examples, an example
is a single element in a dataset.
0:23
Sometimes you might hear an example
referred to as a sample,
0:29
but it means the same thing.
0:35
If your data is formatted in a table,
0:37
an example might be
a single row in the table.
0:40
A dataset is composed of many examples.
0:45
And in general,
0:48
each example helps improve the confidence
of your model's predictions.
0:49
Say for instance, you're running a movie
studio and you want to try to forecast
0:55
how much money a movie might make,
so that you can set a budget.
1:00
Your dataset would probably
be examples of older movies.
1:04
So what about those older
movies might you include?
1:09
Each part of an example
is called a feature.
1:13
A feature is one
characteristic of an example.
1:17
Again, if you formatted
your data in a table,
1:22
each feature might be a single column.
1:25
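To make the table picture concrete, here is a minimal sketch in plain Python. The movie titles, numbers, and column names are all invented for illustration and do not come from a real dataset.

```python
# Hypothetical dataset: every value here is made up for illustration.
# Each dict is one example (a row); each key is one feature (a column).
movies = [
    {"title": "Movie A", "genre": "comedy", "budget_millions": 20, "release_month": 7},
    {"title": "Movie B", "genre": "horror", "budget_millions": 5, "release_month": 10},
    {"title": "Movie C", "genre": "action", "budget_millions": 150, "release_month": 5},
]

# One example is a single row of the table.
first_example = movies[0]

# One feature is a single column: the same characteristic across all examples.
budgets = [movie["budget_millions"] for movie in movies]

# The feature names, read from the keys of the first example.
feature_names = list(first_example.keys())

print(len(movies))     # how many examples the dataset holds
print(feature_names)   # which features each example has
print(budgets)         # one feature, across every example
```

Reading across a row gives you one example; reading down a column gives you one feature.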
In the case of predicting a movie's box
office performance, your older examples of
1:29
movies might include things like
their total box office sales,
1:38
the budget, the genre, the release date, and
1:41
maybe more advanced features,
like a star power calculation,
1:46
which could take all the actors in each
movie and calculate a weighted average of
1:46
their typical box office performance
in other movies they've been in.
1:50
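One possible way to sketch a star power calculation is shown here. The video does not say how the average is weighted, so the weights below (standing in for how prominent each actor's role is) are an assumption, and all the numbers are invented.

```python
def star_power(actors):
    """Weighted average of each actor's typical box office performance.

    `actors` is a list of (typical_gross_millions, weight) pairs. The weight
    is a hypothetical measure of how prominent the actor's role is.
    """
    total_weight = sum(weight for _, weight in actors)
    if total_weight == 0:
        # No actors (or all weights zero): nothing to average.
        return 0.0
    weighted_sum = sum(gross * weight for gross, weight in actors)
    return weighted_sum / total_weight


# Made-up cast: a lead whose films typically gross 100 (weight 3)
# and a supporting actor whose films typically gross 40 (weight 1).
cast = [(100.0, 3.0), (40.0, 1.0)]
score = star_power(cast)
print(score)
```

The result would then be stored as one more feature (one more column) on the movie's example.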
A dataset might contain good and
bad features,
1:56
and some features are more
important than others.
2:00
For example,
you might find that the genre and
2:04
release date are more
important than the budget.
2:06
So your model could weigh
those features more heavily.
2:10
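As a rough illustration of what weighing features more heavily can mean, here is a sketch of a simple weighted sum. The feature names, values, and weights are all hypothetical. In practice a model usually learns its weights from the examples rather than having them set by hand.

```python
# Hypothetical weights: a larger weight means the feature counts for more.
# Genre and release date are weighted above budget, as in the scenario above.
weights = {"genre_score": 0.5, "release_score": 0.4, "budget_score": 0.1}


def weighted_score(features, weights):
    """Combine feature values into one number using per-feature weights."""
    return sum(weights[name] * features[name] for name in weights)


# Made-up feature values for one movie, each already scaled to the 0-1 range.
movie = {"genre_score": 0.8, "release_score": 0.5, "budget_score": 0.2}
score = weighted_score(movie, weights)
print(score)
```

Because the budget weight is small, changing the budget value moves the score far less than the same change to the genre or release date values.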
A feature that might be completely
irrelevant is the movie's title.
2:14
Sure, a movie needs a title, and you might
be able to come up with a machine learning
2:19
model that can determine what
makes a good or bad movie title.
2:23
But in most cases,
it's probably too subjective and
2:28
inconsequential to weigh it against
other more quantifiable features.
2:31
Something like the box office performance
of a movie is very difficult to predict.
2:37
And it includes a huge number
of factors that are nearly
2:42
impossible to simulate perfectly.
2:45
But that's why a model is
nothing more than that:
2:51
a model, or
a simplification of the problem.
2:51
It's just one tool that can be used
in combination with other approaches
2:55
to arrive at a solution.
3:00