Before we can write a classifier, we need something to classify. That is, we need a dataset.
Resources
- Iris flower dataset | Wikipedia
- load_iris() | scikit-learn Documentation
- Treehouse Workshop: Introducing Text Editors
- Which Text Editor Should I Use? | Treehouse Blog
- A Beginner’s Guide To The Windows Command Line
Python Code
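from sklearn.datasets import load_iris
# Load the built-in Iris dataset and assign it to a variable
iris = load_iris()
# Print the species labels (target names) as a plain list
print(list(iris.target_names))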
Before we can write a classifier
we need something to classify,
0:00
that is we need a data set.
0:03
One of the most classic data sets in all
of machine learning is the Iris data set
0:06
which is a set of 150 examples
of three different types of
0:12
Iris flowers: setosa,
versicolor, and virginica.
0:17
In fact, the iris flower data set
even has its own Wikipedia page,
0:23
to which you can find a link in
the notes associated with this video.
0:28
The Iris flower data set is like
the Hello World program of data sets.
0:32
It's not meant to be used in practical
applications, but it's good for testing
0:38
machine learning techniques, particularly
ones that involve classification.
0:42
If you scroll down to the data set section
and click the show button next to data,
0:47
you can see that this data
set has four features:
0:56
the length and width of each sepal and
the length and width of each petal.
1:00
After these four features there's a label,
1:07
which is the species of the iris flower:
1:13
setosa, versicolor, or virginica.
1:18
Each of these three labels has
50 examples in the data set for
1:23
a total of 150 examples.
1:28
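The structure described here can be checked directly in code. This is a minimal sketch, assuming scikit-learn and NumPy are installed; the variable names (`n_examples`, `n_features`, `counts_per_species`) are illustrative and do not appear in the video's code.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# Each row is one flower; each column is one of the four measurements
# (sepal length, sepal width, petal length, petal width).
n_examples, n_features = iris.data.shape
print(n_examples, n_features)
print(iris.feature_names)

# Labels are stored as the integers 0, 1, 2, one per flower.
# np.bincount counts how many examples carry each label.
counts_per_species = np.bincount(iris.target)
for name, count in zip(iris.target_names, counts_per_species):
    print(name, count)
```

Running this should report 150 examples with 4 features each, and 50 examples for each of the three species.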
Let's look at another page of
the documentation in Sklearn,
1:30
which you can also find a link to in
the notes associated with this video.
1:36
Sklearn has a number of small datasets
1:40
built in to demonstrate the different
tools available in Sklearn.
1:44
And one of them happens to
be the Iris flower dataset.
1:48
This dataset is too small for
real machine learning analysis, but
1:53
it's still useful for testing things
out, in this case classification.
1:57
We're going to load this data
set into a Python program and
2:02
then make a new example and
try to predict the label.
2:05
First, open your favorite text editor.
2:11
In these lessons, I'm going to use Atom,
which is available on Mac and
2:14
PC, but any plain text
editor should work the same.
2:19
If you're not sure which to use, check
the notes associated with this video.
2:23
Then create a new file, if
you haven't already done so,
2:28
and save it as ml.py.
2:31
I already have an ml.py but
I'm just going to save over it.
2:40
The ml stands for
machine learning and py means Python.
2:46
You can actually name the file whatever
you would like, as long as it ends in .py.
2:53
Make sure you remember where you're
saving this one on your computer,
2:57
because you'll need to access it
later from a command line console.
3:02
Now, I am going to start by
importing this Iris dataset,
3:07
so I'll say from sklearn.datasets and
3:15
then a space.
3:20
I'll type import and then another space,
3:23
and we'll type load_iris.
3:29
The data set isn't quite ready to use yet;
3:36
we have to assign it to a variable
in our code, like this.
3:39
I'll type iris and an equal sign and
3:44
then call the function, load_iris, followed by a pair of parentheses.
3:48
Now we could print the entire data set,
but that's going to look pretty ugly
3:54
on the console and won't really
be all that useful to us anyway.
3:59
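Rather than printing the whole dataset, one option is to look at what it contains. This is a rough sketch, assuming scikit-learn is installed. The object returned by `load_iris()` behaves like a dictionary, and the exact set of fields can vary between scikit-learn versions; `field_names` and `first_rows` are illustrative names, not part of the video's code.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# List the fields stored in the dataset object instead of
# dumping every value to the console.
field_names = sorted(iris.keys())
print(field_names)

# Peek at just the first three rows of measurements.
first_rows = iris.data[:3]
print(first_rows)
```

Each printed row holds the four measurements for a single flower.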
Instead, let's just print the labels,
otherwise known as target names,
4:04
just to make sure that we've
loaded the dataset correctly.
4:09
We can do that by using
the print function and
4:13
converting the target names
into a list like this.
4:17
So we'll type print and
some parentheses, and
4:22
inside we'll type list
which is a function.
4:26
And inside the list function,
4:31
we'll use the iris variable that
we created followed by a dot.
4:35
And we'll type target underscore names.
4:41
And that will list and
print out the target names or
4:44
the labels in the Iris dataset.
4:49
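To see why the target names are converted with `list`, it can help to check what type they are stored as. This is a small sketch, assuming scikit-learn is installed; `labels` and `first_label` are illustrative names that are not in the video's code.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# target_names is stored as a NumPy array, not a plain Python list.
print(type(iris.target_names))

# list() turns the array into an ordinary Python list of labels.
labels = list(iris.target_names)
print(labels)

# Each integer in iris.target is an index into target_names,
# so this looks up the species name of the first flower.
first_label = iris.target_names[iris.target[0]]
print(first_label)
```

The exact formatting of the printed list may differ between NumPy versions, but it should contain the same three species names.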
Now make sure you've typed everything
carefully and then save the file.
4:53
Now go back to Anaconda Navigator and
4:59
make sure you're in your machine
learning basics environment.
5:04
And click the play button,
and choose Open Terminal.
5:08
We could use the interactive
Python command line, but
5:15
using the terminal will be a little
easier for running files like this.
5:18
If you're on Windows, your terminal will
obviously look different than on a Mac.
5:24
But the general principles
should remain the same.
5:29
Next, you'll need to navigate to
the directory where you stored your file.
5:33
So in my case, I know it's in my home
directory inside my Dropbox folder.
5:38
Under treehouse, courses,
machine learning, basics,
5:47
and so now I've changed to that directory
and I will list out its contents.
5:53
And like I said, this is a little
different on Mac and Windows.
5:59
So if you do need some additional help,
pause this video and check out the notes.
6:03
Once you've navigated to the folder
where your Python file is saved,
6:09
type the word python followed by a space,
6:14
followed by the name of your program,
ml.py and then hit enter.
6:19
You should see the three labels
in the data set, setosa,
6:28
versicolor and virginica.
6:32
If you get an error go
back to your code and
6:35
make sure it's exactly the same as mine.
6:38
It's easy to miss a parenthesis or
make a small typo so check carefully.
6:42
If you need help, check out the notes
in this video for the exact code.
6:47
Great, now that we've loaded a dataset,
next, we'll use it to make predictions.
6:53