Forms are a big part of many websites. Scrapy provides a FormRequest class for handling them.
[MUSIC]
0:00
We've managed to make a couple of spiders that were great for sites that don't require interaction. But many sites do require some sort of interaction. For example, logging in to a site with a username and password requires a form submission. There are many different reasons for needing to work with forms when getting and scraping data. Let's head back into our code to take a look at some techniques.
0:25
Our Horse Land site is hosted on GitHub Pages, which doesn't support backend technologies, so we'll be using a bit of a workaround from Formspree to handle the form posts. Check the teacher's notes for additional information about formspree.io and how to get started with it.
0:47
If we look at our form page, we see that it's a pretty simple form with just a first name, last name, and job title. Scrapy has a class called FormRequest, which handles form processing. And, hold your horses, it's easy to use.
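For a sense of what FormRequest takes care of, here is a minimal sketch using only Python's standard library (it is not how Scrapy implements FormRequest). A typical form submission is a POST whose body is the URL-encoded field names and values, sent with the application/x-www-form-urlencoded content type. The endpoint URL is a hypothetical placeholder, and the request is only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Field names and values used for the Horse Land form in this video.
formdata = {"firstname": "Ken", "lastname": "Alger", "jobtitle": "Teacher"}

# Hypothetical endpoint; the real form posts to a Formspree URL.
ACTION_URL = "https://example.com/form-endpoint"


def build_form_post(url, fields):
    """Build (but do not send) a URL-encoded form POST request."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_form_post(ACTION_URL, formdata)
print(request.get_method())  # POST
print(request.data)          # b'firstname=Ken&lastname=Alger&jobtitle=Teacher'
```

FormRequest builds this kind of request for you, and its from_response method goes a step further by reading the form's action URL and existing fields from the page itself.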
1:02
Let's mosey on over to our code and create a new spider. I'll create a new Python file and call it formSpider. First, FormRequest needs to be imported: from scrapy.http import FormRequest. We also need to import Spider: from scrapy.spiders import Spider.
1:32
As our next step, we create a new class that inherits from Spider. We'll call it FormSpider and, as we've seen, we need to give our spider a name. We'll just call it horseForm. Then we define our start_urls, which, again, is a list. What's the URL for our form? We'll just paste that in. This looks pretty familiar so far, I think.
2:11
Next, we define our parse method and the formdata we want to pass in. Let's use the browser's developer tools to see what the form fields are called. Come over here, Developer Tools. They're down in here, in this form. So we have firstname, lastname, and jobtitle, all lower case with no spaces.
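Developer tools are the quickest way to find these names. As an alternative, the sketch below lists them with Python's built-in html.parser module. The markup is a hypothetical stand-in modeled on the three fields described here; the real page's HTML may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup shaped like the Horse Land form; the real page may differ.
FORM_HTML = """
<form action="https://example.com/form-endpoint" method="POST">
  <input type="text" name="firstname">
  <input type="text" name="lastname">
  <input type="text" name="jobtitle">
  <input type="submit" value="Send">
</form>
"""


class FieldNameCollector(HTMLParser):
    """Collect the name attribute of every input, select and textarea tag."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:  # submit buttons often have no name, so skip those
                self.names.append(name)


collector = FieldNameCollector()
collector.feed(FORM_HTML)
print(collector.names)  # ['firstname', 'lastname', 'jobtitle']
```

These names become the keys of the formdata dictionary.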
2:54
So in our formdata we want firstname: my first name is Ken. Then lastname, Alger. And jobtitle is Teacher.
3:09
Now we need to return a FormRequest built from the response object, so return FormRequest.from_response. We'll pass in the response; the number of the form on the page we're processing, formnumber, which is zero based; and then the form data we want, so formdata=formdata. Then we need a callback for what to do next: callback=self.after_post, pointing at a method we'll make called after_post.
3:51
This passes the data we defined into the form and, by default, simulates a click on the form's submit button to submit our data. Then it will do whatever we define in the after_post method.
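To make the from_response behavior more concrete, the sketch below is a simplified, standard-library approximation of it (not Scrapy's actual code): pick a form by its zero-based index, start from any values already in the form such as hidden inputs, let formdata override them, and add the first named submit button as if it were clicked. The page, its extra search form, the hidden form_id field, and the fill_form helper are all hypothetical. On a page with a single form, formnumber would be 0; here the contact form comes second, so it is 1.

```python
from html.parser import HTMLParser

# Hypothetical page: a search form first, then a contact form like Horse Land's.
PAGE_HTML = """
<form action="/search" method="GET">
  <input type="text" name="q">
</form>
<form action="https://example.com/form-endpoint" method="POST">
  <input type="hidden" name="form_id" value="horse-jobs">
  <input type="text" name="firstname">
  <input type="text" name="lastname">
  <input type="text" name="jobtitle">
  <input type="submit" name="send" value="Submit">
</form>
"""


class FormCollector(HTMLParser):
    """Record each <form>'s action and the attributes of its <input> tags."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self.in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({"action": attrs.get("action"), "inputs": []})
            self.in_form = True
        elif tag == "input" and self.in_form:
            self.forms[-1]["inputs"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def fill_form(html, formnumber, formdata):
    """Hypothetical helper: a simplified take on FormRequest.from_response."""
    collector = FormCollector()
    collector.feed(html)
    form = collector.forms[formnumber]  # zero based, like formnumber
    fields = {}
    clicked = None
    for attrs in form["inputs"]:
        name = attrs.get("name")
        if not name:
            continue
        if attrs.get("type") == "submit":
            if clicked is None:  # use the first named submit button
                clicked = (name, attrs.get("value", ""))
        else:
            fields[name] = attrs.get("value", "")  # pre-filled value, if any
    fields.update(formdata)  # values we pass in override pre-filled ones
    if clicked is not None:
        fields[clicked[0]] = clicked[1]  # as if the button were clicked
    return form["action"], fields


action, fields = fill_form(
    PAGE_HTML, 1, {"firstname": "Ken", "lastname": "Alger", "jobtitle": "Teacher"}
)
print(action)  # https://example.com/form-endpoint
print(fields)
```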
4:04
Here we could do data saving, data processing, or further scraping tasks. For now, let's just print out that the form was processed, along with the response object itself. So we'll define after_post, which takes self and, again, a response. We'll print a message with a little formatting, just so we can see it in the terminal, and then we'll print the response. Let's just copy this line here. There we go.
4:50
All right, let's open a Terminal window, go to our spiders folder, and have Scrapy run our crawler. If we look up here, great: we see that the spider found and submitted our form. In our case, it was posted to formspree.io for processing. Here's our printed information and our 200 response code.
5:24
Great. I've included links about FormRequest in the teacher's notes as well. I'd encourage you to look at it, as it is a powerful tool for processing forms and can even be used to handle login forms.