A high-level overview of the world of data scraping in Python: what it is and isn't, and how it can be used.
[MUSIC]
0:00
Howdy, I'm Ken, and
I'm chomping at the bit to introduce you to
0:09
the wild world of data wrangling and
specifically, web scraping.
0:13
We'll be taking a high-level look at how
to automate data gathering from the web.
0:18
Of course, we'll have some hands-on
practice along the way as well.
0:23
What exactly is web scraping?
0:27
The definition I like best is that it's
the automated collecting of data from
0:29
the web by any means other than
a program interacting with an API.
0:34
This typically is done through
writing a program that requests data
0:38
from a web server and obtains the necessary information.
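As a rough illustration of that definition, here is a minimal sketch that uses only the Python standard library. The names `TitleParser`, `extract_title`, `fetch_html`, and `SAMPLE_HTML` are made up for this example and are not from the course. The sketch parses an inline HTML string so it runs without a network connection; `fetch_html` shows how the request step might look but is not called here.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside <title>.
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the page title from an HTML string (hypothetical helper)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_html(url):
    """Request a page from a web server and return its body as text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Assumed sample page, standing in for a real server response.
SAMPLE_HTML = "<html><head><title>Homes for Sale</title></head><body></body></html>"
print(extract_title(SAMPLE_HTML))
```

Against a live site, something like `extract_title(fetch_html(url))` would combine the two steps, assuming the site permits automated requests.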
0:42
Let's take a look at some
examples of use cases,
0:46
before we get too deep into the how-tos.
0:48
Real estate listing companies and agents
use web scraping to scour the web for
0:51
current property listings to gather
competitive data about pricing and
0:55
housing market trends.
1:00
Many companies use web scraping for
competitive research.
1:01
Scraping competitors' websites for
product and review information.
1:05
Social media companies scrape the web to
get a better handle on what is trending.
1:09
You can use a web scraper to look at
YouTube or a specific category of videos
1:14
with lots of views to determine what
topics and titles are doing well.
1:18
The possibilities are really
limited only by your imagination.
1:22
Throughout this course we'll build
our scraping skills using the Python
1:27
packages Beautiful Soup and Scrapy.
1:31
We'll look at parsing HTML files, and
writing spiders that will follow the links
1:33
between pages and sites to further
increase our data gathering abilities.
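The course itself uses Beautiful Soup and Scrapy for this. To keep the example runnable without installing anything, the sketch below swaps in the standard library's `html.parser` to show the core idea behind a spider: pull the links out of one page so they can be visited next. `LinkCollector`, `collect_links`, `SAMPLE_PAGE`, and the `example.com` URLs are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gathers absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html, base_url):
    """Return every link found in an HTML string (hypothetical helper)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Assumed sample markup, standing in for a downloaded page.
SAMPLE_PAGE = """
<ul>
  <li><a href="/listings/1">First listing</a></li>
  <li><a href="page2.html">Next page</a></li>
  <li><a name="top">No href here</a></li>
</ul>
"""

links = collect_links(SAMPLE_PAGE, "https://example.com/listings/")
for link in links:
    print(link)
```

A spider would repeat this step on each collected link, usually while tracking which pages it has already visited.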
1:38
I'll also touch on how to handle sites
that require logins, and how to test
1:43
scraping applications.
1:47
Along the way we'll also talk about when
we should rein in our powers to be good
1:49
Internet citizens.
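The video does not say here what being a good Internet citizen involves. One common reading, assumed for this sketch, is to check a site's robots.txt before scraping it. The example uses the standard library's `urllib.robotparser` on an inline, made-up robots.txt; `build_rules`, `SAMPLE_ROBOTS_TXT`, and the `course-demo-bot` user agent are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content, standing in for a real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a rule set (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


rules = build_rules(SAMPLE_ROBOTS_TXT)
agent = "course-demo-bot"

# Ask whether this user agent may request each URL.
print(rules.can_fetch(agent, "https://example.com/listings/"))
print(rules.can_fetch(agent, "https://example.com/private/data"))

# Seconds the site asks crawlers to wait between requests, if stated.
print(rules.crawl_delay(agent))
```

A scraper could skip any URL where `can_fetch` returns `False` and pause between requests for the stated delay.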
1:53
Before we get started,
let me briefly talk about our tools.
1:55
I'll be using the PyCharm IDE
throughout this course.
1:59
The code samples that we'll be working
through should work perfectly fine
2:03
in other IDEs as well, and
in future editions of PyCharm itself too.
2:07
The developers of PyCharm
are always improving that tool.
2:12
If you run into any problems
using a different version,
2:16
ask for help in the Treehouse forum.
2:19
Also, if any minor changes or bugs pop up,
2:21
keep your eyes on the teacher's notes for
helpful comments.
2:24
If you spot an issue or difference
somewhere, check the notes first and
2:28
then let us know in the forum
if we've missed it.
2:31
One last thing, remember that the
Treehouse video player has speed controls.
2:35
So if I'm talking too fast,
or going really slow,
2:40
feel free to adjust the speed.
2:43
I won't mind, really, even if you laugh
at how I sound in super slow-motion.
2:44
Okay, let's get started with building
our web scraping skills and knowledge.
2:50