      You have completed Preparing Data for Analysis!
      
    
  Understand what cleaning data means.
Terms
- Data Cleaning is the process of fixing any errors or mistakes in a dataset.
                      [MUSIC]
                      0:00
                    
                    
                      Hi, my name is Megan, and
I'm a teacher here at Treehouse.
                      0:09
                    
                    
                      In this course, I'll teach you
how to prepare data for analysis.
                      0:13
                    
                    
                      You will rarely get
a perfect dataset without any
                      0:18
                    
                    
                      mistakes, errors, or missing information.
                      0:23
                    
                    
                      You will most likely need to clean the
data in order to prepare it for analysis.
                      0:25
                    
                    
                      An analysis is only as
good as your data is.
                      0:31
                    
                    
                      Data cleaning is the process of fixing
any errors or mistakes in a dataset.
                      0:38
                    
                    
                      You've probably seen data in one
of its common forms, like a table.
                      0:45
                    
                    
                      The columns inform you of the type
of data this table contains, and
                      0:49
                    
                    
                      the rows hold data points.
                      0:54
                    
                    
                      For example,
here is a table filled with Pokemon.
                      0:56
                    
                    
                      This table is clean because each column
uses a consistent data type, and
                      1:01
                    
                    
                      there isn't any missing information.
                      1:06
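The two properties just described (a consistent type within each column and no missing information) can be checked programmatically. The following is a minimal sketch in pandas, not the course's own code: the table contents, the column names, and the `is_clean` helper are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the table shown in the video; the column
# names and values are illustrative only.
clean = pd.DataFrame({
    "name": ["Bulbasaur", "Charmander", "Ekans"],
    "height": [0.7, 0.6, 2.0],
    "weight": [6.9, 8.5, 6.9],
})


def is_clean(df):
    """Return True when no cell is missing and no column mixes value types."""
    no_missing = not df.isna().any().any()
    one_type_per_column = all(
        df[col].map(type).nunique() == 1 for col in df.columns
    )
    return bool(no_missing and one_type_per_column)


print(clean.dtypes)      # one dtype per column
print(is_clean(clean))
```

A check like this only flags that something is wrong. Deciding how to fix it is still a judgment call.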
                    
                    
                      Now let's look at a dirty
version of this table.
                      1:09
                    
                    
                      This table now has instances of missing
data, like in the first row and the Ekans row.
                      1:13
                    
                    
                      It also has data in the incorrect format:
one value is written in feet and inches,
                      1:19
                    
                    
                      another includes the notation for
pounds and
                      1:24
                    
                    
                      is also a whole number instead
of a decimal or float.
                      1:27
                    
                    
                      And lastly, this one's items are broken
up by dashes instead of commas.
                      1:31
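The format problems listed above could be normalized with a few small helpers. This is a hedged sketch, not the method used later in the course: the function names, the sample values, and the choice of inches as the target height unit are all assumptions.

```python
import re

import pandas as pd


def parse_weight(value):
    """Strip unit notation such as 'lbs' and return the number as a float."""
    match = re.search(r"\d+(?:\.\d+)?", str(value))
    return float(match.group()) if match else None


def feet_inches_to_inches(value):
    """Convert a string like 2'4" to total inches (assumed target unit)."""
    match = re.fullmatch(r"\s*(\d+)'\s*(\d+)\"\s*", value)
    if match is None:
        return None
    feet, inches = match.groups()
    return float(int(feet) * 12 + int(inches))


def dashes_to_commas(value):
    """Rewrite dash-separated items as a comma-separated list."""
    return ", ".join(part.strip() for part in value.split("-"))


# Illustrative rows only; these are not the values from the video's table.
dirty = pd.DataFrame({
    "weight": ["15 lbs", "18.7"],
    "types": ["Grass-Poison", "Fire"],
})
dirty["weight"] = dirty["weight"].map(parse_weight)
dirty["types"] = dirty["types"].map(dashes_to_commas)
print(dirty)
```

Returning `None` for values that do not match keeps unparseable cells visible as missing data instead of silently guessing at them.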
                    
                    
                      While cleaning data, you'll need to make
decisions about whether to discard rows,
                      1:36
                    
                    
                      which format to use for
the column, and more.
                      1:41
                    
                    
                      Depending on the amount of cleaning you
need to do, these decisions will be
                      1:44
                    
                    
                      important to share with stakeholders
when you share your analysis.
                      1:49
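One way to keep cleaning decisions shareable is to record what was discarded at the moment you discard it. The sketch below assumes the decision is to drop any row missing a required column. The helper name `drop_incomplete_rows` and the sample data are hypothetical, and filling in missing values is an equally valid choice in other situations.

```python
import pandas as pd


def drop_incomplete_rows(df, required):
    """Drop rows missing any required column and describe what was removed."""
    cleaned = df.dropna(subset=required)
    dropped = len(df) - len(cleaned)
    note = (
        f"Dropped {dropped} of {len(df)} rows "
        f"missing any of: {', '.join(required)}"
    )
    return cleaned, note


# Illustrative data: the first row lacks a name, the last lacks a weight.
raw = pd.DataFrame({
    "name": [None, "Charmander", "Ekans"],
    "weight": [15.2, 18.7, None],
})

cleaned, note = drop_incomplete_rows(raw, ["name", "weight"])
print(note)  # Dropped 2 of 3 rows missing any of: name, weight
```

The returned note can go straight into the write-up that accompanies the analysis, so stakeholders can see how much data the cleaning step removed.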
                    
                    
                      We'll be working with spreadsheets
using Google Sheets and
                      1:53
                    
                    
                      then Python's pandas library.
                      1:56
                    
                    
                      If you aren't familiar with either topic,
                      1:58
                    
                    
                      I would suggest taking the prerequisites
for this course before continuing.
                      2:01
                    
                    
                      Throughout the course, don't forget to
check the teacher's notes below each video
                      2:05
                    
                    
                      for additional information.
                      2:10
                    
                    
                      Let's dig in.
                      2:11
                    
              