

Analyzing Books with Pandas


Popularity

10:44 with Megan Amendola

Dig into the books dataset to determine the most popular book.

Teacher's Notes

What is the most popular book of the 1960s?

Use pd.Timestamp to compare dates. For example:

books['publication_date'] > pd.Timestamp(1960,1,1)
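As a minimal sketch of how that comparison extends to a whole decade, the example below combines two `pd.Timestamp` bounds with `&`. It assumes `publication_date` has already been converted to datetimes (for example with `pd.to_datetime`), and the small `books` frame and its titles are hypothetical stand-ins for the workshop data. It only selects the 1960s rows and leaves the "most popular" part of the challenge to you.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame;
# assumes publication_date is already a datetime64 column
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "publication_date": pd.to_datetime(
        ["1958-05-01", "1963-09-15", "1969-12-31", "1970-01-01"]
    ),
})

# On or after Jan 1, 1960 AND before Jan 1, 1970.
# Each comparison needs its own parentheses because & binds tighter than >= and <.
in_sixties = (books["publication_date"] >= pd.Timestamp(1960, 1, 1)) & (
    books["publication_date"] < pd.Timestamp(1970, 1, 1)
)
sixties = books.loc[in_sixties]
print(sixties["title"].tolist())  # ['Book B', 'Book C']
```

Using `< pd.Timestamp(1970, 1, 1)` as the upper bound keeps December 31, 1969 without having to think about the time of day.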

Video Transcript

We're going to get into analyzing book popularity. 0:00

I'm gonna start off with a couple of questions I have and 0:03

I'll add these to my notebook as Markdown cells. 0:06

First, what is the most popular book? 0:13

And second, are books 0:20

with fewer pages rated higher than 0:26

those with large page counts? 0:31

Now that we've got our initial questions, 0:38

let's start digging into our data to find the answers. 0:40

If I scroll up a bit, I can see that we 0:43

have average ratings for our books. 0:47

This shows us how users on Goodreads have rated each book on a scale of one for 0:54

the worst and five for the best. 0:58

For this one, it's pretty easy to see the current max value in the rating 1:01

column. 1:06

So let's add a cell here. 1:07

We'll take books, grab the average rating column, and 1:12

get its max value, and we get back a five. 1:19

So the highest rating in the dataset is a five. 1:26

Out of curiosity, let's do a quick min. 1:30

And unsurprisingly it's a zero. 1:34
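Here is a small sketch of the max and min checks, assuming the rating column is named `average_rating` (inferred from the narration). The three-row frame is hypothetical; in it, the 0.0 belongs to a book with no ratings, which is one plausible way a zero can show up.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "average_rating": [5.0, 0.0, 4.3],
    "ratings_count": [0, 0, 12000],
})

highest = books["average_rating"].max()  # largest value in the column
lowest = books["average_rating"].min()   # smallest value in the column
print(highest, lowest)  # 5.0 0.0
```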

So we need to see what the five-star book or books are. 1:37

So let's change this up. 1:44

Let's do books.loc, L-O-C, 1:46

where we're looking for the books whose 1:51

average rating is equal to 5.0. 1:56

Looks like we get a few books with five-star ratings. 2:03
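The `books.loc` filter described above can be sketched like this, assuming the columns are named `average_rating` and `ratings_count` and using made-up rows. Exact `== 5.0` comparison is safe here because 5.0 is exactly representable as a float; note how little the tiny `ratings_count` values say about popularity, which is where the transcript goes next.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "average_rating": [5.0, 4.7, 5.0],
    "ratings_count": [0, 5000, 1],
})

# Boolean mask: True where the average rating is exactly 5.0
five_star = books.loc[books["average_rating"] == 5.0]
print(five_star[["title", "ratings_count"]])
```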

You may think this question is complete, but 2:10

on a closer look at the data, I see that there's also a ratings count column. 2:13

This says how many people have rated a book. 2:20

This is important to add to our analysis because what if one person 2:23

rates a book a 5, but 5,000 people rate another book and 2:28

it's at a 4.7? Which one is actually more popular? 2:33

The first book here has zero ratings, so is it really a popular book? 2:38

I think we should go primarily by the number of reviews, since that shows many people 2:44

at least read a book, and then use the ratings as a secondary ranking. 2:49

In my head this makes more sense. 2:54

If a book is popular, it's probably going to have many reviews. 2:56

And then if it's a good book, it should have a high rating. 3:00
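The "count first, rating second" idea can be expressed as a single multi-key sort. This is a hedged sketch rather than the exact code from the video: it assumes columns named `ratings_count` and `average_rating`, and the rows are hypothetical. Books with the same number of ratings are ordered by their average rating.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "average_rating": [5.0, 4.7, 4.2, 4.9],
    "ratings_count": [1, 5000, 5000, 800],
})

# Primary key: ratings_count (most-rated first); tie-breaker: average_rating
ranked = books.sort_values(
    by=["ratings_count", "average_rating"], ascending=[False, False]
)
print(ranked["title"].tolist())  # ['Book B', 'Book C', 'Book D', 'Book A']
```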

Let's make a note here, so we don't lose our thoughts. 3:04

Great, now let's fix our code. 3:28

Let's sort to see the books with a high number of ratings. 3:30

I think this is a good place to start. 3:34
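A minimal version of that sort, assuming the column is called `ratings_count`; the rows are hypothetical, and the ratings echo the 4.57 and 4.78 mentioned later only for illustration. `ascending=False` puts the most-rated books at the top.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "average_rating": [5.0, 4.57, 4.78],
    "ratings_count": [0, 2_000_000, 40_000],
})

# Most-rated books first
by_count = books.sort_values(by="ratings_count", ascending=False)
print(by_count["title"].tolist())  # ['Book B', 'Book C', 'Book A']
```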

Let's look at our data now. 3:49

It looks like we have some Harry Potter books, 3:51

some His Dark Materials, and quite a few others. 3:54

I think we can agree that these are names you're more likely to recognize than 4:02

our previous results. 4:06

So I think we're getting somewhere. 4:08

I think there's another layer needed here though. 4:10

Our top book has a rating of 4.57. 4:13

But there may be others rated higher than the one 4:16

we've currently found, like this one that's rated 4.78. 4:20

I think we need to add a rating condition, as in 4:25

the rating should be above 4.0, or maybe even 4.5. 4:29

Let's try both to see what we get. 4:35

I'm gonna save this as a variable, popularity. 4:37

And then filter where popularity's 4:43

average rating is greater than 4.0. 4:53
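A sketch of the two filters, under stated assumptions: this transcript doesn't show exactly how `popularity` was first built, so the first filter below borrows the "greater than 1,000" ratings threshold that the video mentions later. Column names and rows are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "average_rating": [4.57, 3.9, 4.78, 5.0],
    "ratings_count": [2_000_000, 50_000, 40_000, 0],
})

# Assumed first filter: books rated by more than 1,000 people
popularity = books.loc[books["ratings_count"] > 1000]

# Second filter: keep only books rated above 4.0
popularity = popularity.loc[popularity["average_rating"] > 4.0]
print(popularity["title"].tolist())  # ['Book A', 'Book C']
```

Swapping `4.0` for `4.5` is the other threshold the video suggests trying.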

So our results look very similar, 5:02

we still have a book lower down that's rated higher. 5:05

So let's also sort our values by the average rating just to make sure we end up 5:10

with the highest rated books at the top. 5:15

I'm gonna set our same variable 5:18

equal to our new filter. 5:23

And then I'm gonna do popularity.sort_values 5:29

by average rating, 5:36

with ascending equal to False. 5:44
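Spelled out as code, and assuming the same hypothetical column names, the sort looks like this. `sort_values` returns a new DataFrame by default, so the result is reassigned to keep it.

```python
import pandas as pd

# Hypothetical, already-filtered popularity DataFrame
popularity = pd.DataFrame({
    "title": ["Book A", "Book C", "Book E"],
    "average_rating": [4.57, 4.78, 4.82],
    "ratings_count": [2_000_000, 40_000, 32_000],
})

# Highest-rated first; reassign because sort_values does not modify in place by default
popularity = popularity.sort_values(by="average_rating", ascending=False)
print(popularity["title"].tolist())  # ['Book E', 'Book C', 'Book A']
```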

Now with all that put together, I have The Complete Calvin and 5:49

Hobbes by Bill Watterson, which has an average rating of 4.82 and 5:54

over 32,000 ratings. 5:58

I think we can call that a popular book. 6:01

And just to make it super clear, I'm gonna add a slice here 6:04

at the end to get just one result. 6:08

There we go. 6:13

That way, we don't have a whole bunch of rows there. 6:14

We only need the first one. 6:16
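The transcript doesn't show which slice syntax was used, so here are three common ways to keep only the first row, on a hypothetical sorted frame. All three return a one-row DataFrame; they are positional, so they take the top row even when the index labels are out of order after sorting.

```python
import pandas as pd

# Hypothetical, already-sorted popularity DataFrame
popularity = pd.DataFrame({
    "title": ["Book E", "Book C", "Book A"],
    "average_rating": [4.82, 4.78, 4.57],
})

top_slice = popularity[:1]       # positional row slice
top_iloc = popularity.iloc[:1]   # explicit positional indexer
top_head = popularity.head(1)    # readable shortcut

print(top_head["title"].tolist())  # ['Book E']
```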

On to the next question. 6:17

Are books with fewer pages rated higher than those with large page counts? 6:21

This one is a comparison to see if there's a correlation between the number of pages 6:26

and a book's rating. 6:30

We can filter again to see the books with low page counts. 6:32

So let's do books where the books' 6:36

num_pages is, let's say, less than 300. 6:41

We also need to filter the books by ratings count to make sure we're getting 6:50

books that have enough ratings to support their score. 6:55

So I'm gonna save this as few_pages. 6:59

And let's filter few_pages where 7:05

few_pages ratings count is, 7:10

same as we've been doing before, greater than 1,000, okay? 7:16
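A sketch of the two `few_pages` filters, assuming columns named `num_pages` and `ratings_count` and using hypothetical rows. Book D is short and highly rated but drops out because too few people have rated it, which is exactly what the ratings threshold is for.

```python
import pandas as pd

# Hypothetical stand-in for the workshop's books DataFrame
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "num_pages": [176, 250, 1456, 120],
    "average_rating": [4.76, 4.10, 4.82, 4.95],
    "ratings_count": [23_000, 8_000, 32_000, 300],
})

# Short books only (fewer than 300 pages)
few_pages = books.loc[books["num_pages"] < 300]
# ...that more than 1,000 people have rated
few_pages = few_pages.loc[few_pages["ratings_count"] > 1000]
print(few_pages["title"].tolist())  # ['Book A', 'Book B']
```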

And then lastly, let's make sure we sort by the average rating so 7:24

we can see the best one of the bunch. 7:28

So I'm gonna set this equal to the variable again, 7:32

so it now contains both of our filters, and 7:35

then we can do few_pages.sort_values 7:42

by the average rating. 7:47

We want ascending equal to False. 7:52
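The same sort-then-take-one pattern applied to a hypothetical, already-filtered `few_pages` frame (column names assumed as before):

```python
import pandas as pd

# Hypothetical few_pages DataFrame after both filters
few_pages = pd.DataFrame({
    "title": ["Book B", "Book A"],
    "num_pages": [250, 176],
    "average_rating": [4.10, 4.76],
    "ratings_count": [8_000, 23_000],
})

# Best-rated short book at the top, then keep just that row
few_pages = few_pages.sort_values(by="average_rating", ascending=False)
best_short = few_pages.head(1)
print(best_short["title"].tolist())  # ['Book A']
```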

And it looks like we got It's a Magical World, which is Calvin and 7:58

Hobbes number 11, 8:03

rated 4.76. 8:04

It has 176 pages and 23,000 ratings. 8:06

Same thing here: I'm gonna add a slice so 8:08

we just get the first one, 8:14

just to clean up our notebook a bit. 8:19

Cool, now we need to do the opposite. 8:22

We can do this by modifying the first line. 8:24

So I'm just gonna copy all of this and paste it, and then, 8:27

to keep our variable names clear, I'm gonna switch this to many pages. 8:33

And I think that's all for that, cool. 8:52

And if I run it... oops, we got the same one. I forgot to change this. 8:54

[LAUGH] This will probably be helpful. 9:01

So we had less than 300 for few pages. 9:03

Let's do greater than 300 for many pages, run it, and it's Calvin and Hobbes again. 9:06
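One detail worth checking with `< 300` and `> 300`: a book with exactly 300 pages lands in neither group. This sketch uses hypothetical rows (including one 300-page book) to show the gap and one way to close it by making one side inclusive. Whether the real dataset has 300-page books isn't stated in the video.

```python
import pandas as pd

# Hypothetical books, including one with exactly 300 pages
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "num_pages": [176, 300, 1456],
    "average_rating": [4.76, 4.30, 4.82],
})

few_pages = books.loc[books["num_pages"] < 300]
many_pages = books.loc[books["num_pages"] > 300]  # boundary as narrated

# Books with exactly 300 pages fall into neither group
print(len(books) - len(few_pages) - len(many_pages))  # 1

# Making one side inclusive puts every book in exactly one group
many_pages = books.loc[books["num_pages"] >= 300]
print(len(few_pages) + len(many_pages) == len(books))  # True
```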

It's our same popular book that we got previously. 9:12

With an average rating of 4.82, so between our two books, 9:16

we don't have much of a difference in the overall rating. 9:20

Between 4.76 and 4.82, that's a difference of 0.06. 9:24

That's not a lot. 9:29

Let's add a note in here. 9:33

There isn't a large difference between the book ratings: 9:37

only 0.06 between the top book 9:49

in each category. 9:56
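Instead of copying the filter and sort cell for each group, a `groupby` can compute the top rating per length category in one pass. This is a swapped-in technique, not what the video does, and it assumes the books have already been filtered to more than 1,000 ratings; the rows are hypothetical, with top ratings chosen to mirror the 4.76 and 4.82 from the video.

```python
import pandas as pd

# Hypothetical books, assumed already filtered to ratings_count > 1000
books = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "num_pages": [176, 250, 1456, 480],
    "average_rating": [4.76, 4.10, 4.82, 4.40],
})

# Label each book by length, then take the best rating in each group
length = books["num_pages"].lt(300).map({True: "few", False: "many"}).rename("length")
top_by_length = books.groupby(length)["average_rating"].max()
print(top_by_length)

# Difference between the two top ratings, rounded to hide float noise
gap = round(top_by_length["many"] - top_by_length["few"], 2)
print(gap)  # 0.06
```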

Now while we don't really see a difference between these two numbers, a chart might 9:59

better show if there is a correlation and may just give us a better visualization. 10:03

We won't get into charting in this workshop, but 10:08

it's a good thing to note in your analysis for future improvements. 10:11
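Alongside a chart, a correlation coefficient across all books (not just the top one in each group) is a quick numeric check for the page-count question. This is an addition beyond the video: `Series.corr` computes Pearson correlation by default, and the perfectly linear toy data below is hypothetical, chosen only to show what a value of 1.0 means. Run it on the real columns to see the actual relationship.

```python
import pandas as pd

# Hypothetical data: rating rises in lockstep with page count
books = pd.DataFrame({
    "num_pages": [100, 200, 300, 400, 500],
    "average_rating": [4.0, 4.1, 4.2, 4.3, 4.4],
})

# Pearson correlation (pandas' default): +1 means longer books rate higher,
# -1 means shorter books rate higher, near 0 means no linear relationship
r = books["num_pages"].corr(books["average_rating"])
print(round(r, 3))  # 1.0
```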

As a challenge in the teacher's notes below, 10:22

see if you can find the most popular book of the 1960s. 10:27

There are some hints in the teacher's notes to help you out. 10:35

Nice work, Pythonistas, you've written a ton of code so far. 10:38

Keep it up. 10:42
