frankgenova
Python Web Development Techdegree Student 15,616 Points
How much data is missing from each row: what does axis=1 mean?
In the notebook, there is a section called "How much data is missing from each row". The instructor uses the following code:
missing_data = np.sum(demo.isnull(), axis=1)
Per the documentation, it looks like axis=None is the default. I'm not clear on what the axis parameter does or why axis=1 was chosen.
1 Answer
Alex Koumparos
Python Development Techdegree Student 36,887 Points
Hi frankgenova,
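The axis value selects which dimension of a multidimensional array a function works along.
In the case of this dataframe, we have two dimensions: axis 0 runs down the rows and axis 1 runs across the columns. Summing along axis 0 collapses the rows, so you get one result per column; summing along axis 1 collapses the columns, so you get one result per row. That is why the notebook uses axis=1: it wants a missing-value count for each row.
As for the default: on a plain NumPy array, axis=None sums every element into a single number. When np.sum is handed a DataFrame, though, it defers to the DataFrame's own sum method, and older pandas versions treated axis=None there like axis=0 (one result per column). Recent pandas releases deprecate that behaviour, so it is safest to pass axis explicitly.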
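To see the difference outside pandas, here is a minimal sketch on a small, made-up NumPy array (the values are arbitrary and not from the course data):

```python
import numpy as np

# A 2-row, 3-column array: axis 0 runs down the rows, axis 1 runs across the columns.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

col_totals = np.sum(a, axis=0)   # collapse the rows: one total per column
row_totals = np.sum(a, axis=1)   # collapse the columns: one total per row
grand_total = np.sum(a)          # axis=None (the default): sum every element

print(col_totals)   # [5 7 9]
print(row_totals)   # [ 6 15]
print(grand_total)  # 21
```

The length of each result tells you which axis was collapsed: three column totals, two row totals, or one grand total.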
Consider this simplified version of the dataset:
|   | ID | Age | Gender | Military |
|---|---|---|---|---|
| 0 | 1 | 2.0 | 2.0 | NaN |
| 1 | 2 | 77.0 | 1.0 | 1.0 |
| 2 | 3 | 95.0 | 2.0 | NaN |
| 3 | 4 | 1.0 | 1.0 | NaN |
| 4 | 5 | 49.0 | 1.0 | 1.0 |
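One step the counting relies on: demo.isnull() does not count anything by itself. It returns a DataFrame of the same shape holding True where a value is missing and False elsewhere, and when those booleans are summed, True counts as 1 and False as 0. A minimal sketch, rebuilding only the five-row simplified table above (the notebook's real demo frame has more rows and columns):

```python
import numpy as np
import pandas as pd

# Only the simplified five-row table; np.nan marks a missing value.
demo = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [2.0, 77.0, 95.0, 1.0, 49.0],
    "Gender": [2.0, 1.0, 2.0, 1.0, 1.0],
    "Military": [np.nan, 1.0, np.nan, np.nan, 1.0],
})

# Same shape as demo: True where a value is missing, False otherwise.
mask = demo.isnull()
print(mask)
```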
So if we sum on axis 0 (or None, in older pandas), we get the number of null entries in each column:
ID 0
Age 0
Gender 0
Military 3
dtype: int64
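The column-wise counts can be reproduced with a short sketch, again assuming only the five-row simplified table (so the result has just these four columns). It passes axis=0 explicitly rather than relying on the default, and shows the equivalent pandas method call:

```python
import numpy as np
import pandas as pd

# Only the simplified five-row table from the answer.
demo = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [2.0, 77.0, 95.0, 1.0, 49.0],
    "Gender": [2.0, 1.0, 2.0, 1.0, 1.0],
    "Military": [np.nan, 1.0, np.nan, np.nan, 1.0],
})

# axis=0 collapses the rows, leaving one missing-value count per column.
missing_per_column = np.sum(demo.isnull(), axis=0)

# The same result using the DataFrame method directly.
same_thing = demo.isnull().sum(axis=0)

print(missing_per_column)
```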
Summing on axis 1 instead, we get the number of null entries in each row:
0 1
1 0
2 1
3 1
4 0
dtype: int64
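The row-wise counts come from the same call the notebook uses, np.sum(demo.isnull(), axis=1). A sketch on the simplified table, again covering only the five rows shown above:

```python
import numpy as np
import pandas as pd

# Only the simplified five-row table from the answer.
demo = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Age": [2.0, 77.0, 95.0, 1.0, 49.0],
    "Gender": [2.0, 1.0, 2.0, 1.0, 1.0],
    "Military": [np.nan, 1.0, np.nan, np.nan, 1.0],
})

# axis=1 collapses the columns, leaving one missing-value count per row.
missing_data = np.sum(demo.isnull(), axis=1)

# Equivalent pandas method call.
same_thing = demo.isnull().sum(axis=1)

print(missing_data)
```

The result is indexed by the DataFrame's row labels (0 to 4 here), which is how you can tell each count belongs to a row rather than a column.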
Hope that clears things up for you.
Cheers,
Alex