Quick Python Tips to Explore Data

Tej Narayan
Published in The Startup
5 min read · Aug 15, 2020

Simplest and quickest ways to do Exploratory Data Analysis with Pandas

“Sometimes not much is just enough.” ― John O’Callaghan

Snapshot by Celeb Jones-unsplash

I am new to Python and used to struggle with small data exploration commands. There are numerous ways to explore a dataset, do basic calculations, and make minor edits while researching the data.

When we are new to Python, we run into errors even with simple commands. However, these can be overcome with a few simple, user-friendly tips and tricks. I came across some of them and thought of sharing them with a larger group, especially new Python users. I hope they like it.

I use the Nobel Laureates dataset from Kaggle for this exploratory data analysis; you can find the dataset here.

First of all, import the pandas library and read the data:
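The original code for this step did not survive in this copy, so here is a minimal sketch of it. The file name `nobel_laureates.csv` is a placeholder for whatever you downloaded, and the rows below are made up. So that the snippet runs on its own, it reads a tiny CSV from memory instead of from disk.

```python
import io

import pandas as pd

# With the real download you would write something like:
#   df = pd.read_csv("nobel_laureates.csv")   # hypothetical file name
# Here a tiny made-up CSV stands in for the file so the snippet is runnable.
sample_csv = io.StringIO(
    "Year,Category,Full Name,Birth Country\n"
    "1901,Physics,Person A,Germany\n"
    "1901,Peace,Person B,Switzerland\n"
    "1902,Medicine,Person C,India\n"
)
df = pd.read_csv(sample_csv)
print(df)
```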

Now, let's check the details of this dataset. It is a bit dated: winners from the last couple of years are not included.

How do we get the number of rows or columns? Here are some of the quickest ways:
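The command that originally followed is missing from this copy. A likely candidate is `df.info()`, shown here as an assumption on toy data with made-up values. Checking the maximum of `Year` is one simple way to see how recent the data is.

```python
import pandas as pd

# Toy rows with made-up values; column names follow the ones used in this article.
df = pd.DataFrame({
    "Year": [1901, 1901, 1902],
    "Category": ["Physics", "Peace", "Medicine"],
    "Death Year": ["1923", "1910", None],
})

df.info()                   # column names, non-null counts, dtypes, memory usage
print(df.dtypes)            # just the dtype of each column
newest = df["Year"].max()   # latest award year present in the data
print(newest)
```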

len(df)

len(df.columns)

df.columns
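As a small addition to the commands above, `df.shape` gives both counts at once. This is a sketch on a made-up three-row dataframe, not the Kaggle data.

```python
import pandas as pd

# Made-up rows, only here to have something to count.
df = pd.DataFrame({
    "Year": [1901, 1902, 1903],
    "Category": ["Physics", "Peace", "Medicine"],
})

n_rows, n_cols = df.shape          # (rows, columns) in one call
print(n_rows, n_cols)
print(len(df), len(df.columns))    # the same numbers via len()
print(list(df.columns))            # column names as a plain list
```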

Similarly, if we want summary statistics for the whole dataset:

How to look at the first or last few rows:
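The snippet for this step is missing here. The usual tool is `describe()`, so the sketch below assumes that is what was meant. The rows are made up.

```python
import pandas as pd

# Made-up rows; only "Year" is numeric.
df = pd.DataFrame({
    "Year": [1901, 1901, 1902, 1903],
    "Category": ["Physics", "Peace", "Medicine", "Peace"],
})

numeric_stats = df.describe()             # count, mean, std, min, quartiles, max for numeric columns
all_stats = df.describe(include="all")    # also adds count/unique/top/freq for text columns
print(numeric_stats)
print(all_stats)
```

By default only numeric columns are summarised, which is why `include="all"` is worth knowing for a mostly textual dataset like this one.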

df.head()

df.tail()

Another simple way, if we want a particular range of rows:

df[2:6]

df[:]

Suppose we want every 10th row of the dataframe. How do we get that?

every_10 = df[::10]

To preview only the first few of those rows, chain head() onto the same slice:

df[::10].head()
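To make the slicing rules above concrete, here is a small sketch on a made-up dataframe of 35 rows. It assumes the default integer index, and `df.iloc` is shown as the explicit position-based equivalent of the bare slice.

```python
import pandas as pd

# 35 made-up rows so that a step of 10 has something to pick from.
df = pd.DataFrame({"Laureate ID": range(1, 36)})

first_three = df.head(3)     # first 3 rows instead of the default 5
rows_2_to_5 = df[2:6]        # positions 2, 3, 4, 5 (the end is exclusive)
every_10 = df[::10]          # positions 0, 10, 20, 30
same_rows = df.iloc[::10]    # explicit position-based form of the same slice

print(every_10)
```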

What about getting the data in reverse order without disturbing the original data? We can also save the result as a new dataframe for further analysis.

df[::-1]
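A quick sketch, on made-up rows, of saving the reversed data as its own dataframe. Resetting the index is optional; it is included here as one reasonable choice if you want the new dataframe numbered from 0 again.

```python
import pandas as pd

# Made-up rows.
df = pd.DataFrame({"Year": [1901, 1902, 1903]})

reversed_df = df[::-1]                         # new object; df itself is unchanged
renumbered = df[::-1].reset_index(drop=True)   # same rows, index restarted at 0

print(reversed_df)
print(renumbered)
```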

Suppose I want one particular column, for example the Category of every Nobel Prize:
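The original snippet is missing at this point. A minimal sketch of selecting one column, on made-up rows, might look like this; `unique()` and `value_counts()` are added as natural follow-ups rather than something the original necessarily showed.

```python
import pandas as pd

# Made-up rows.
df = pd.DataFrame({
    "Year": [1901, 1901, 1902, 1903],
    "Category": ["Physics", "Peace", "Medicine", "Peace"],
})

categories = df["Category"]           # one column, returned as a Series
distinct = categories.unique()        # each category once, in order of appearance
counts = categories.value_counts()    # how many rows per category

print(distinct)
print(counts)
```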

Similarly, if we want two or more columns:
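Again the original code is not present here, so this is a sketch under the assumption that a list of column names was passed. The rows are made up.

```python
import pandas as pd

# Made-up rows.
df = pd.DataFrame({
    "Year": [1901, 1902],
    "Category": ["Physics", "Peace"],
    "Sex": ["Male", "Female"],
})

subset = df[["Category", "Year"]]   # note the double brackets: a list of column names
print(subset)
```

The columns come back in the order you list them, not in the order they have in the original dataframe.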

How do we combine particular columns with a few rows:

df[8:12][["Category", "Sex"]]

Similarly, we can select the columns first and then the rows:

df[["Category", "Year", "Birth Country"]][4:8]
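The chained form works well for quick looks. As an alternative, `.loc` and `.iloc` select rows and columns in a single step. The sketch below uses made-up rows and assumes the default integer index, which is why labels and positions line up.

```python
import pandas as pd

# 15 made-up rows with the default 0..14 index.
df = pd.DataFrame({
    "Year": list(range(1901, 1916)),
    "Category": ["Physics", "Peace", "Medicine"] * 5,
    "Sex": ["Male", "Female", "Male"] * 5,
})

chained = df[8:12][["Category", "Sex"]]         # two-step form: rows, then columns
by_label = df.loc[8:11, ["Category", "Sex"]]    # .loc uses labels; the end label 11 is included
by_position = df.iloc[8:12, [1, 2]]             # .iloc uses positions; the end 12 is excluded

print(by_label)
```

Note the different end points: `.loc[8:11]` and `.iloc[8:12]` return the same four rows here.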

How do we fetch a single value from a dataframe? Use iat with row and column positions, or at with a row label and a column name:

df.iat[3, 4]

df.at[10, "Laureate ID"]

df.at[2, "Motivation"]
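Here is a small sketch of the difference between the two accessors, on made-up rows. Both lines below fetch the same cell, once by position and once by label.

```python
import pandas as pd

# Made-up rows; the IDs are placeholders, not real laureate IDs.
df = pd.DataFrame({
    "Year": [1901, 1902, 1903],
    "Category": ["Physics", "Peace", "Medicine"],
    "Laureate ID": [11, 22, 33],
})

by_position = df.iat[1, 2]            # row position 1, column position 2
by_label = df.at[1, "Laureate ID"]    # row label 1, column name "Laureate ID"

print(by_position, by_label)
```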

Suppose we want a new dataframe with a single category of data. For example, I want to know how many Nobel Prize winners were born in Switzerland:

df1 = df[df["Birth Country"] == "Switzerland"]

And suppose I want to filter on several values at once. For example, I want the winners from three different categories: 'Peace', 'Medicine' and 'Economics'.

df2 = df[df["Category"].isin(["Peace", "Medicine", "Economics"])]
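The sketch below runs both filters on made-up rows and adds two things the text implies: `len()` to get the actual count, and `&` to combine two conditions. The combined filter is an illustration, not part of the original walkthrough.

```python
import pandas as pd

# Made-up rows.
df = pd.DataFrame({
    "Category": ["Physics", "Peace", "Medicine", "Economics", "Peace"],
    "Birth Country": ["Germany", "Switzerland", "India", "Norway", "Switzerland"],
})

swiss = df[df["Birth Country"] == "Switzerland"]    # rows matching one value
n_swiss = len(swiss)                                # the actual "how many"

wanted = ["Peace", "Medicine", "Economics"]
selected = df[df["Category"].isin(wanted)]          # rows matching any listed value

# Each condition needs its own parentheses when combined with & (and) or | (or).
swiss_peace = df[(df["Birth Country"] == "Switzerland") & (df["Category"] == "Peace")]

print(n_swiss, len(selected), len(swiss_peace))
```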

We can also do some mathematical calculations. This dataset is not well suited to that, but let's try with the few numerical columns available:
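The calculations originally shown here are not present in this copy, so the following is only a guess at the kind of thing that fits: simple aggregates on the `Year` column. All values are made up.

```python
import pandas as pd

# Made-up rows.
df = pd.DataFrame({
    "Year": [1901, 1901, 1902, 1905],
    "Laureate ID": [11, 22, 33, 44],
})

first_year = df["Year"].min()
last_year = df["Year"].max()
span = last_year - first_year           # years covered by the data
per_year = df.groupby("Year").size()    # number of rows for each year

print(first_year, last_year, span)
print(per_year)
```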

Now, suppose we want to check the correlations within the dataset:
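The snippet for this step is missing. Assuming it used `DataFrame.corr()`, a minimal sketch looks like this. The toy values are made up and deliberately linear, so the correlation comes out as 1; real data will not look like this.

```python
import pandas as pd

# Made-up rows; "Laureate ID" grows in step with "Year" on purpose.
df = pd.DataFrame({
    "Year": [1901, 1902, 1903, 1904],
    "Laureate ID": [10, 20, 30, 40],
    "Category": ["Physics", "Peace", "Medicine", "Peace"],
})

# numeric_only=True skips text columns such as "Category".
corr = df.corr(numeric_only=True)
print(corr)
```

Keep in mind that a correlation involving an ID column has no real meaning; it is used here only because the article's dataset has so few numeric columns.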

Now, if we want to sort our dataset by certain criteria, for example by the Category of the winners:

df.sort_values(by=["Category"], ascending=False)

And if I want to narrow the data down to a single category:

df[df["Category"] == "Peace"]
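Sorting and filtering combine naturally. The sketch below, on made-up rows, also shows sorting by two columns with a different direction for each, which goes slightly beyond the commands above.

```python
import pandas as pd

# Made-up rows.
df = pd.DataFrame({
    "Year": [1902, 1901, 1903, 1901],
    "Category": ["Peace", "Physics", "Medicine", "Peace"],
})

by_category = df.sort_values(by="Category", ascending=False)    # Z to A

# Category A to Z, and within each category the newest year first.
by_two_keys = df.sort_values(by=["Category", "Year"], ascending=[True, False])

peace_only = df[df["Category"] == "Peace"].sort_values(by="Year")   # filter, then sort

print(by_two_keys)
```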

Now, let's try some more calculations. For example, I want to calculate the age of all the winners from the same dataset. The only option we have is to subtract Birth Year from Death Year, but there are a couple of challenges. Death Year is an object field, so we need to convert it to float first. Then we need to rename the columns to single words, because attribute access such as df.DeathYear does not work when a column name contains a space.

Once both steps are done in the same dataframe, create a new column 'Age' and calculate it:

df["Age"] = df.DeathYear - df.BirthYear
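The conversion and renaming code is missing from this copy, so here is one way it could be done, as a sketch on made-up rows. Using `pd.to_numeric` and `rename` is my assumption; the original may have used something else, such as `astype`.

```python
import pandas as pd

# Made-up rows; "Death Year" is stored as text, with a gap for a living laureate.
df = pd.DataFrame({
    "Full Name": ["Person A", "Person B", "Person C"],
    "Birth Year": [1850, 1860, 1940],
    "Death Year": ["1920", "1935", None],
})

# Text to numbers; missing or unparseable values become NaN.
df["Death Year"] = pd.to_numeric(df["Death Year"], errors="coerce")

# Single-word names are needed for attribute access such as df.DeathYear.
df = df.rename(columns={"Birth Year": "BirthYear", "Death Year": "DeathYear"})

df["Age"] = df.DeathYear - df.BirthYear   # empty where the death year is unknown
print(df[["Full Name", "Age"]])
```

Strictly speaking this is the age at death, so it stays empty for laureates who are still alive. The bracket form `df["Death Year"] - df["Birth Year"]` would also work without renaming anything.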

Finally, the dataset with the winners' ages will look like this:

df[["Category", "Year", "Full Name", "Age"]].head()

These are some tips for quick and easy exploration of a dataset. As I mentioned in the beginning, there are various other ways to do this; it only requires regular practice.

Conclusion:

To summarize, these are some of the easiest ways to do exploratory data analysis. Quite often we struggle with basic Python syntax, or turn to Google and Stack Overflow for help. I hope this helps novice Python users remember these quick tips and use them in their regular data analysis.

Thanks for reading. Let me know if you have any other shortcuts, tips, or tricks that can help in data analysis. You can find the complete EDA on Kaggle. Please share your feedback at tej_on@outlook.com.

Tej Narayan
The Startup

Data Scientist, Passion writing, Data Visualization, Story telling.