Beginner’s Guide to Data Analysis with Pandas and Matplotlib
Your first practical guide to data analysis and visualization

Data is everywhere, from personal expenses to global climate records. However, data itself has little value until it is analyzed and visualized to uncover insights.
In this guide, we will walk through a beginner-friendly data analysis project using Python, Pandas, and Matplotlib. We will work with the well-known Iris dataset, which contains measurements of different Iris flower species.
By the end, you will understand how to:
Load a dataset and check it for missing values
Explore and analyze data with Pandas
Create visualizations with Matplotlib and Seaborn
Document results clearly
Prerequisites
Before you start, ensure you have:
A basic understanding of Python (variables, functions, lists)
A Python environment (Anaconda, Jupyter Notebook, or Google Colab)
Installed the following libraries:
pip install pandas matplotlib seaborn scikit-learn
If you prefer not to install anything locally, you can use Google Colab, which runs entirely in the browser.
Step 1: Setting Up Your Notebook
Open Google Colab.
Create a new notebook.
Rename it Data_Analysis_Assignment.
Copy and paste the code snippets provided in the following sections.
Step 2: Loading the Dataset
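We will use the Iris dataset. The code first tries to read a local iris.csv file and, if that file is missing, falls back to the copy bundled with scikit-learn. Note that a local CSV may use different column names from the scikit-learn version, which the rest of this guide assumes.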
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
# Try to load dataset
try:
    df = pd.read_csv("iris.csv")
    print("Loaded iris.csv from local file.")
except FileNotFoundError:
    print("iris.csv not found. Loading from sklearn instead...")
    iris = load_iris(as_frame=True)
    df = iris.frame
# Preview dataset
df.head()
Expected result: A table showing the first five rows of the dataset.
Step 3: Exploring the Dataset
Check the structure, data types, and missing values.
# Info about dataset
df.info()
# Check missing values
print(df.isnull().sum())
Expected result: A summary of the dataset structure and confirmation that there are no missing values.
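A few other quick checks are often useful at this stage. The following is a minimal sketch, assuming the scikit-learn copy of the dataset (a local iris.csv may have different columns); the variable names are illustrative only.

```python
from sklearn.datasets import load_iris

# Assumption: use the scikit-learn copy so the column names are known.
df = load_iris(as_frame=True).frame

# Number of rows and columns.
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Data type of each column.
print(df.dtypes)

# Total number of missing values across the whole table.
total_missing = int(df.isnull().sum().sum())
print("Missing values:", total_missing)

# How many samples belong to each class code (0, 1, 2).
class_counts = df["target"].value_counts().sort_index()
print(class_counts)
```

Checking the class counts early shows whether the dataset is balanced, which matters when comparing group averages later.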
Step 4: Basic Data Analysis
Generate statistics and group data by species.
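# Descriptive statistics (print so the table shows even when this is not the last line of the cell)
print(df.describe())
# Group by species code (0 = setosa, 1 = versicolor, 2 = virginica) and calculate mean
grouped = df.groupby("target").mean()
print(grouped)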
Observation:
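Setosa has the smallest petals and the shortest sepals, although its sepals are the widest on average
Virginica has the largest petals and the longest sepals
Versicolor lies in between on petal size and sepal length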
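The grouped table is indexed by the numeric codes 0, 1, and 2, so it helps to attach species names before reading it. Below is a minimal sketch, assuming the scikit-learn copy of the dataset; the "species" column name and the code_to_name helper are choices made for this example, not part of the dataset.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()

# target_names is ordered so that positions 0, 1, 2 match the target codes.
code_to_name = {code: str(name) for code, name in enumerate(iris.target_names)}
df["species"] = df["target"].map(code_to_name)

# Mean of each measurement per species; drop the numeric code first.
species_means = df.drop(columns="target").groupby("species").mean()
print(species_means.round(2))
```

With names in the index, each observation above can be checked directly against the corresponding row.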
Step 5: Data Visualization
We will now create different plots using Matplotlib and Seaborn.
Line Chart – Sepal Length Trend
plt.plot(df["sepal length (cm)"])
plt.title("Sepal Length Trend")
plt.xlabel("Sample")
plt.ylabel("Sepal Length (cm)")
plt.show()

Fig. 1.0 Line chart
Bar Chart – Average Petal Length by Species
sns.barplot(x="target", y="petal length (cm)", data=df, errorbar=None)  # seaborn >= 0.12; older versions use ci=None
plt.title("Average Petal Length by Species")
plt.xlabel("Species")
plt.ylabel("Petal Length (cm)")
plt.show()

Fig. 1.1 Bar chart
Histogram – Sepal Width Distribution
plt.hist(df["sepal width (cm)"], bins=20, color="skyblue", edgecolor="black")
plt.title("Distribution of Sepal Width")
plt.xlabel("Sepal Width (cm)")
plt.ylabel("Frequency")
plt.show()

Fig. 1.2 Histogram
Scatter Plot – Sepal vs Petal Length
sns.scatterplot(
    x="sepal length (cm)",
    y="petal length (cm)",
    hue="target",
    data=df
)
plt.title("Sepal Length vs Petal Length")
plt.show()

Fig. 1.3 Scatter plot
Each visualization will appear below the corresponding cell.
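Several plots can also share one figure, which makes side-by-side comparison easier. The following is a rough sketch using Matplotlib's subplots, assuming the scikit-learn copy of the dataset; the panel choices, bin count, and figure size are arbitrary.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame

# One figure with two panels placed side by side.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: petal length histogram, drawn once per species.
for code, name in enumerate(iris.target_names):
    subset = df[df["target"] == code]
    axes[0].hist(subset["petal length (cm)"], bins=10, alpha=0.6, label=str(name))
axes[0].set_title("Petal Length by Species")
axes[0].set_xlabel("Petal Length (cm)")
axes[0].set_ylabel("Frequency")
axes[0].legend()

# Right panel: sepal length vs petal length, coloured by target code.
axes[1].scatter(df["sepal length (cm)"], df["petal length (cm)"], c=df["target"])
axes[1].set_title("Sepal Length vs Petal Length")
axes[1].set_xlabel("Sepal Length (cm)")
axes[1].set_ylabel("Petal Length (cm)")

fig.tight_layout()
plt.show()
```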
Step 6: Error Handling
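When loading datasets, files may be missing or misformatted. We use a try-except block to handle the missing-file case gracefully; a malformed file would raise a different exception that this block does not catch.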
try:
    df = pd.read_csv("iris.csv")
except FileNotFoundError:
    iris = load_iris(as_frame=True)
    df = iris.frame
This ensures that the notebook continues running even if the CSV file is unavailable.
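To also cover empty or malformed files, the same idea can be wrapped in a small helper. This is only a sketch: the function name load_csv_or_fallback and its fallback parameter are hypothetical, and the usage line assumes that no file called no_such_file.csv exists in the working directory.

```python
import pandas as pd


def load_csv_or_fallback(path, fallback):
    """Read a CSV file; call fallback() if it is missing or unreadable.

    Hypothetical helper: fallback is any zero-argument function
    that returns a DataFrame.
    """
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"{path} not found. Using fallback data.")
    except pd.errors.EmptyDataError:
        print(f"{path} is empty. Using fallback data.")
    except pd.errors.ParserError:
        print(f"{path} could not be parsed. Using fallback data.")
    return fallback()


# Usage with a path assumed not to exist and a tiny stand-in frame.
placeholder = pd.DataFrame({"sepal length (cm)": [5.1], "target": [0]})
df = load_csv_or_fallback("no_such_file.csv", lambda: placeholder)
print(df)
```

In the notebook, the fallback could instead be a function that returns load_iris(as_frame=True).frame, which matches the behaviour of the block above.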
Step 7: Observations and Findings
From the analysis and visualizations:
Setosa is clearly separable from the other two species by petal measurements, while versicolor and virginica overlap somewhat
Sepal width follows a roughly bell-shaped distribution
Scatter plots show distinct clustering by species
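One simple way to put a number on how well the species separate is to compare the range of a single measurement per species. The sketch below does this for petal length, assuming the scikit-learn copy of the dataset; the "species" column and the variable names are introduced here for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.copy()
df["species"] = df["target"].map(
    {code: str(name) for code, name in enumerate(iris.target_names)}
)

# Smallest and largest petal length observed for each species.
ranges = df.groupby("species")["petal length (cm)"].agg(["min", "max"])
print(ranges)

# Two ranges overlap when the lower one ends at or after the start of the higher one.
setosa_overlaps = ranges.loc["setosa", "max"] >= ranges.loc["versicolor", "min"]
others_overlap = ranges.loc["versicolor", "max"] >= ranges.loc["virginica", "min"]
print("setosa overlaps versicolor:", setosa_overlaps)
print("versicolor overlaps virginica:", others_overlap)
```

A range check like this is crude compared with a proper classifier, but it is enough to see which species can be told apart by a single threshold.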
Step 8: Conclusion
In this project, we:
Learned how to load a dataset with Pandas and check it for missing values
Performed descriptive statistics
Created multiple visualizations with Matplotlib and Seaborn
Implemented basic error handling
The Iris dataset clearly demonstrates how data analysis workflows can uncover meaningful patterns. This serves as a strong starting point for learning data analysis with Python.


