Visualizing Geographic Data with Matplotlib and Pandas

Have you ever looked at a map and wondered about the hidden patterns in data related to different locations? Maybe you want to see where certain events happen most often, or how a specific value changes across a region. This is where visualizing geographic data comes in handy! It allows us to turn raw numbers into insightful maps, helping us understand our world better.

In this blog post, we’re going to explore how to visualize geographic data using two incredibly popular Python libraries: Pandas and Matplotlib. Don’t worry if you’re new to these; we’ll break down everything into simple steps.

What is Geographic Data?

Before we dive into coding, let’s quickly understand what “geographic data” means. Simply put, it’s any data that has a connection to a specific location on Earth. This location is usually defined by coordinates.

  • Latitude: This tells you how far north or south a point is from the Equator. Imagine horizontal lines running around the Earth.
  • Longitude: This tells you how far east or west a point is from the Prime Meridian. Imagine vertical lines running from pole to pole.

Together, latitude and longitude give us a precise address for any spot on the globe. Examples of geographic data include the location of cities, earthquake epicenters, weather stations, or even the address where a package was delivered.
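To make those ranges concrete: latitude always falls between -90 and 90 degrees, and longitude between -180 and 180. The small helper below is a hypothetical sanity check (the name is_valid_coordinate is ours, not part of Pandas or Matplotlib) that you could run on a pair of values before plotting them.

```python
def is_valid_coordinate(latitude, longitude):
    """Return True if the pair lies within the usual latitude/longitude ranges."""
    return -90 <= latitude <= 90 and -180 <= longitude <= 180


print(is_valid_coordinate(40.7128, -74.0060))  # New York City
print(is_valid_coordinate(120.0, -74.0060))    # latitude is out of range
```

A check like this is a cheap way to catch swapped latitude and longitude columns, which is one of the most common mistakes with location data.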

Why Matplotlib and Pandas?

These two libraries are a fantastic combination for many data science tasks, including geographic visualization:

  • Pandas: This library is a powerhouse for handling and analyzing tabular data (data organized in rows and columns, much like a spreadsheet). It allows us to load, clean, organize, and prepare our geographic data efficiently.
    • Supplementary Explanation: Pandas DataFrame: Think of a Pandas DataFrame as a smart spreadsheet or a table. It’s excellent for storing data where each column has a name (like ‘City’, ‘Latitude’, ‘Longitude’) and each row represents a distinct record.
  • Matplotlib: This is a fundamental plotting library in Python. While it’s general-purpose, it’s highly customizable and can be used to create all sorts of static, animated, and interactive visualizations. We’ll use it to draw our maps!
    • Supplementary Explanation: Matplotlib Plotting Library: This is like a versatile drawing toolkit for Python. It provides functions to create various types of charts and graphs, from simple line plots to complex 3D visualizations.

Getting Started: Installation

First things first, you need to make sure you have Python installed on your computer. If you do, you can install Pandas and Matplotlib using pip, Python’s package installer. Open your terminal or command prompt and run this command:

pip install pandas matplotlib

This will download and install both libraries, making them ready for use in your Python projects.
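If you want to confirm the installation worked, one simple approach is to import both libraries and print their versions. This is just a quick check; the version numbers you see will depend on what pip installed.

```python
import matplotlib
import pandas as pd

# If either import fails, the library is not installed in this Python environment.
print("pandas version:", pd.__version__)
print("matplotlib version:", matplotlib.__version__)
```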

Preparing Our Data

For our example, let’s imagine we have a simple dataset of a few major cities, including their latitude, longitude, and population. In a real-world scenario, you might load this data from a CSV file, an Excel spreadsheet, or a database. For simplicity, we’ll create a Pandas DataFrame directly in our code.

Let’s define our data:

import pandas as pd
import matplotlib.pyplot as plt

data = {
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio'],
    'Latitude': [40.7128, 34.0522, 41.8781, 29.7604, 33.4484, 39.9526, 29.4241],
    'Longitude': [-74.0060, -118.2437, -87.6298, -95.3698, -112.0740, -75.1652, -98.4936],
    'Population_Millions': [8.4, 3.9, 2.7, 2.3, 1.6, 1.5, 1.5]
}
df = pd.DataFrame(data)

print("Our Data:")
print(df)

Output of print(df):

Our Data:
           City  Latitude  Longitude  Population_Millions
0      New York   40.7128   -74.0060                  8.4
1   Los Angeles   34.0522  -118.2437                  3.9
2       Chicago   41.8781   -87.6298                  2.7
3       Houston   29.7604   -95.3698                  2.3
4       Phoenix   33.4484  -112.0740                  1.6
5  Philadelphia   39.9526   -75.1652                  1.5
6   San Antonio   29.4241   -98.4936                  1.5

Now we have our df DataFrame, which contains all the information we need for plotting.
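As mentioned earlier, real data usually arrives in a file rather than being typed into the code. Here is a minimal sketch of the CSV route using pd.read_csv. To keep it self-contained, the "file" is a string wrapped in io.StringIO; the column names simply mirror our example, and the cleaning step is an assumption about what a typical dataset might need, not a required recipe.

```python
import io

import pandas as pd

# Stand-in for a file on disk; with a real file you would pass its path instead.
csv_text = """City,Latitude,Longitude,Population_Millions
New York,40.7128,-74.0060,8.4
Los Angeles,34.0522,-118.2437,3.9
Chicago,41.8781,-87.6298,2.7
"""

cities = pd.read_csv(io.StringIO(csv_text))

# Drop rows with missing or out-of-range coordinates before plotting.
cities = cities.dropna(subset=["Latitude", "Longitude"])
in_range = cities["Latitude"].between(-90, 90) & cities["Longitude"].between(-180, 180)
cities = cities[in_range]

print(cities)
```

With a real file, the only change would be replacing io.StringIO(csv_text) with the file's path.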

Basic Geographic Visualization

The simplest way to visualize geographic data is to use a scatter plot. We’ll plot longitude on the x-axis and latitude on the y-axis.

1. Creating a Simple Scatter Plot

Let’s start by plotting just the city locations:

plt.figure(figsize=(10, 8)) # figsize sets the width and height of the plot in inches

plt.scatter(df['Longitude'], df['Latitude'])

plt.xlabel('Longitude')
plt.ylabel('Latitude')

plt.title('Major US Cities: Basic Scatter Plot')

plt.grid(True)

plt.show()

When you run this code as a script, a window will pop up showing a scatter plot (in a Jupyter notebook, the plot appears inline instead). You’ll see individual dots representing each city. It’s a start, but it doesn’t tell us much beyond the locations.
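One thing a plain scatter plot ignores is that a degree of longitude covers less ground than a degree of latitude as you move away from the Equator, so the country can look stretched sideways. A common rough correction, sketched below, is to set the axes' aspect ratio to 1 / cos(mean latitude). Treat this as an approximation that works for a region the size of our example, not as a proper map projection.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'Latitude': [40.7128, 34.0522, 41.8781, 29.7604, 33.4484, 39.9526, 29.4241],
    'Longitude': [-74.0060, -118.2437, -87.6298, -95.3698, -112.0740, -75.1652, -98.4936],
})

# At latitude phi, one degree of longitude spans roughly cos(phi) times
# the distance of one degree of latitude.
mean_latitude = df['Latitude'].mean()
aspect = 1 / np.cos(np.radians(mean_latitude))

fig, ax = plt.subplots(figsize=(10, 8))
ax.scatter(df['Longitude'], df['Latitude'])
ax.set_aspect(aspect)  # stretch the y-axis relative to the x-axis
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title('Major US Cities: Aspect-Corrected Scatter Plot')
ax.grid(True)
# Call plt.show() here to display the figure.
```

This version uses the fig, ax = plt.subplots() style instead of the plt.* functions because set_aspect is a method of the axes object; both styles draw the same kind of plot.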

2. Enhancing the Visualization with More Information

We have population data, so let’s use it to make our plot more informative! We can adjust the size and color of each point based on its city’s population. This is a powerful technique for adding an extra dimension of information to your maps.

  • s (size): We’ll make the points larger for cities with higher populations.
  • c (color): We’ll color the points based on population, using a color gradient. With the ‘viridis’ color map used below, darker (purple) colors mean lower populations and brighter (yellow) colors mean higher ones.
  • cmap (color map): This specifies the color scheme Matplotlib should use for the c argument. ‘viridis’ is a good default that works well for many types of data.
  • alpha (transparency): If you have many overlapping points, alpha (a value between 0 and 1) can make them transparent, allowing you to see density.

Let’s update our plotting code:

plt.figure(figsize=(12, 10))

plt.scatter(df['Longitude'], df['Latitude'],
            s=df['Population_Millions']*100, # Size points by population (s is the marker area in points squared; adjust the multiplier to taste)
            c=df['Population_Millions'],    # Color points by population
            cmap='viridis',                 # Color map for the population values
            alpha=0.7,
            edgecolors='w',                 # White edges for better visibility
            linewidth=0.5)

plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('Major US Cities by Latitude, Longitude, and Population')
plt.grid(True) # Add a grid for better readability

plt.colorbar(label='Population (Millions)')

for _, row in df.iterrows():
    # plt.text() adds text at a specific coordinate
    # We add a small offset to Longitude so the text doesn't overlap the point
    plt.text(row['Longitude'] + 0.5, row['Latitude'], row['City'], fontsize=9, ha='left')

plt.xlim(df['Longitude'].min() - 5, df['Longitude'].max() + 10) # Extra padding on the right leaves room for the city labels
plt.ylim(df['Latitude'].min() - 5, df['Latitude'].max() + 5)   # Added some padding


plt.show()

Now, when you run this code, you’ll see a much more informative map! Cities with larger populations appear as bigger dots, and with the ‘viridis’ color map they shift from dark purple (smaller populations) toward bright yellow (larger ones). The color bar on the side shows which population each color represents.
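If you are ever unsure which end of a color map stands for the larger values, you can ask Matplotlib directly. The sketch below rebuilds the default linear scaling that scatter applies to the c values (using Normalize) and looks up the colors for our smallest and largest populations. The variable names are ours, chosen for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

populations = [8.4, 3.9, 2.7, 2.3, 1.6, 1.5, 1.5]

# Normalize rescales the data range linearly onto [0, 1],
# which is what scatter does by default before applying the color map.
norm = Normalize(vmin=min(populations), vmax=max(populations))
cmap = plt.get_cmap('viridis')

# Each lookup returns an (R, G, B, A) tuple with components between 0 and 1.
low_color = cmap(float(norm(min(populations))))
high_color = cmap(float(norm(max(populations))))

print("Smallest population:", tuple(round(channel, 2) for channel in low_color))
print("Largest population:", tuple(round(channel, 2) for channel in high_color))
```

For ‘viridis’, the first tuple is a dark purple and the second a bright yellow, which matches what the color bar shows.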

Best Practices and Tips

To make your geographic visualizations even better:

  • Always Label Axes and Titles: This makes your plot understandable to anyone who sees it.
  • Choose Appropriate Scales: Sometimes, your data might be clustered in a small area, making other parts of the map look empty. You can zoom in using plt.xlim() and plt.ylim() to focus on specific regions.
  • Use Meaningful Colors: Select color schemes that make sense for your data. For example, a diverging color map (like ‘RdBu’) is good for data that goes above and below a central value (like temperature anomalies), while sequential color maps (like ‘viridis’ or ‘Blues’) are great for values that increase progressively (like population).
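  • Save Your Plots: You can save your visualization as an image file (like PNG or JPG) using plt.savefig('my_geographic_map.png'). Call it before plt.show(): once the plot window is closed, the figure may be discarded and the saved image can come out blank.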
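Here is a short sketch that puts two of these tips together: zooming in on one region with axis limits, then saving the result. The limits are hand-picked for the Northeast, and the file is written to your system's temporary folder only so the example doesn't clutter your working directory. Both choices are assumptions you should adapt.

```python
import tempfile
from pathlib import Path

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'City': ['New York', 'Chicago', 'Philadelphia'],
    'Latitude': [40.7128, 41.8781, 39.9526],
    'Longitude': [-74.0060, -87.6298, -75.1652],
})

fig, ax = plt.subplots(figsize=(8, 6))
ax.scatter(df['Longitude'], df['Latitude'])
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
ax.set_title('Northeastern Cities (Zoomed In)')

# Zoom in: Chicago falls outside these limits and is not drawn.
ax.set_xlim(-78, -72)
ax.set_ylim(38, 43)

# Save before showing; dpi controls resolution, bbox_inches='tight' trims margins.
output_path = Path(tempfile.gettempdir()) / 'my_geographic_map.png'
fig.savefig(output_path, dpi=150, bbox_inches='tight')
print(f"Saved to {output_path}")
```

To save next to your script instead, pass a plain filename such as 'my_geographic_map.png' to savefig.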

Next Steps

While Matplotlib and Pandas are great for basic geographic visualizations, the world of geospatial data is vast! Here are some advanced topics you might want to explore later:

  • Overlaying on Actual Maps: Libraries like Cartopy or Basemap (though Basemap is older and less maintained) allow you to plot your data on top of real map backgrounds with coastlines, borders, and oceans. GeoPandas extends Pandas to handle spatial data types and integrates well with plotting on maps.
  • Interactive Maps: Tools like Folium (for Leaflet maps) or Plotly can create interactive web maps where users can zoom, pan, and click on points to get more information.

Conclusion

You’ve learned how to harness the power of Pandas to manage your geographic data and Matplotlib to create insightful visualizations. Starting with a simple scatter plot and then enhancing it with features like size and color based on data values, you can turn raw latitude and longitude coordinates into meaningful stories.

Keep experimenting with different datasets and customization options. Visualizing geographic data is a powerful skill that can uncover patterns and trends hidden within your location-based information. Happy mapping!

