Skip to content

Module 1: Python Basics

Learning Goals

  • Understand what Python is and where it's used
  • Work with variables and basic data types
  • Use lists and dictionaries effectively
  • Write and use for-loops
  • Create simple functions
  • Import and use libraries
  • Read data with pandas

What is Python?

Python is a high-level, interpreted programming language that's widely used for:

  • Data Science & Analytics - pandas, numpy, matplotlib
  • Web Development - Django, Flask
  • Automation & Scripting - System administration, data processing
  • Geospatial Analysis - GeoPandas, Rasterio, Shapely
  • Machine Learning - scikit-learn, TensorFlow
graph TD
    A[Python] --> B[Data Science]
    A --> C[Web Development]
    A --> D[Automation]
    A --> E[GIS & Mapping]
    B --> F[pandas, numpy]
    C --> G[Django, Flask]
    D --> H[Scripts, APIs]
    E --> I[GeoPandas, Folium]

1. Python as a Calculator

Let's start with basic arithmetic operations:

# Basic arithmetic
print(5 + 3)    # Addition: 8
print(10 - 4)   # Subtraction: 6
print(6 * 7)    # Multiplication: 42
print(15 / 3)   # Division: 5.0
print(2 ** 3)   # Exponentiation: 8
print(17 % 5)   # Modulo (remainder): 2

Python Calculator Tips

  • Use ** for exponentiation (not ^)
  • Division / always returns a float
  • Use // for integer division
  • Use % to get the remainder

2. Variables and Data Types

Variables store data that can be used later:

# Numbers
age = 25
temperature = 23.5
population = 1_000_000  # Underscores for readability

# Strings
city_name = "San Francisco"
country = 'United States'
description = """This is a 
multi-line string"""

# Boolean
is_capital = True
has_data = False

# Check data types
print(type(age))          # <class 'int'>
print(type(temperature))  # <class 'float'>
print(type(city_name))    # <class 'str'>
print(type(is_capital))   # <class 'bool'>

String Operations

# String concatenation and formatting
first_name = "John"
last_name = "Doe"

# Method 1: Concatenation
full_name = first_name + " " + last_name
print(full_name)  # John Doe

# Method 2: f-strings (recommended)
greeting = f"Hello, {first_name}! You are {age} years old."
print(greeting)

# String methods
city = "san francisco"
print(city.title())      # San Francisco
print(city.upper())      # SAN FRANCISCO
print(city.replace(" ", "_"))  # san_francisco

3. Lists - Ordered Collections

Lists store multiple items in order:

# Creating lists
cities = ["New York", "London", "Tokyo", "Sydney"]
populations = [8_400_000, 9_000_000, 13_960_000, 5_300_000]
mixed_data = ["Paris", 2_161_000, True, 48.8566]

# Accessing elements (0-indexed)
print(cities[0])    # New York (first item)
print(cities[-1])   # Sydney (last item)
print(cities[1:3])  # ['London', 'Tokyo'] (slice)

# Modifying lists
cities.append("Berlin")           # Add to end
cities.insert(1, "Los Angeles")  # Insert at position
cities.remove("Tokyo")           # Remove by value
last_city = cities.pop()         # Remove and return last

# List operations
print(len(cities))              # Number of items
print("London" in cities)       # Check if item exists
print(max(populations))         # Maximum value
print(sum(populations))         # Sum of all values

List Comprehensions (Bonus)

# Create new lists based on existing ones
numbers = [1, 2, 3, 4, 5]
squares = [x**2 for x in numbers]
print(squares)  # [1, 4, 9, 16, 25]

# Filter and transform
large_cities = [city for city in cities if len(city) > 6]
print(large_cities)

4. Dictionaries - Key-Value Pairs

Dictionaries store data as key-value pairs:

# Creating dictionaries
city_info = {
    "name": "San Francisco",
    "country": "USA",
    "population": 884_000,
    "coordinates": [37.7749, -122.4194],
    "is_capital": False
}

# Accessing values
print(city_info["name"])        # San Francisco
print(city_info.get("population"))  # 884000
print(city_info.get("area", "Unknown"))  # Unknown (default value)

# Modifying dictionaries
city_info["area"] = 121.4  # Add new key-value pair
city_info["population"] = 900_000  # Update existing value

# Dictionary methods
print(city_info.keys())    # All keys
print(city_info.values())  # All values
print(city_info.items())   # Key-value pairs

# Multiple cities
cities_data = {
    "San Francisco": {"population": 884_000, "country": "USA"},
    "London": {"population": 9_000_000, "country": "UK"},
    "Tokyo": {"population": 13_960_000, "country": "Japan"}
}

print(cities_data["London"]["population"])  # 9000000

5. For Loops - Iteration

For loops let you repeat code for each item in a collection:

# Loop through lists
cities = ["New York", "London", "Tokyo", "Sydney"]

for city in cities:
    print(f"I want to visit {city}")

# Loop with index
for i, city in enumerate(cities):
    print(f"{i+1}. {city}")

# Loop through dictionaries
city_populations = {
    "New York": 8_400_000,
    "London": 9_000_000,
    "Tokyo": 13_960_000
}

# Loop through keys
for city in city_populations:
    print(city)

# Loop through key-value pairs
for city, population in city_populations.items():
    print(f"{city}: {population:,} people")

# Loop through values
for population in city_populations.values():
    print(f"Population: {population:,}")

Range Function

# Generate sequences of numbers
for i in range(5):          # 0, 1, 2, 3, 4
    print(i)

for i in range(1, 6):       # 1, 2, 3, 4, 5
    print(i)

for i in range(0, 10, 2):   # 0, 2, 4, 6, 8
    print(i)

# Practical example: process multiple files
file_numbers = range(1, 6)
for num in file_numbers:
    filename = f"data_{num}.csv"
    print(f"Processing {filename}")

6. Functions - Reusable Code

Functions help organize and reuse code:

# Simple function
def greet(name):
    """Greet a person by name"""
    return f"Hello, {name}!"

# Call the function
message = greet("Alice")
print(message)  # Hello, Alice!

# Function with multiple parameters
def calculate_density(population, area):
    """Calculate population density"""
    if area == 0:
        return 0
    return population / area

# Function with default parameters
def describe_city(name, population, country="Unknown"):
    """Describe a city with its basic information"""
    density = calculate_density(population, 100)  # Assume 100 km²
    return f"{name} ({country}): {population:,} people, density: {density:.1f}/km²"

# Using functions
sf_info = describe_city("San Francisco", 884_000, "USA")
print(sf_info)

mystery_city = describe_city("Mystery City", 500_000)
print(mystery_city)

Functions with Lists and Dictionaries

def analyze_cities(cities_dict):
    """Analyze a dictionary of cities and their populations"""
    total_population = sum(cities_dict.values())
    largest_city = max(cities_dict, key=cities_dict.get)
    smallest_city = min(cities_dict, key=cities_dict.get)

    return {
        "total_population": total_population,
        "largest_city": largest_city,
        "smallest_city": smallest_city,
        "average_population": total_population / len(cities_dict)
    }

# Example usage
cities = {
    "New York": 8_400_000,
    "Los Angeles": 3_900_000,
    "Chicago": 2_700_000,
    "Houston": 2_300_000
}

analysis = analyze_cities(cities)
print(f"Total population: {analysis['total_population']:,}")
print(f"Largest city: {analysis['largest_city']}")
print(f"Average population: {analysis['average_population']:,.0f}")

7. Importing Libraries

Libraries extend Python's capabilities:

# Import entire modules
import math
import random
from datetime import datetime

# Using imported functions
print(math.sqrt(16))        # 4.0
print(math.pi)              # 3.141592653589793
print(random.randint(1, 10))  # Random number between 1-10
print(datetime.now())       # Current date and time

# Import specific functions
from math import sqrt, pi, sin
print(sqrt(25))  # 5.0

# Import with alias
import pandas as pd
import numpy as np

# These are common conventions in data science

8. Working with CSV Data using Pandas

Pandas is the most popular library for data analysis:

import pandas as pd

# Create sample data
data = {
    'city': ['New York', 'London', 'Tokyo', 'Sydney', 'Paris'],
    'country': ['USA', 'UK', 'Japan', 'Australia', 'France'],
    'population': [8_400_000, 9_000_000, 13_960_000, 5_300_000, 2_161_000],
    'area_km2': [783, 1572, 2194, 12368, 105]
}

# Create DataFrame
df = pd.DataFrame(data)

# Display data
print("First 3 rows:")
print(df.head(3))

print("\nDataFrame info:")
print(df.info())

print("\nBasic statistics:")
print(df.describe())

# Calculate population density
df['density'] = df['population'] / df['area_km2']

# Filter data
large_cities = df[df['population'] > 5_000_000]
print("\nCities with population > 5 million:")
print(large_cities[['city', 'population']])

# Sort data
df_sorted = df.sort_values('density', ascending=False)
print("\nCities by density (highest first):")
print(df_sorted[['city', 'density']].round(1))

# Group by country (if we had more data)
print(f"\nAverage population: {df['population'].mean():,.0f}")
print(f"Total population: {df['population'].sum():,}")

Pandas Tips

  • Use df.head() to see first few rows
  • Use df.info() to see data types and missing values
  • Use df.describe() for statistical summary
  • Column names with spaces need brackets: df['column name']

Practice Problems

Problem 1: City Analysis

Create a program that analyzes city data:

# Your task: Complete this code
cities_data = [
    {"name": "Mumbai", "population": 20_400_000, "area": 603},
    {"name": "Delhi", "population": 16_800_000, "area": 1484},
    {"name": "Bangalore", "population": 8_400_000, "area": 709},
    {"name": "Chennai", "population": 7_100_000, "area": 426}
]

# TODO: 
# 1. Calculate population density for each city
# 2. Find the city with highest density
# 3. Calculate average population
# 4. Create a function to format city information nicely

def calculate_city_stats(cities):
    # Your code here
    pass

# Test your function
result = calculate_city_stats(cities_data)
print(result)
Solution
def calculate_city_stats(cities):
    # Add density to each city
    for city in cities:
        city['density'] = city['population'] / city['area']

    # Find highest density city
    highest_density_city = max(cities, key=lambda x: x['density'])

    # Calculate average population
    total_pop = sum(city['population'] for city in cities)
    avg_pop = total_pop / len(cities)

    return {
        'highest_density_city': highest_density_city['name'],
        'highest_density': highest_density_city['density'],
        'average_population': avg_pop,
        'total_cities': len(cities)
    }

result = calculate_city_stats(cities_data)
print(f"Highest density: {result['highest_density_city']} ({result['highest_density']:.1f} people/km²)")
print(f"Average population: {result['average_population']:,.0f}")

Problem 2: Data Processing

Work with a list of temperatures:

# Temperature data for a week (in Celsius)
temperatures = [22, 25, 28, 24, 26, 30, 27]
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']

# TODO:
# 1. Convert all temperatures to Fahrenheit
# 2. Find the hottest and coldest days
# 3. Calculate average temperature
# 4. Count how many days were above 25°C

def analyze_weather(temps, day_names):
    # Your code here
    pass
Solution
def analyze_weather(temps, day_names):
    # Convert to Fahrenheit
    temps_f = [(temp * 9/5) + 32 for temp in temps]

    # Find hottest and coldest days
    hottest_idx = temps.index(max(temps))
    coldest_idx = temps.index(min(temps))

    # Calculate average
    avg_temp = sum(temps) / len(temps)

    # Count days above 25°C
    hot_days = sum(1 for temp in temps if temp > 25)

    return {
        'temperatures_f': temps_f,
        'hottest_day': day_names[hottest_idx],
        'coldest_day': day_names[coldest_idx],
        'average_temp': avg_temp,
        'days_above_25': hot_days
    }

result = analyze_weather(temperatures, days)
print(f"Hottest day: {result['hottest_day']}")
print(f"Average temperature: {result['average_temp']:.1f}°C")
print(f"Days above 25°C: {result['days_above_25']}")

Problem 3: Working with Real Data

Create a simple data analysis:

import pandas as pd

# Sample country data
country_data = {
    'country': ['China', 'India', 'USA', 'Indonesia', 'Pakistan', 'Brazil'],
    'population': [1439, 1380, 331, 273, 220, 212],  # millions
    'area': [9596, 3287, 9834, 1905, 881, 8515],     # thousand km²
    'gdp': [14.34, 2.87, 21.43, 1.29, 0.35, 1.61]   # trillion USD
}

# TODO:
# 1. Create a DataFrame
# 2. Calculate population density
# 3. Calculate GDP per capita
# 4. Find the top 3 countries by GDP per capita
# 5. Save results to a new CSV file

# Your code here
Solution
import pandas as pd

# Create DataFrame
df = pd.DataFrame(country_data)

# Calculate new columns
df['pop_density'] = df['population'] / df['area']  # people per km²
df['gdp_per_capita'] = (df['gdp'] * 1000) / df['population']  # thousands USD

# Find top 3 by GDP per capita
top_gdp = df.nlargest(3, 'gdp_per_capita')

print("Top 3 countries by GDP per capita:")
print(top_gdp[['country', 'gdp_per_capita']].round(1))

# Save to CSV
df.to_csv('country_analysis.csv', index=False)
print("\nData saved to country_analysis.csv")

Key Takeaways

What You've Learned

  • Variables: Store and manipulate different types of data
  • Lists: Work with ordered collections of items
  • Dictionaries: Store key-value pairs for structured data
  • Loops: Iterate through data efficiently
  • Functions: Create reusable code blocks
  • Libraries: Extend Python's capabilities with pandas
  • Data Analysis: Basic operations on real-world datasets

Best Practices

  • Use descriptive variable names: population not p
  • Comment your code to explain complex logic
  • Use f-strings for string formatting
  • Handle edge cases (like division by zero)
  • Import only what you need from libraries

Next Steps

In the next module, we'll apply these Python skills to geospatial data, learning about: - Vector vs Raster data - Coordinate Reference Systems - Loading and visualizing geographic data - Basic GIS concepts through Python

graph LR
    A[Python Basics] --> B[GIS Fundamentals]
    B --> C[Vector Analysis]
    C --> D[Raster Analysis]
    D --> E[Visualization]
    E --> F[Web Apps]