Module 1: Python Basics
Learning Goals
- Understand what Python is and where it's used
- Work with variables and basic data types
- Use lists and dictionaries effectively
- Write and use for-loops
- Create simple functions
- Import and use libraries
- Read data with pandas
What is Python?
Python is a high-level, interpreted programming language that's widely used for:
- Data Science & Analytics - pandas, numpy, matplotlib
- Web Development - Django, Flask
- Automation & Scripting - System administration, data processing
- Geospatial Analysis - GeoPandas, Rasterio, Shapely
- Machine Learning - scikit-learn, TensorFlow
graph TD
A[Python] --> B[Data Science]
A --> C[Web Development]
A --> D[Automation]
A --> E[GIS & Mapping]
B --> F[pandas, numpy]
C --> G[Django, Flask]
D --> H[Scripts, APIs]
E --> I[GeoPandas, Folium]
1. Python as a Calculator
Let's start with basic arithmetic operations:
# Basic arithmetic
print(5 + 3) # Addition: 8
print(10 - 4) # Subtraction: 6
print(6 * 7) # Multiplication: 42
print(15 / 3) # Division: 5.0
print(2 ** 3) # Exponentiation: 8
print(17 % 5) # Modulo (remainder): 2
Python Calculator Tips
- Use `**` for exponentiation (not `^`, which is bitwise XOR)
- Division with `/` always returns a float
- Use `//` for integer (floor) division
- Use `%` to get the remainder
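The tips above can be checked directly. A minimal sketch (the variable names are illustrative, not part of the lesson) comparing the division operators and showing why `^` is not exponentiation:

```python
# Illustrative names; the values match the calculator examples above
true_div = 17 / 5      # true division always returns a float
floor_div = 17 // 5    # floor division drops the fractional part
remainder = 17 % 5     # modulo gives what is left over
power = 2 ** 3         # exponentiation
xor = 2 ^ 3            # bitwise XOR, NOT exponentiation

print(true_div)   # 3.4
print(floor_div)  # 3
print(remainder)  # 2
print(power)      # 8
print(xor)        # 1

# Floor division rounds toward negative infinity, not toward zero
negative_floor = -17 // 5
print(negative_floor)  # -4
```

Note that `floor_div * 5 + remainder` gives back 17, which is a quick way to check a `//` and `%` pair.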
2. Variables and Data Types
Variables store data that can be used later:
# Numbers
age = 25
temperature = 23.5
population = 1_000_000 # Underscores for readability
# Strings
city_name = "San Francisco"
country = 'United States'
description = """This is a
multi-line string"""
# Boolean
is_capital = True
has_data = False
# Check data types
print(type(age)) # <class 'int'>
print(type(temperature)) # <class 'float'>
print(type(city_name)) # <class 'str'>
print(type(is_capital)) # <class 'bool'>
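Values often arrive as one type and are needed as another, for example numbers read from a text file. A short sketch of converting between the basic types with the built-in `int()`, `float()`, and `str()` functions; the input strings here are made up for illustration:

```python
# Hypothetical inputs, e.g. values read from a file as text
year_text = "2024"
year = int(year_text)       # str -> int
ratio = float("0.75")       # str -> float
label = str(1_000_000)      # int -> str; the underscore is not kept

print(year + 1)             # 2025
print(ratio * 2)            # 1.5
print(label)                # 1000000

# int() truncates toward zero; round() rounds to the nearest integer
truncated = int(23.9)
rounded = round(23.9)
print(truncated)            # 23
print(rounded)              # 24

# bool is a subclass of int, so True behaves like 1 in arithmetic
true_sum = True + True
print(true_sum)             # 2
```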
String Operations
# String concatenation and formatting
first_name = "John"
last_name = "Doe"
# Method 1: Concatenation
full_name = first_name + " " + last_name
print(full_name) # John Doe
# Method 2: f-strings (recommended)
greeting = f"Hello, {first_name}! You are {age} years old."
print(greeting)
# String methods
city = "san francisco"
print(city.title()) # San Francisco
print(city.upper()) # SAN FRANCISCO
print(city.replace(" ", "_")) # san_francisco
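Later examples in this module use f-string format specifiers such as `{population:,}` and `{density:.1f}`. A brief sketch of what those specifiers do, using illustrative values:

```python
# Illustrative values only
population = 13_960_000
density = 6363.2678
share = 0.1834

with_commas = f"{population:,}"       # thousands separators
one_decimal = f"{density:.1f}"        # one decimal place
commas_no_dec = f"{density:,.0f}"     # separators, no decimals
percent = f"{share:.1%}"              # multiply by 100 and add %
aligned = f"{'Tokyo':<10}|"           # left-aligned in a width of 10

print(with_commas)    # 13,960,000
print(one_decimal)    # 6363.3
print(commas_no_dec)  # 6,363
print(percent)        # 18.3%
print(aligned)        # Tokyo     |
```

The part after the colon is the format specification; it changes only how the value is displayed, not the value itself.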
3. Lists - Ordered Collections
Lists store multiple items in order:
# Creating lists
cities = ["New York", "London", "Tokyo", "Sydney"]
populations = [8_400_000, 9_000_000, 13_960_000, 5_300_000]
mixed_data = ["Paris", 2_161_000, True, 48.8566]
# Accessing elements (0-indexed)
print(cities[0]) # New York (first item)
print(cities[-1]) # Sydney (last item)
print(cities[1:3]) # ['London', 'Tokyo'] (slice)
# Modifying lists
cities.append("Berlin") # Add to end
cities.insert(1, "Los Angeles") # Insert at position
cities.remove("Tokyo") # Remove by value
last_city = cities.pop() # Remove and return last
# List operations
print(len(cities)) # Number of items
print("London" in cities) # Check if item exists
print(max(populations)) # Maximum value
print(sum(populations)) # Sum of all values
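Because `append`, `insert`, `remove`, and `pop` all change the list in place, it can help to print the list after each step. A sketch tracing the modifications above on a fresh copy of the list:

```python
cities = ["New York", "London", "Tokyo", "Sydney"]

cities.append("Berlin")
print(cities)  # ['New York', 'London', 'Tokyo', 'Sydney', 'Berlin']

cities.insert(1, "Los Angeles")
print(cities)  # ['New York', 'Los Angeles', 'London', 'Tokyo', 'Sydney', 'Berlin']

cities.remove("Tokyo")
print(cities)  # ['New York', 'Los Angeles', 'London', 'Sydney', 'Berlin']

last_city = cities.pop()
print(last_city)  # Berlin
print(cities)     # ['New York', 'Los Angeles', 'London', 'Sydney']

# remove() raises ValueError if the value is missing, so check first
if "Tokyo" in cities:
    cities.remove("Tokyo")
```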
List Comprehensions (Bonus)
# Create new lists based on existing ones
numbers = [1, 2, 3, 4, 5]
squares = [x**2 for x in numbers]
print(squares) # [1, 4, 9, 16, 25]
# Filter: keep only the cities whose names are longer than 6 characters
long_names = [city for city in cities if len(city) > 6]
print(long_names)
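A comprehension can also combine two lists. The sketch below pairs each city with its population using the built-in `zip()` and filters by population; the `threshold` value is an arbitrary cutoff chosen for illustration:

```python
cities = ["New York", "London", "Tokyo", "Sydney"]
populations = [8_400_000, 9_000_000, 13_960_000, 5_300_000]

# Pair each city with its population and keep those above a threshold
threshold = 8_500_000  # hypothetical cutoff
big_cities = [city for city, pop in zip(cities, populations) if pop > threshold]
print(big_cities)  # ['London', 'Tokyo']

# Transform while filtering: populations in millions, rounded to one decimal place
millions = [round(pop / 1_000_000, 1) for pop in populations if pop > threshold]
print(millions)  # [9.0, 14.0]

# The same idea builds a dictionary (a dict comprehension)
pop_by_city = {city: pop for city, pop in zip(cities, populations)}
print(pop_by_city["Sydney"])  # 5300000
```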
4. Dictionaries - Key-Value Pairs
Dictionaries store data as key-value pairs:
# Creating dictionaries
city_info = {
    "name": "San Francisco",
    "country": "USA",
    "population": 884_000,
    "coordinates": [37.7749, -122.4194],
    "is_capital": False
}
# Accessing values
print(city_info["name"]) # San Francisco
print(city_info.get("population")) # 884000
print(city_info.get("area", "Unknown")) # Unknown (default value)
# Modifying dictionaries
city_info["area"] = 121.4 # Add new key-value pair
city_info["population"] = 900_000 # Update existing value
# Dictionary methods
print(city_info.keys()) # All keys
print(city_info.values()) # All values
print(city_info.items()) # Key-value pairs
# Multiple cities
cities_data = {
    "San Francisco": {"population": 884_000, "country": "USA"},
    "London": {"population": 9_000_000, "country": "UK"},
    "Tokyo": {"population": 13_960_000, "country": "Japan"}
}
print(cities_data["London"]["population"]) # 9000000
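Indexing with square brackets fails when a key is missing, which matters most with nested dictionaries. A minimal sketch of the failure and of a safer lookup that chains `get()` calls; "Paris" is deliberately absent from this sample data:

```python
cities_data = {
    "San Francisco": {"population": 884_000, "country": "USA"},
    "London": {"population": 9_000_000, "country": "UK"},
}

# Direct indexing raises KeyError for a missing key
try:
    cities_data["Paris"]["population"]
except KeyError as err:
    print(f"Missing key: {err}")  # Missing key: 'Paris'

# Chained get() with an empty-dict default returns None instead
paris_pop = cities_data.get("Paris", {}).get("population")
print(paris_pop)  # None

# Membership tests check keys, not values
has_london = "London" in cities_data
has_uk = "UK" in cities_data
print(has_london)  # True
print(has_uk)      # False
```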
5. For Loops - Iteration
For loops let you repeat code for each item in a collection:
# Loop through lists
cities = ["New York", "London", "Tokyo", "Sydney"]
for city in cities:
    print(f"I want to visit {city}")
# Loop with index
for i, city in enumerate(cities):
    print(f"{i+1}. {city}")
# Loop through dictionaries
city_populations = {
    "New York": 8_400_000,
    "London": 9_000_000,
    "Tokyo": 13_960_000
}
# Loop through keys
for city in city_populations:
    print(city)
# Loop through key-value pairs
for city, population in city_populations.items():
    print(f"{city}: {population:,} people")
# Loop through values
for population in city_populations.values():
    print(f"Population: {population:,}")
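Three built-ins pair naturally with for loops: `enumerate()` with a `start` argument, `zip()` for walking two lists together, and `sorted()` for looping in a chosen order. A short sketch using sample lists like those above:

```python
cities = ["New York", "London", "Tokyo"]
populations = [8_400_000, 9_000_000, 13_960_000]

# enumerate(start=1) avoids writing i + 1 by hand
for rank, city in enumerate(cities, start=1):
    print(f"{rank}. {city}")

# zip() walks two lists in parallel
for city, population in zip(cities, populations):
    print(f"{city}: {population:,}")

# sorted() with a key function orders the pairs by population, largest first
ranked = sorted(zip(cities, populations), key=lambda pair: pair[1], reverse=True)
for city, population in ranked:
    print(f"{city}: {population:,}")
```

Here `ranked` is a list of `(city, population)` tuples, so the last loop prints Tokyo first.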
Range Function
# Generate sequences of numbers
for i in range(5): # 0, 1, 2, 3, 4
    print(i)
for i in range(1, 6): # 1, 2, 3, 4, 5
    print(i)
for i in range(0, 10, 2): # 0, 2, 4, 6, 8
    print(i)
# Practical example: process multiple files
file_numbers = range(1, 6)
for num in file_numbers:
    filename = f"data_{num}.csv"
    print(f"Processing {filename}")
6. Functions - Reusable Code
Functions help organize and reuse code:
# Simple function
def greet(name):
    """Greet a person by name"""
    return f"Hello, {name}!"
# Call the function
message = greet("Alice")
print(message) # Hello, Alice!
# Function with multiple parameters
def calculate_density(population, area):
    """Calculate population density"""
    if area == 0:
        return 0
    return population / area
# Function with default parameters
def describe_city(name, population, country="Unknown"):
    """Describe a city with its basic information"""
    density = calculate_density(population, 100) # Assume 100 km²
    return f"{name} ({country}): {population:,} people, density: {density:.1f}/km²"
# Using functions
sf_info = describe_city("San Francisco", 884_000, "USA")
print(sf_info)
mystery_city = describe_city("Mystery City", 500_000)
print(mystery_city)
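The version above hard-codes an area of 100 km² and returns 0 for a zero area. One possible variant, sketched here as an alternative design rather than a correction, takes the area as an optional keyword argument (`area_km2`, a name introduced for this example) and raises an error for an invalid area instead of returning a misleading 0:

```python
def calculate_density(population, area):
    """Return people per km², raising an error for a non-positive area."""
    if area <= 0:
        raise ValueError(f"area must be positive, got {area}")
    return population / area


def describe_city(name, population, country="Unknown", area_km2=None):
    """Describe a city; density is included only when an area is given."""
    text = f"{name} ({country}): {population:,} people"
    if area_km2 is not None:
        density = calculate_density(population, area_km2)
        text += f", density: {density:.1f}/km²"
    return text


# 121.4 km² is the San Francisco area used in the dictionary section
sf_info = describe_city("San Francisco", 884_000, "USA", area_km2=121.4)
mystery_city = describe_city("Mystery City", 500_000)
print(sf_info)
print(mystery_city)  # Mystery City (Unknown): 500,000 people
```

Keyword arguments such as `area_km2=121.4` make the call site self-documenting, and a default of `None` lets the function tell "no area given" apart from an area of zero.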
Functions with Lists and Dictionaries
def analyze_cities(cities_dict):
    """Analyze a dictionary of cities and their populations"""
    total_population = sum(cities_dict.values())
    largest_city = max(cities_dict, key=cities_dict.get)
    smallest_city = min(cities_dict, key=cities_dict.get)
    return {
        "total_population": total_population,
        "largest_city": largest_city,
        "smallest_city": smallest_city,
        "average_population": total_population / len(cities_dict)
    }
# Example usage
cities = {
    "New York": 8_400_000,
    "Los Angeles": 3_900_000,
    "Chicago": 2_700_000,
    "Houston": 2_300_000
}
analysis = analyze_cities(cities)
print(f"Total population: {analysis['total_population']:,}")
print(f"Largest city: {analysis['largest_city']}")
print(f"Average population: {analysis['average_population']:,.0f}")
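With an empty dictionary, `max()` and `min()` raise `ValueError` and the average divides by zero. A sketch of one way to guard against that; `analyze_cities_safe` is a hypothetical name, and returning `None` is only one of several reasonable choices:

```python
def analyze_cities_safe(cities_dict):
    """Variant of analyze_cities that returns None for an empty dictionary."""
    if not cities_dict:
        return None
    total_population = sum(cities_dict.values())
    return {
        "total_population": total_population,
        "largest_city": max(cities_dict, key=cities_dict.get),
        "smallest_city": min(cities_dict, key=cities_dict.get),
        "average_population": total_population / len(cities_dict),
    }


empty_result = analyze_cities_safe({})
print(empty_result)  # None

result = analyze_cities_safe({"Chicago": 2_700_000, "Houston": 2_300_000})
print(result["smallest_city"])       # Houston
print(result["average_population"])  # 2500000.0
```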
7. Importing Libraries
Libraries extend Python's capabilities:
# Import entire modules, or a single name from a module
import math
import random
from datetime import datetime
# Using imported functions
print(math.sqrt(16)) # 4.0
print(math.pi) # 3.141592653589793
print(random.randint(1, 10)) # Random number between 1-10
print(datetime.now()) # Current date and time
# Import specific functions
from math import sqrt, pi, sin
print(sqrt(25)) # 5.0
# Import with alias
import pandas as pd
import numpy as np
# These are common conventions in data science
8. Working with CSV Data using Pandas
Pandas is one of the most widely used Python libraries for data analysis. Here a DataFrame is built from a dictionary:
import pandas as pd
# Create sample data
data = {
    'city': ['New York', 'London', 'Tokyo', 'Sydney', 'Paris'],
    'country': ['USA', 'UK', 'Japan', 'Australia', 'France'],
    'population': [8_400_000, 9_000_000, 13_960_000, 5_300_000, 2_161_000],
    'area_km2': [783, 1572, 2194, 12368, 105]
}
# Create DataFrame
df = pd.DataFrame(data)
# Display data
print("First 3 rows:")
print(df.head(3))
print("\nDataFrame info:")
df.info() # info() prints its summary directly and returns None
print("\nBasic statistics:")
print(df.describe())
# Calculate population density
df['density'] = df['population'] / df['area_km2']
# Filter data
large_cities = df[df['population'] > 5_000_000]
print("\nCities with population > 5 million:")
print(large_cities[['city', 'population']])
# Sort data
df_sorted = df.sort_values('density', ascending=False)
print("\nCities by density (highest first):")
print(df_sorted[['city', 'density']].round(1))
# Summary statistics across all rows
print(f"\nAverage population: {df['population'].mean():,.0f}")
print(f"Total population: {df['population'].sum():,}")
Pandas Tips
- Use `df.head()` to see the first few rows
- Use `df.info()` to see data types and missing values
- Use `df.describe()` for a statistical summary
- Column names with spaces need brackets: `df['column name']`
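Reading a CSV file uses `pd.read_csv()`. The sketch below keeps itself self-contained by parsing CSV text from memory through `io.StringIO`; the CSV content and the `cities.csv` file name in the last comment are made up for illustration:

```python
import io

import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a file on disk
csv_text = """city,country,population
New York,USA,8400000
London,UK,9000000
Tokyo,Japan,13960000
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)                # (3, 3)
print(df.dtypes)               # population is parsed as an integer column
print(df["population"].max())  # 13960000

# With a real file, pass the path instead: pd.read_csv("cities.csv")
```

The first row of the CSV becomes the column names by default, and pandas infers each column's type from its values.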
Practice Problems
Problem 1: City Analysis
Create a program that analyzes city data:
# Your task: Complete this code
cities_data = [
    {"name": "Mumbai", "population": 20_400_000, "area": 603},
    {"name": "Delhi", "population": 16_800_000, "area": 1484},
    {"name": "Bangalore", "population": 8_400_000, "area": 709},
    {"name": "Chennai", "population": 7_100_000, "area": 426}
]
# TODO:
# 1. Calculate population density for each city
# 2. Find the city with highest density
# 3. Calculate average population
# 4. Create a function to format city information nicely
def calculate_city_stats(cities):
    # Your code here
    pass
# Test your function
result = calculate_city_stats(cities_data)
print(result)
Solution
def calculate_city_stats(cities):
    # Add density to each city
    for city in cities:
        city['density'] = city['population'] / city['area']
    # Find highest density city
    highest_density_city = max(cities, key=lambda x: x['density'])
    # Calculate average population
    total_pop = sum(city['population'] for city in cities)
    avg_pop = total_pop / len(cities)
    return {
        'highest_density_city': highest_density_city['name'],
        'highest_density': highest_density_city['density'],
        'average_population': avg_pop,
        'total_cities': len(cities)
    }
result = calculate_city_stats(cities_data)
print(f"Highest density: {result['highest_density_city']} ({result['highest_density']:.1f} people/km²)")
print(f"Average population: {result['average_population']:,.0f}")
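Task 4 asks for a function that formats city information. One possible sketch, assuming the same dictionary keys as `cities_data`; the function name `format_city` and the column widths are arbitrary choices for this example:

```python
def format_city(city):
    """Return a one-line summary for a city dictionary."""
    density = city["population"] / city["area"]
    return (
        f"{city['name']:<10} "
        f"{city['population']:>12,} people "
        f"{density:>10,.1f}/km²"
    )


# Two rows from the practice data
sample_cities = [
    {"name": "Mumbai", "population": 20_400_000, "area": 603},
    {"name": "Chennai", "population": 7_100_000, "area": 426},
]
for city in sample_cities:
    print(format_city(city))
```

The `<` and `>` alignment flags pad each field to a fixed width so the rows line up in columns.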
Problem 2: Data Processing
Work with a list of temperatures:
# Temperature data for a week (in Celsius)
temperatures = [22, 25, 28, 24, 26, 30, 27]
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# TODO:
# 1. Convert all temperatures to Fahrenheit
# 2. Find the hottest and coldest days
# 3. Calculate average temperature
# 4. Count how many days were above 25°C
def analyze_weather(temps, day_names):
    # Your code here
    pass
Solution
def analyze_weather(temps, day_names):
    # Convert to Fahrenheit
    temps_f = [(temp * 9/5) + 32 for temp in temps]
    # Find hottest and coldest days
    hottest_idx = temps.index(max(temps))
    coldest_idx = temps.index(min(temps))
    # Calculate average
    avg_temp = sum(temps) / len(temps)
    # Count days above 25°C
    hot_days = sum(1 for temp in temps if temp > 25)
    return {
        'temperatures_f': temps_f,
        'hottest_day': day_names[hottest_idx],
        'coldest_day': day_names[coldest_idx],
        'average_temp': avg_temp,
        'days_above_25': hot_days
    }
result = analyze_weather(temperatures, days)
print(f"Hottest day: {result['hottest_day']}")
print(f"Average temperature: {result['average_temp']:.1f}°C")
print(f"Days above 25°C: {result['days_above_25']}")
Problem 3: Working with Real Data
Create a simple data analysis:
import pandas as pd
# Sample country data
country_data = {
    'country': ['China', 'India', 'USA', 'Indonesia', 'Pakistan', 'Brazil'],
    'population': [1439, 1380, 331, 273, 220, 212], # millions
    'area': [9596, 3287, 9834, 1905, 881, 8515], # thousand km²
    'gdp': [14.34, 2.87, 21.43, 1.29, 0.35, 1.61] # trillion USD
}
# TODO:
# 1. Create a DataFrame
# 2. Calculate population density
# 3. Calculate GDP per capita
# 4. Find the top 3 countries by GDP per capita
# 5. Save results to a new CSV file
# Your code here
Solution
import pandas as pd
# Create DataFrame
df = pd.DataFrame(country_data)
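# Calculate new columns
# population is in millions and area in thousand km², so scale by 1000 to get people per km²
df['pop_density'] = df['population'] * 1000 / df['area'] # people per km²
# gdp is in trillion USD: (trillions * 1000) / millions of people = thousands of USD per person
df['gdp_per_capita'] = (df['gdp'] * 1000) / df['population'] # thousands USD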
# Find top 3 by GDP per capita
top_gdp = df.nlargest(3, 'gdp_per_capita')
print("Top 3 countries by GDP per capita:")
print(top_gdp[['country', 'gdp_per_capita']].round(1))
# Save to CSV
df.to_csv('country_analysis.csv', index=False)
print("\nData saved to country_analysis.csv")
Key Takeaways
What You've Learned
- Variables: Store and manipulate different types of data
- Lists: Work with ordered collections of items
- Dictionaries: Store key-value pairs for structured data
- Loops: Iterate through data efficiently
- Functions: Create reusable code blocks
- Libraries: Extend Python's capabilities with pandas
- Data Analysis: Basic operations on real-world datasets
Best Practices
- Use descriptive variable names: `population` not `p`
- Comment your code to explain complex logic
- Use f-strings for string formatting
- Handle edge cases (like division by zero)
- Import only what you need from libraries
Next Steps
In the next module, we'll apply these Python skills to geospatial data, learning about:
- Vector vs Raster data
- Coordinate Reference Systems
- Loading and visualizing geographic data
- Basic GIS concepts through Python
graph LR
A[Python Basics] --> B[GIS Fundamentals]
B --> C[Vector Analysis]
C --> D[Raster Analysis]
D --> E[Visualization]
E --> F[Web Apps]