Skip to content

Module 4: Raster Data & Analysis

Learning Goals

  • Explain raster data in simple terms (grid of pixels, cell values, resolution, extent, CRS, NoData)
  • Load and inspect raster files with rasterio
  • Perform raster calculations and statistics
  • Clip rasters using vector boundaries
  • Combine raster and vector data analysis
  • Handle NoData values and data types
  • Complete the basics assignment (bundled Tiff_1.tif / Tiff_2.tif, small rasterio scripts) before the Advance Analytics walkthrough

Introduction to Raster Data

In the simplest terms, a raster is a checkerboard of numbers on a map. Imagine graph paper laid over an area: every square (cell) holds one value—elevation, temperature, a land-cover code, or how “bright” the ground is in a satellite band. The squares are usually the same size on the ground (for example 30 m × 30 m), and together they cover a rectangle of territory. There is no outline of a road or a lake stored as a line or polygon; instead, each square stores whatever is true in that patch of ground (often an average or sample for the whole square).

It helps to think of a raster as a 2D table (rows × columns) where each cell has (row, column) → value, plus metadata that says where that table sits on Earth (origin, cell size, CRS). A photo is also a raster: each pixel has red, green, blue values. GIS rasters use the same grid idea, but the “colour” might be height, rainfall, or forest = 5, water = 1 instead of RGB.

Key ideas (plain language)

  • Grid — Fixed rows and columns of cells (often called pixels). Each cell is one ground patch.
  • Cell value — Usually one number per cell; satellite scenes can have several numbers per cell (bands), like extra columns glued to the same square.
  • ResolutionHow wide one cell is on the ground (e.g. 10 m). Smaller cells = more detail = more rows and columns and bigger files.
  • Extent — The outer box of the raster: smallest and largest x / y (or lon / lat) the grid covers.
  • CRS — The rule that turns column/row positions into real places on Earth (see Module 2 if CRS is new to you).
  • NoData — A sentinel value meaning “no measurement here” (cloud, sea mask, edge of study area). It is not the same as zero.
graph TD
    A[Raster Data] --> B[Grid Structure]
    A --> C[Cell Values]
    A --> D[Spatial Reference]
    B --> E[Rows & Columns]
    B --> F[Resolution]
    C --> G[Continuous Values]
    C --> H[Categorical Values]
    D --> I[Coordinate System]
    D --> J[Extent/Bounds]

Common Raster Data Types

Digital Elevation Models (DEMs): Store terrain height values, essential for topographic analysis, watershed modeling, and visibility studies.

Satellite Imagery: Multispectral data capturing different wavelengths of light. Each band represents different portions of the electromagnetic spectrum:

  • Visible bands (Red, Green, Blue): What human eyes can see
  • Near-Infrared (NIR): Useful for vegetation analysis
  • Thermal bands: Measure surface temperature
  • Radar bands: Can penetrate clouds and work day/night

Climate Data: Temperature, precipitation, humidity measurements over time and space.

Population Density: Number of people per unit area, often derived from census data.

Land Cover/Land Use: Classification of Earth's surface (forest, urban, water, agriculture).

Raster vs Vector Data

Aspect Raster Vector
Structure Grid of cells Points, lines, polygons
Best for Continuous phenomena Discrete objects
Storage Fixed grid size Variable based on complexity
Analysis Mathematical operations Geometric operations
Examples Temperature, elevation Roads, boundaries, buildings

Understanding Raster Resolution

Spatial Resolution: The ground distance represented by each pixel

  • High resolution (1-5m): Detailed analysis, small areas
  • Medium resolution (10-30m): Regional studies, land cover mapping
  • Low resolution (100m-1km): Global studies, climate modeling

Same scene in GeoTIFF: 10 m vs 250 m spatial resolution

Spatial resolution is the ground width of one pixel. A 10 m GeoTIFF divides the landscape into small squares; a 250 m GeoTIFF uses much larger squares to cover the same hills, fields, and settlements. Both can be valid products—they answer different questions.

10 m pixels 250 m pixels
Detail Sharper: narrow roads, field edges, and small patches stay visible at map scale Smoother / blockier: one value averages a big patch of ground, so fine features blend together
Rows & columns Many cells for the same map extent → larger file, slower to process Few cells → smaller file, faster summaries over big regions
Typical use Site-scale work, building footprints, trail mapping Regional or global overview, climate grids, coarse land cover

Important: lower resolution (bigger metres per pixel) does not mean a “clearer” picture in the sense of sharp edges—it usually means less spatial detail, but the map can look less noisy when you are zoomed out, because each pixel hides small variations inside its big footprint. Higher resolution (smaller metres per pixel) is what makes small features easier to see when you zoom in.

Below, the same kind of GeoTIFF is shown at ~10 m and ~250 m cell size (illustrative exports). Compare how edges and texture change.

GeoTIFF visualized at about 10 m ground resolution—finer grid, more detail

GeoTIFF visualized at about 250 m ground resolution—coarser grid, broader averaging per pixel

Temporal Resolution: How often data is collected

  • Daily: Weather data, some satellites
  • Weekly/Monthly: Vegetation monitoring
  • Annual: Land cover change detection

Spectral Resolution: Number and width of spectral bands

  • Panchromatic: Single band (black and white)
  • Multispectral: 3-10 bands (RGB + infrared)
  • Hyperspectral: 100+ narrow bands

Raster File Formats

GeoTIFF (.tif/.tiff): Most common format, supports georeferencing and multiple bands

Sample GeoTIFFs for practice (in assets/tiff/ in this repo):

Click to download Tiff_1.tif Click to download Tiff_2.tif

Preview a GeoTIFF in the browser

If you want a quick visual check that a file opens on a map with plausible alignment—without QGIS or Python yet—try the free Pozyx Online GeoTIFF Viewer (upload your .tif in the browser). It is aimed at RGB-style rasters, common CRS tags, and fast validation; for many-band scientific rasters, COG streaming, or heavy analysis, use QGIS / rasterio as described later in this module. See Pozyx’s page for current limits and what the tool does not support.

NetCDF (.nc): Excellent for scientific data with multiple dimensions (time, depth)

HDF (.hdf): Hierarchical format for complex scientific datasets

JPEG2000 (.jp2): Compressed format with good quality retention

ESRI Grid: Proprietary format used by ArcGIS

Raster Coordinate Systems

Geographic Coordinates: Latitude/longitude in degrees - Good for: Global datasets, web mapping - Units: Decimal degrees - Example: WGS84 (EPSG:4326)

Projected Coordinates: Flat map projections in linear units - Good for: Area calculations, distance measurements - Units: Meters, feet - Example: UTM zones, State Plane

Raster Data Quality Considerations

Accuracy: How close values are to true measurements

Precision: Consistency of repeated measurements

Completeness: Percentage of area covered (vs NoData)

Currency: How recent the data is

Lineage: Documentation of data sources and processing steps

Raster bands and spectral indices

What are raster bands?

A raster band is one layer of values inside a raster dataset.

A raster image may contain one band or multiple bands.

Each band stores a specific measurement for every pixel.

Examples:

  • visible light
  • near infrared
  • shortwave infrared
  • elevation
  • temperature

You can think of bands like stacked layers.

Each layer has:

  • same width
  • same height
  • different values

Example:

A satellite image may store:

  • Band 1 → Blue
  • Band 2 → Green
  • Band 3 → Red
  • Band 4 → Near Infrared

The same pixel location will have one value in each band.

Example:

Pixel location Blue Green Red NIR
Row 1, Col 1 34 45 55 120
Row 1, Col 2 30 40 50 115
Row 2, Col 1 32 44 54 118

This allows comparison between wavelengths.


Common raster bands

Different satellites may use different band numbers.

Example:

Band Name Common use
1 Blue water, coastline
2 Green vegetation
3 Red plant health
4 Near Infrared (NIR) vegetation analysis
5 SWIR 1 moisture
6 SWIR 2 soil and geology
7 Thermal heat

Why bands matter

Different land surfaces reflect light differently.

Example:

Surface Blue Red NIR
Water low low very low
Soil medium medium medium
Healthy vegetation medium low high
Built-up medium medium medium-high

Because of this, we can combine bands mathematically and create spectral indices.


What are spectral indices?

A spectral index is a formula created using raster bands.

It highlights a specific land condition such as:

  • vegetation
  • water
  • soil
  • burned areas
  • urban areas

Indices are calculated pixel by pixel.

Result:

  • one new raster
  • each pixel stores calculated value

NDVI — Normalized Difference Vegetation Index

NDVI measures vegetation health.

Healthy vegetation reflects:

  • high NIR
  • low Red

Formula:

NDVI = (NIR - Red) / (NIR + Red)
````

Example:

| Red | NIR |  NDVI |
| --: | --: | ----: |
|  40 | 140 |  0.56 |
|  70 |  75 |  0.03 |
|  80 |  50 | -0.23 |

Typical NDVI values:

|      NDVI | Meaning                  |
| --------: | ------------------------ |
|       < 0 | water / clouds           |
|   0 – 0.2 | bare soil                |
| 0.2 – 0.5 | vegetation               |
|     > 0.5 | dense healthy vegetation |

Uses:

* crop monitoring
* forest analysis
* drought detection

---

## EVI — Enhanced Vegetation Index

EVI improves vegetation detection where NDVI may saturate.

It uses:

* NIR
* Red
* Blue

Formula:

```text
EVI = 2.5 × (NIR - Red) / (NIR + 6×Red - 7.5×Blue + 1)

Example:

Blue Red NIR EVI
20 40 140 0.63
25 60 100 0.31

Uses:

  • dense vegetation
  • tropical forests
  • crop monitoring

NDWI — Normalized Difference Water Index

NDWI highlights water.

Uses:

  • Green
  • NIR

Formula:

NDWI = (Green - NIR) / (Green + NIR)

Example:

Green NIR NDWI
80 20 0.60
40 90 -0.38

Uses:

  • lakes
  • rivers
  • water bodies

NDBI — Normalized Difference Built-up Index

Detects urban and built-up areas.

Uses:

  • SWIR
  • NIR

Formula:

NDBI = (SWIR - NIR) / (SWIR + NIR)

Example:

SWIR NIR NDBI
130 80 0.24
70 120 -0.26

Uses:

  • urban growth
  • buildings
  • roads

SAVI — Soil Adjusted Vegetation Index

Improves vegetation detection where soil is visible.

Formula:

SAVI = ((NIR - Red) / (NIR + Red + L)) × (1 + L)

Usually:

L = 0.5

Uses:

  • agriculture
  • dry areas
  • sparse vegetation

BSI — Bare Soil Index

Highlights exposed soil.

Formula:

BSI = ((SWIR + Red) - (NIR + Blue)) / ((SWIR + Red) + (NIR + Blue))

Uses:

  • barren land
  • soil analysis

Summary

Raster bands store values in separate image layers.

By combining bands using formulas we can create indices.

Common indices:

Index Uses
NDVI vegetation
EVI dense vegetation
NDWI water
NDBI built-up
SAVI soil + vegetation
BSI bare soil

These indices are widely used in:

  • agriculture
  • remote sensing
  • environmental monitoring
  • land use mapping
  • GIS analysis

Download satellite data (Sentinel-2)

In this section we will learn how to download satellite imagery from Sentinel-2 using Python and cloud-based geospatial catalogs.

Sentinel-2 is a satellite mission from the European Space Agency (ESA).

It captures Earth observation imagery useful for:

  • vegetation monitoring
  • agriculture
  • land use mapping
  • flood analysis
  • forest monitoring
  • water resources
  • urban growth

Sentinel-2 imagery is freely available and commonly used in GIS and remote sensing.

It provides:

  • 10 m resolution → Blue, Green, Red, NIR
  • 20 m resolution → Red Edge, SWIR
  • 60 m resolution → atmospheric bands

Common useful bands:

Band Name Resolution
B2 Blue 10 m
B3 Green 10 m
B4 Red 10 m
B8 Near Infrared 10 m
B11 SWIR 20 m
B12 SWIR 20 m

Sentinel-2 data can be downloaded from several cloud catalogs.

A common workflow:

  • search imagery by location
  • filter by date
  • filter by cloud cover
  • select required bands
  • load imagery into Python
  • process raster

1. Microsoft Planetary Computer

Microsoft Planetary Computer is a cloud platform that provides access to large public geospatial datasets.

It includes:

  • Sentinel-2
  • Landsat
  • DEM datasets
  • climate data
  • land cover

Website:

https://planetarycomputer.microsoft.com/

It uses the STAC (SpatioTemporal Asset Catalog) standard.

This means we can search data using:

  • bounding box
  • geometry
  • date range
  • cloud cover
  • collection name

Example collections:

  • sentinel-2-l2a
  • landsat-c2-l2

Benefits:

  • free access
  • cloud hosted
  • fast search
  • no manual downloading needed
  • ready for Python workflows

Typical workflow:

  1. open catalog
  2. search Sentinel-2
  3. choose date + area
  4. filter by cloud cover
  5. access raster assets

Useful for:

  • satellite analysis
  • time series
  • NDVI
  • agriculture monitoring

2. PySTAC

PySTAC is a Python library for working with STAC catalogs.

STAC is a standard format used to describe geospatial datasets.

PySTAC helps read:

  • catalogs
  • collections
  • items
  • raster assets

It lets us:

  • open STAC catalog
  • search datasets
  • inspect metadata
  • list available bands
  • access image links

Example metadata available:

  • acquisition date
  • cloud cover
  • satellite name
  • band URLs
  • projection

Useful for:

  • discovering satellite scenes
  • reading metadata
  • selecting imagery

Simple idea:

  • Planetary Computer stores the data
  • PySTAC reads the catalog and metadata

Typical workflow:

  1. connect to catalog
  2. search Sentinel-2
  3. read returned items
  4. inspect bands
  5. select image

Useful when:

  • browsing scenes
  • checking metadata
  • building automated search

3. ODC-STAC

ODC-STAC stands for:

Open Data Cube + STAC

It is a Python library used to load STAC items directly into analysis-ready arrays.

While PySTAC helps discover data, ODC-STAC helps load it for analysis.

It works well with:

  • xarray
  • NumPy
  • Rasterio

It helps us:

  • load selected bands
  • clip to area
  • combine scenes
  • stack rasters
  • prepare analysis-ready data

Example uses:

  • NDVI calculation
  • cloud filtering
  • time-series analysis
  • mosaics

Typical workflow:

  1. search Sentinel-2 using STAC
  2. select items
  3. load bands with ODC-STAC
  4. calculate indices
  5. export results

Benefits:

  • fast loading
  • multiple scenes
  • direct analysis
  • works well with raster workflows

Quick comparison

Tool Main use
Microsoft Planetary Computer cloud catalog + satellite storage
PySTAC search and read STAC metadata
ODC-STAC load imagery for analysis

Simple workflow:

Microsoft Planetary Computer
Search Sentinel-2 scenes
PySTAC reads catalog + metadata
ODC-STAC loads imagery
Raster analysis with Python

Search available Sentinel-2 data

Before downloading satellite imagery, we usually search available scenes for our study area.

This helps us:

  • check whether imagery exists
  • filter by date
  • filter by cloud cover
  • inspect available scenes
  • choose the best image before loading bands

For Sentinel-2, a common workflow is:

  • define study area (GeoJSON)
  • choose start and end date
  • choose maximum cloud cover
  • search catalog
  • inspect results
  • select image

We can search data directly from Microsoft Planetary Computer using PySTAC Client.

Install required libraries

!pip install pystac-client planetary-computer odc-stac geopandas pandas

Libraries used:

  • pystac-client → search STAC catalog
  • planetary-computer → Microsoft Planetary Computer access
  • odc-stac → load imagery
  • geopandas → vector boundaries
  • pandas → tabular results

Search available Sentinel-2 imagery

This program:

  • connects to Microsoft Planetary Computer
  • searches Sentinel-2 Level-2A
  • filters by:
  • polygon boundary
  • date range
  • cloud cover
  • returns results as JSON
"""
Search Sentinel-2 imagery from Planetary Computer
and return JSON.
"""

import json
from pystac_client import Client

STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def search_sentinel2(
    geojson_fc: dict,
    start_date: str,
    end_date: str,
    max_cloud_cover: int = 20,
):
    # --------------------------------
    # convert FeatureCollection → geometry
    # --------------------------------
    if geojson_fc["type"] == "FeatureCollection":
        geometry = geojson_fc["features"][0]["geometry"]

    elif geojson_fc["type"] == "Feature":
        geometry = geojson_fc["geometry"]

    else:
        geometry = geojson_fc

    # --------------------------------
    # connect stac
    # --------------------------------
    catalog = Client.open(STAC_URL)

    search = catalog.search(
        collections=["sentinel-2-l2a"],
        intersects=geometry,
        datetime=f"{start_date}/{end_date}",
        query={
            "eo:cloud_cover": {
                "lte": max_cloud_cover
            }
        },
    )

    items = list(search.items())

    # --------------------------------
    # no results
    # --------------------------------
    if not items:
        return {
            "count": 0,
            "results": []
        }

    # --------------------------------
    # results
    # --------------------------------
    results = []

    for item in items:
        results.append(
            {
                "id": item.id,
                "date": item.datetime.date().isoformat(),
                "datetime": item.datetime.isoformat(),
                "cloud_cover": item.properties.get(
                    "eo:cloud_cover"
                ),
                "product": item.properties.get(
                    "s2:product_uri"
                ),
                "platform": item.properties.get(
                    "platform"
                ),
            }
        )

    # sort by date
    results = sorted(
        results,
        key=lambda x: x["date"]
    )

    return {
        "count": len(results),
        "results": results,
    }


# --------------------------------
# Example
# --------------------------------

geojson_fc = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {},
            "geometry": {
                "type": "Polygon",
                "coordinates": [[
                    [76.8149, 17.6984],
                    [76.8131, 17.6972],
                    [76.8155, 17.6961],
                    [76.8172, 17.6976],
                    [76.8149, 17.6984],
                ]],
            },
        }
    ],
}

data = search_sentinel2(
    geojson_fc=geojson_fc,
    start_date="2025-12-01",
    end_date="2025-12-31",
    max_cloud_cover=15,
)

print(json.dumps(data, indent=2))

Example output

{
  "count": 2,
  "results": [
    {
      "id": "S2A_MSIL2A_20251212T052201",
      "date": "2025-12-12",
      "datetime": "2025-12-12T05:22:01+00:00",
      "cloud_cover": 8.2,
      "product": "S2A_MSIL2A_20251212T052201",
      "platform": "sentinel-2a"
    },
    {
      "id": "S2B_MSIL2A_20251222T052201",
      "date": "2025-12-22",
      "datetime": "2025-12-22T05:22:01+00:00",
      "cloud_cover": 12.6,
      "product": "S2B_MSIL2A_20251222T052201",
      "platform": "sentinel-2b"
    }
  ]
}

Result fields

Field Meaning
id Sentinel scene id
date image date
datetime exact capture time
cloud_cover cloud percentage
product product name
platform Sentinel satellite

Summary

Searching Sentinel-2 before downloading helps us:

  • find available imagery
  • filter low-cloud scenes
  • compare dates
  • select the best image

Typical workflow:

GeoJSON study area
Search Sentinel-2
Check cloud cover
Choose best scene
Load raster bands
Calculate NDVI / analysis

Download clipped raw Sentinel-2 bands

Sometimes we need the original satellite bands instead of calculated indices.

Raw bands are useful for:

  • NDVI / EVI calculation
  • RGB image creation
  • false color composites
  • land cover analysis
  • water detection
  • remote sensing workflows

In this example we will:

  • use a GeoJSON polygon
  • select one Sentinel-2 scene
  • load required bands
  • clip exactly to polygon
  • save each band as a separate GeoTIFF

This gives us raw clipped raster bands ready for GIS analysis.


Install required libraries

!pip install pystac-client planetary-computer odc-stac geopandas pandas rasterio rioxarray

Libraries used:

  • pystac-client → search STAC catalog
  • planetary-computer → sign URLs
  • odc-stac → load raster bands
  • rasterio → raster processing
  • rioxarray → clip + export

Download clipped raw bands

This function:

  • accepts GeoJSON polygon
  • accepts Sentinel-2 item id
  • accepts list of bands
  • clips raster
  • saves TIFF files
import os
import rasterio
from shapely.geometry import shape
from pystac_client import Client
import planetary_computer as pc
from odc.stac import load
import rioxarray


STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def download_clipped_raw_bands(
    geojson_data: dict,
    item_id: str,
    bands: list,
    output_dir: str,
):
    """
    Download clipped raw Sentinel-2 bands.

    Parameters
    ----------
    geojson_data : dict
        GeoJSON Polygon / Feature / FeatureCollection

    item_id : str
        Sentinel-2 item id

    bands : list
        Example:
        ["B02", "B03", "B04", "B08"]

    output_dir : str
        Folder to save TIFFs
    """

    # --------------------------
    # geometry
    # --------------------------
    if geojson_data["type"] == "FeatureCollection":
        geom_geojson = geojson_data["features"][0]["geometry"]

    elif geojson_data["type"] == "Feature":
        geom_geojson = geojson_data["geometry"]

    else:
        geom_geojson = geojson_data

    geom = shape(geom_geojson)

    # --------------------------
    # stac
    # --------------------------
    client = Client.open(
        STAC_URL,
        modifier=pc.sign_inplace,
    )

    search = client.search(
        collections=["sentinel-2-l2a"],
        ids=[item_id],
    )

    items = list(search.items())

    if not items:
        raise ValueError(
            f"No item found for {item_id}"
        )

    signed_items = [
        pc.sign(item)
        for item in items
    ]

    # --------------------------
    # load only required bands
    # --------------------------
    ds = load(
        items=signed_items,
        geopolygon=geom,
        groupby="solar_day",
        bands=bands,
    )

    # exact polygon clip
    ds = ds.rio.clip(
        [geom],
        crs="EPSG:4326",
    )

    # first image only
    ds = ds.isel(time=0)

    # create folder
    os.makedirs(
        output_dir,
        exist_ok=True,
    )

    saved_files = []

    # --------------------------
    # save each band
    # --------------------------
    for band in bands:

        output_path = os.path.join(
            output_dir,
            f"{band}.tif",
        )

        ds[band].rio.to_raster(
            output_path
        )

        saved_files.append(
            output_path
        )

    return {
        "status": "success",
        "item_id": item_id,
        "bands": bands,
        "files": saved_files,
    }


# -----------------------------------
# Example
# -----------------------------------

res = download_clipped_raw_bands(
    geojson_data=geojson_fc,
    item_id="S2C_MSIL2A_20251214T053241_R105_T43QDF_20251214T090710",
    bands=[
        "B02",
        "B03",
        "B04",
        "B08",
    ],
    output_dir="/content/raw_bands",
)

print(res)

Example output

{
    "status": "success",
    "item_id": "S2C_MSIL2A_20251214T053241_R105_T43QDF_20251214T090710",
    "bands": [
        "B02",
        "B03",
        "B04",
        "B08"
    ],
    "files": [
        "/content/raw_bands/B02.tif",
        "/content/raw_bands/B03.tif",
        "/content/raw_bands/B04.tif",
        "/content/raw_bands/B08.tif"
    ]
}

Common Sentinel-2 bands

Band Name Resolution Use
B02 Blue 10 m water / coast
B03 Green 10 m vegetation
B04 Red 10 m NDVI
B08 NIR 10 m vegetation
B11 SWIR 20 m moisture
B12 SWIR 20 m soil

Example band combinations

RGB true color:

B04 + B03 + B02

False color vegetation:

B08 + B04 + B03

NDVI:

B08 + B04

EVI:

B08 + B04 + B02

Parameters

Parameter Meaning
geojson_data study area polygon
item_id Sentinel scene id
bands list of bands
output_dir folder to save TIFFs

Workflow summary

GeoJSON study area
Search Sentinel-2
Choose item_id
Select bands
Load raster
Clip polygon
Save each TIFF

Result

Output files can be used in:

  • QGIS
  • Rasterio
  • NumPy
  • NDVI calculation
  • RGB rendering
  • remote sensing analysis

Download NDVI or EVI GeoTIFF from Sentinel-2

After searching available Sentinel-2 scenes, the next step is to download raster bands and calculate vegetation indices.

In this example we:

  • use a GeoJSON polygon as study area
  • select one Sentinel-2 scene using item_id
  • load required bands
  • calculate:
  • NDVI
  • or EVI
  • clip result exactly to polygon
  • save output as GeoTIFF

This workflow is useful for:

  • crop monitoring
  • vegetation health
  • remote sensing analysis
  • agricultural GIS workflows

Install required libraries

!pip install pystac-client planetary-computer odc-stac geopandas pandas rasterio rioxarray

Libraries used:

  • pystac-client → search STAC catalog
  • planetary-computer → sign asset URLs
  • odc-stac → load raster bands
  • rasterio → raster handling
  • rioxarray → clipping + export

Download clipped NDVI or EVI

This function:

  • accepts GeoJSON
  • accepts Sentinel-2 item id
  • calculates NDVI or EVI
  • clips raster
  • saves GeoTIFF
import numpy as np
import rasterio
from shapely.geometry import shape
from pystac_client import Client
import planetary_computer as pc
from odc.stac import load
import rioxarray


STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def download_clipped_index_tiff(
    geojson_data: dict,
    item_id: str,
    index: str,
    output_path: str,
):
    """
    Download clipped Sentinel-2 NDVI/EVI GeoTIFF.

    Parameters
    ----------
    geojson_data : dict
    item_id : str
    index : str ("ndvi" or "evi")
    output_path : str
    """

    # ---------------------------------
    # geometry
    # ---------------------------------
    if geojson_data["type"] == "FeatureCollection":
        geom_geojson = geojson_data["features"][0]["geometry"]
    elif geojson_data["type"] == "Feature":
        geom_geojson = geojson_data["geometry"]
    else:
        geom_geojson = geojson_data

    geom = shape(geom_geojson)

    # ---------------------------------
    # stac search by id
    # ---------------------------------
    client = Client.open(
        STAC_URL,
        modifier=pc.sign_inplace,
    )

    search = client.search(
        collections=["sentinel-2-l2a"],
        ids=[item_id],
    )

    items = list(search.items())

    if not items:
        raise ValueError(
            f"No item found for {item_id}"
        )

    signed_items = [
        pc.sign(item)
        for item in items
    ]

    # ---------------------------------
    # required bands
    # ---------------------------------
    bands = ["B04", "B08"]

    if index.lower() == "evi":
        bands.append("B02")

    # ---------------------------------
    # load clipped to polygon
    # ---------------------------------
    ds = load(
        items=signed_items,
        geopolygon=geom,
        groupby="solar_day",
        bands=bands,
    )

    # exact polygon clip
    ds = ds.rio.clip(
        [geom],
        crs="EPSG:4326",
    )

    # ---------------------------------
    # calculate index
    # ---------------------------------
    red = ds["B04"]
    nir = ds["B08"]

    if index.lower() == "ndvi":
        result = (
            (nir - red)
            / (nir + red + 1e-6)
        )

    elif index.lower() == "evi":
        blue = ds["B02"]

        result = (
            2.5
            * (nir - red)
            / (
                nir
                + 6 * red
                - 7.5 * blue
                + 1
            )
        )

    else:
        raise ValueError(
            "Supported: ndvi or evi"
        )

    # first image only
    result = result.isel(time=0)

    # ---------------------------------
    # save
    # ---------------------------------
    result.rio.to_raster(output_path)

    return {
        "status": "success",
        "item_id": item_id,
        "index": index,
        "output": output_path,
    }


# -------------------------------------
# Example
# -------------------------------------

res = download_clipped_index_tiff(
    geojson_data=geojson_fc,
    item_id="S2C_MSIL2A_20251204T053221_R105_T43QDF_20251204T091915",
    index="ndvi",
    output_path="/content/ndvi_clip.tif",
)

print(res)

Example output

{
    "status": "success",
    "item_id": "S2C_MSIL2A_20251204T053221_R105_T43QDF_20251204T091915",
    "index": "ndvi",
    "output": "/content/ndvi_clip.tif"
}

Supported indices

Index Bands used Formula
NDVI NIR + Red (NIR - Red) / (NIR + Red)
EVI NIR + Red + Blue 2.5 × (NIR - Red) / (NIR + 6×Red - 7.5×Blue + 1)

Sentinel-2 bands used

Band Name Resolution
B02 Blue 10 m
B04 Red 10 m
B08 Near Infrared 10 m

Parameters

Parameter Meaning
geojson_data study area polygon
item_id Sentinel-2 scene id
index "ndvi" or "evi"
output_path output GeoTIFF location

Workflow summary

GeoJSON study area
Search Sentinel-2
Choose item_id
Load bands
Calculate NDVI / EVI
Clip to polygon
Save GeoTIFF

Result

The output file is a clipped raster GeoTIFF.

It can be used in:

  • QGIS
  • Rasterio
  • GeoPandas workflows
  • agricultural analysis
  • vegetation monitoring

Download LULC GeoTIFF from Sentinel-2

Along with downloading raster bands and vegetation indices, we can also create a simple Land Use / Land Cover (LULC) raster.

LULC helps classify land into categories such as:

  • water
  • built-up area
  • barren/open soil
  • sparse vegetation
  • dense vegetation

In this example we:

  • use a GeoJSON polygon
  • search Sentinel-2 scenes
  • filter by date and cloud cover
  • create a cloud-free composite
  • calculate indices
  • classify land cover
  • clip exactly to polygon
  • save output as GeoTIFF

This workflow is useful for:

  • land use mapping
  • agriculture monitoring
  • vegetation analysis
  • urban growth
  • GIS raster classification

Install required libraries

!pip install pystac-client planetary-computer odc-stac geopandas pandas rasterio rioxarray scipy

Libraries used:

  • pystac-client → search STAC
  • planetary-computer → sign imagery
  • odc-stac → load raster bands
  • geopandas → polygon handling
  • rasterio → save raster
  • rioxarray → clipping
  • scipy → smoothing filter

Download LULC raster

This function:

  • searches Sentinel-2
  • removes clouds
  • creates median composite
  • calculates:
  • NDVI
  • NDBI
  • MNDWI
  • classifies pixels
  • saves LULC GeoTIFF
import numpy as np
import rasterio
import geopandas as gpd
from shapely.geometry import shape
from rasterio.mask import mask as rio_mask
from pystac_client import Client
import planetary_computer as pc
from odc.stac import load
from scipy.ndimage import median_filter


STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def download_lulc_tiff(
    geojson_data: dict,
    date_range: str,
    cloud_cover: int,
    output_path: str,
    max_items: int = 10,
    resolution: int = 20,
):
    """
    Download clipped Sentinel-2 LULC GeoTIFF.

    Classes
    -------
    0 -> Water
    1 -> Built-up
    2 -> Barren/Open soil
    3 -> Sparse vegetation
    4 -> Dense vegetation
    255 -> NoData
    """

    # --------------------------------
    # geometry
    # --------------------------------
    if geojson_data["type"] == "FeatureCollection":
        geom_geojson = geojson_data["features"][0]["geometry"]

    elif geojson_data["type"] == "Feature":
        geom_geojson = geojson_data["geometry"]

    else:
        geom_geojson = geojson_data

    geom = shape(geom_geojson)

    gdf = gpd.GeoDataFrame(
        geometry=[geom],
        crs="EPSG:4326",
    )

    bbox = tuple(gdf.total_bounds)

    utm_crs = gdf.estimate_utm_crs()

    # --------------------------------
    # search stac
    # --------------------------------
    client = Client.open(STAC_URL)

    search = client.search(
        collections=["sentinel-2-l2a"],
        intersects=geom.__geo_interface__,
        datetime=date_range,
        query={
            "eo:cloud_cover": {
                "lt": cloud_cover
            }
        },
    )

    items = sorted(
        search.items(),
        key=lambda x: x.properties.get(
            "eo:cloud_cover",
            100,
        ),
    )[:max_items]

    if not items:
        raise ValueError(
            "No scenes found"
        )

    signed_items = [
        pc.sign(item)
        for item in items
    ]

    # --------------------------------
    # load bands
    # --------------------------------
    ds = load(
        items=signed_items,
        bands=[
            "B02",
            "B03",
            "B04",
            "B08",
            "B11",
            "SCL",
        ],
        crs=utm_crs,
        resolution=resolution,
        bbox=bbox,
        groupby="solar_day",
    )

    # --------------------------------
    # cloud mask
    # --------------------------------
    scl = ds["SCL"]

    cloud_mask = scl.isin(
        [1, 3, 8, 9, 10, 11]
    )

    blue = ds["B02"].where(
        ~cloud_mask
    ).astype("float32")

    green = ds["B03"].where(
        ~cloud_mask
    ).astype("float32")

    red = ds["B04"].where(
        ~cloud_mask
    ).astype("float32")

    nir = ds["B08"].where(
        ~cloud_mask
    ).astype("float32")

    swir = ds["B11"].where(
        ~cloud_mask
    ).astype("float32")

    # median composite
    blue = blue.median(dim="time")
    green = green.median(dim="time")
    red = red.median(dim="time")
    nir = nir.median(dim="time")
    swir = swir.median(dim="time")

    # --------------------------------
    # indices
    # --------------------------------
    def safe_index(a, b):
        denom = a + b

        return np.where(
            denom == 0,
            np.nan,
            (a - b) / denom,
        )

    b = blue.values
    g = green.values
    r = red.values
    n = nir.values
    s = swir.values

    ndvi = safe_index(n, r)

    ndbi = safe_index(s, n)

    mndwi = safe_index(g, s)

    # --------------------------------
    # classification
    # --------------------------------
    lulc = np.full(
        ndvi.shape,
        255,
        dtype=np.uint8,
    )

    valid = (
        np.isfinite(ndvi)
        & np.isfinite(ndbi)
        & np.isfinite(mndwi)
    )

    water = (
        valid
        & (mndwi > 0.1)
        & (mndwi > ndvi)
    )
    lulc[water] = 0

    dense_veg = (
        valid
        & ~water
        & (ndvi > 0.5)
    )
    lulc[dense_veg] = 4

    sparse_veg = (
        valid
        & ~water
        & ~dense_veg
        & (ndvi >= 0.2)
        & (ndvi <= 0.5)
    )
    lulc[sparse_veg] = 3

    builtup = (
        valid
        & ~water
        & ~dense_veg
        & ~sparse_veg
        & (ndbi > 0)
        & (ndvi < 0.2)
    )

    lulc[builtup] = 1

    barren = (
        valid
        & (lulc == 255)
    )

    lulc[barren] = 2

    # --------------------------------
    # smoothing
    # --------------------------------
    filtered = median_filter(
        lulc,
        size=3,
    )

    filtered[
        lulc == 255
    ] = 255

    lulc = filtered

    # --------------------------------
    # save temp
    # --------------------------------
    geobox = ds.odc.geobox

    temp_path = (
        output_path
        + ".tmp.tif"
    )

    with rasterio.open(
        temp_path,
        "w",
        driver="GTiff",
        height=lulc.shape[0],
        width=lulc.shape[1],
        count=1,
        dtype=rasterio.uint8,
        crs=geobox.crs,
        transform=geobox.transform,
        nodata=255,
    ) as dst:
        dst.write(
            lulc,
            1,
        )

    # --------------------------------
    # exact clip
    # --------------------------------
    gdf_utm = gdf.to_crs(
        utm_crs
    )

    geom_utm = (
        gdf_utm.geometry.values[0]
    )

    with rasterio.open(
        temp_path
    ) as src:

        clipped, transform = rio_mask(
            src,
            [geom_utm.__geo_interface__],
            crop=True,
            nodata=255,
        )

        meta = src.meta.copy()

    meta.update(
        {
            "height": clipped.shape[1],
            "width": clipped.shape[2],
            "transform": transform,
            "nodata": 255,
        }
    )

    with rasterio.open(
        output_path,
        "w",
        **meta,
    ) as dst:
        dst.write(
            clipped[0],
            1,
        )

    return {
        "status": "success",
        "date_range": date_range,
        "cloud_cover": cloud_cover,
        "output": output_path,
        "classes": {
            0: "Water",
            1: "Built-up",
            2: "Barren",
            3: "Sparse vegetation",
            4: "Dense vegetation",
            255: "NoData",
        },
    }

Example

geojson_fc = {
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[
          [73.658151, 20.034861],
          [73.658151, 19.977499],
          [73.771634, 19.977499],
          [73.771634, 20.034861],
          [73.658151, 20.034861]
        ]]
      }
    }
  ]
}

res = download_lulc_tiff(
    geojson_data=geojson_fc,
    date_range="2025-01-01/2025-03-31",
    cloud_cover=20,
    output_path="/content/lulc_2025.tif",
)

print(res)

LULC classes

Value Class
0 Water
1 Built-up
2 Barren/Open soil
3 Sparse vegetation
4 Dense vegetation
255 NoData

Indices used

Index Purpose
NDVI vegetation
NDBI built-up
MNDWI water

Workflow summary

GeoJSON polygon
Search Sentinel-2
Filter cloud cover
Load bands
Cloud mask
Median composite
Calculate indices
Classify pixels
Clip polygon
Save GeoTIFF

Result

Output can be opened in:

  • QGIS
  • Rasterio
  • GIS software

Useful for:

  • land cover mapping
  • agriculture
  • vegetation monitoring
  • urban analysis

What is Rasterio?

Rasterio is a Python library used to work with raster geospatial data such as satellite images, elevation maps, land cover maps, and other grid-based datasets.

Raster data stores information in the form of rows and columns of pixels, where each pixel contains a value. These values may represent things like color, height, temperature, rainfall, or vegetation.

Rasterio provides a simple Python interface to open, read, analyze, and write raster files. It is built on top of GDAL and is designed specifically for geospatial raster processing.

With Rasterio we can:

  • read raster files such as GeoTIFF
  • check raster metadata
  • access pixel values
  • crop rasters
  • mask rasters using vector boundaries
  • save processed raster files

Rasterio works well with:

  • NumPy for raster calculations
  • GeoPandas for vector data
  • Shapely for geometry operations

In simple terms:

  • Shapely works with geometry
  • GeoPandas works with vector layers
  • Rasterio works with raster grids

Rasterio is widely used in GIS and remote sensing for tasks such as:

  • satellite image analysis
  • digital elevation model processing
  • raster clipping
  • land use classification
  • environmental monitoring

In short:

Rasterio is a Python GIS library used to read, analyze, and write raster geospatial data.

1. Open and read a raster file

import rasterio

path = "assets/tiff/Tiff_1.tif"  # bundled practice GeoTIFF
with rasterio.open(path) as src:
    band1 = src.read(1)          # 2D NumPy array, first band (height × width)
    profile = src.profile        # driver, dtype, nodata, transform, crs, size
    crs_epsg = src.crs.to_epsg() if src.crs else None
    crs_label = f"EPSG:{crs_epsg}" if crs_epsg else str(src.crs)
    print(src.shape, crs_label, src.dtypes[0])

Example output for Tiff_1.tif:

(692, 1663) EPSG:4326 float32

read(1) loads band 1 into memory; for large files you can use windows or out_shape later to read only part of a band.

2. What are “raster statistics”?

Statistics here are numeric summaries of the pixel values in a band (or in a region you care about): minimum, maximum, mean (average), standard deviation, percentiles, counts of valid pixels, and so on. They answer questions like “how high is this DEM on average?” or “what range do reflectance values span?” You usually ignore NoData when computing them so missing cells do not skew the result.

3. Calculate statistics (one band, respect NoData)

import numpy as np
import rasterio

with rasterio.open("assets/tiff/Tiff_1.tif") as src:
    arr = src.read(1).astype("float64")
    nodata = src.nodata

    if nodata is not None:
        arr = np.where(arr == nodata, np.nan, arr)

# Basic stats
print("Min:", np.nanmin(arr))
print("Max:", np.nanmax(arr))
print("Mean:", np.nanmean(arr))
print("Median:", np.nanmedian(arr))
print("Std Dev:", np.nanstd(arr))
print("Variance:", np.nanvar(arr))

# Percentiles
print("25th Percentile:", np.nanpercentile(arr, 25))
print("50th Percentile (Median):", np.nanpercentile(arr, 50))
print("75th Percentile:", np.nanpercentile(arr, 75))

# Count info
print("Total Pixels:", arr.size)
print("Valid Pixels:", np.count_nonzero(~np.isnan(arr)))
print("NoData Pixels:", np.count_nonzero(np.isnan(arr)))

Example output when run on the bundled Tiff_1.tif:

Min: -0.11472345888614655
Max: 0.6778202652931213
Mean: 0.24052893210664816
Median: 0.22149499505758286
Std Dev: 0.13968990308740697
Variance: 0.019513269024569155
25th Percentile: 0.15612491592764854
50th Percentile (Median): 0.22149499505758286
75th Percentile: 0.32683808356523514
Total Pixels: 1150796
Valid Pixels: 1150796
NoData Pixels: 0

4. Filter pixels by value (keep cells in a range, mask the rest)

Build a boolean mask, then either write a new GeoTIFF or set unwanted pixels to NoData before saving.

import numpy as np
import rasterio
from rasterio.enums import Resampling

with rasterio.open("/content/Tiff_1.tif") as src:
    arr = src.read(1).astype("float32")
    profile = src.profile.copy()
    mask = (arr >= 0.1) & (arr <= 0.5)   # keep elevations 100–500 (example)
    out = np.where(mask, arr, src.nodata).astype(profile["dtype"])

profile.update(dtype=out.dtype, count=1, compress="deflate")
with rasterio.open("/content/Tiff_1_filter.tif", "w", **profile) as dst:
    dst.write(out, 1)

Example visualization (bundled Tiff_1.tif: band 1 before masking, then after keeping only the middle value range and setting the rest to NoData—the same idea as the code above):

Tiff_OG — original raster before the value filter

Tiff_filter — raster after filtering; pixels outside the kept range are masked

Adjust thresholds to your units; always set nodata in profile if you use a sentinel.

5. Clip a raster using a polygon

rasterio.mask.mask expects an iterable of GeoJSON-like geometries: plain Python dicts with "type" and "coordinates" (same structure as .geojson on disk). The polygon’s coordinate order must match the raster’s CRS (for example lon/lat if the DEM is EPSG:4326; if the DEM is in metres, use projected corners instead, or reproject the polygon first).

Polygon inline as GeoJSON (dict)

import rasterio
from rasterio.mask import mask

# GeoJSON Polygon: outer ring must close (first point = last point)
clip_poly = {
    "type": "Polygon",
    "coordinates": [
        [
            [
              73.71042841436719,
              20.09023103896574
            ],
            [
              73.69665791730623,
              20.074091366211377
            ],
            [
              73.71214563805651,
              20.063933096227032
            ],
            [
              73.7259168550645,
              20.066013154915694
            ],
            [
              73.72492563666324,
              20.077312229882054
            ],
            [
              73.72123577059406,
              20.087440203990184
            ],
            [
              73.71042841436719,
              20.09023103896574
            ]
        ]
    ],
}
geom = [clip_poly]

with rasterio.open("/content/Tiff_1.tif") as src:
    clipped, transform = mask(src, geom, crop=True, nodata=src.nodata)
    meta = src.meta.copy()
    meta.update({"height": clipped.shape[1], "width": clipped.shape[2], "transform": transform})

with rasterio.open("/content/Tiff_1_clipped.tif", "w", **meta) as dst:
    dst.write(clipped)

Example visualization (clipped, cropped raster after mask(..., crop=True) on Tiff_1.tif with the polygon above):

Tiff_clipped — output extent and pixels inside the clip polygon

6. Merge two rasters into one mosaic

rasterio.merge.merge aligns inputs on a common grid and pastes them into a larger array (same CRS and compatible bands work best; otherwise reproject first).

import rasterio
from rasterio.merge import merge

src_files = ["/content/Tiff_1.tif", "/content/Tiff_2.tif"]
srcs = [rasterio.open(p) for p in src_files]
mosaic, out_transform = merge(srcs)

out_meta = srcs[0].meta.copy()
out_meta.update({"height": mosaic.shape[1], "width": mosaic.shape[2], "transform": out_transform})
for s in srcs:
    s.close()

with rasterio.open("/content/Tiff_merge.tif", "w", **out_meta) as dst:
    dst.write(mosaic)

Example visualization (mosaic from Tiff_1.tif and Tiff_2.tif merged with rasterio.merge.merge):

Tiff_merge — combined extent after merging the two GeoTIFFs

7. Resampling (resize resolution)

Resampling changes how pixels are aggregated or interpolated when you change the grid size (fewer or more rows/columns) while keeping (or defining) the same geographic extent. Downsampling uses fewer pixels (coarser resolution); upsampling uses more pixels (finer resolution). Pick a method that matches your data: average or mode often suit downsampling continuous or categorical rasters; bilinear is common for smooth continuous surfaces; nearest preserves class codes when upsampling labels.

Read at a new resolution, then write a GeoTIFF with an updated affine transform so the file stays correctly georeferenced:

import rasterio
from rasterio.enums import Resampling
from rasterio.transform import Affine

src_path = "/content/Tiff_1.tif"
dst_path = "/content/Tiff_1_resampled.tif"
factor = 2  # >1 = downsample, <1 = upsample

with rasterio.open(src_path) as src:
    # New dimensions
    new_height = int(src.height / factor)
    new_width = int(src.width / factor)

    # Read & resample
    data = src.read(
        out_shape=(src.count, new_height, new_width),
        resampling=Resampling.average
    )

    # Update transform
    transform = src.transform * Affine.scale(
        src.width / new_width,
        src.height / new_height
    )

    # Update metadata
    profile = src.profile
    profile.update({
        "height": new_height,
        "width": new_width,
        "transform": transform
    })

# Save output
with rasterio.open(dst_path, "w", **profile) as dst:
    dst.write(data)

print(f"Resampled: {src.height}x{src.width}{new_height}x{new_width}")

Example visualization (pixel grid before and after resampling Tiff_1.tif with a downsampling factor, same idea as the code above):

pixel_before_resampling — full-resolution raster before resampling

pixel_after_resampling — raster after resampling to a coarser grid

For upsampling (more pixels), set factor between 0 and 1 (for example factor = 0.5 doubles rows and columns) and consider Resampling.bilinear instead of average. For warping to another CRS or bounds, use rasterio.warp.reproject with a destination array and transform from calculate_default_transform (covered in more detail later in this module).

Basics assignment: GeoTIFFs & rasterio

These tasks recap the short recipes above: open a raster, read metadata and arrays, summarize values, plot one band, filter by value, merge two tiles, resample, and sanity-check in a viewer. Use the bundled assets/tiff/Tiff_1.tif and assets/tiff/Tiff_2.tif (see the Sample GeoTIFFs download buttons in Raster file formats). On Colab, copy the files to /content/ and change paths accordingly.

Paths

From a notebook whose working directory is the repo root, assets/tiff/Tiff_1.tif resolves like the rest of this chapter. If you run from docs/, use assets/tiff/... the same way as in 05_visualization.md.

  1. Open and describe — With rasterio.open, print src.shape, src.crs, src.dtypes, and src.bounds for Tiff_1.tif. In one sentence, say whether shape is (bands, height, width) or (height, width) in your rasterio version and what that implies for read(1).

  2. Band statistics — Read band 1 into a NumPy array (float). If src.nodata is set, mask it to np.nan before stats. Print min, max, mean (using np.nanmin / np.nanmax / np.nanmean as needed).

  3. Quick map — Plot band 1 with matplotlib.pyplot.imshow, passing extent=[left, right, bottom, top] from src.bounds and origin="upper". Label x / y with the axis names implied by your CRS (e.g. lon/lat for EPSG:4326).

  4. Value filter — Build a boolean mask that keeps pixels between the 10th and 90th percentile of the band (see § Filter pixels). Write a new GeoTIFF Tiff_1_filtered.tif (or any output name your instructor specifies) with the same CRS/transform logic as the recipe.

  5. Merge — Use rasterio.merge.merge on Tiff_1.tif and Tiff_2.tif. Save Tiff_merge_lab.tif. Print the mosaic shape and confirm out_transform differs from either source’s transform.

  6. Resample — Downsample Tiff_1.tif by factor 2 using read(..., out_shape=..., resampling=Resampling.average) (or the Affine-scaling pattern in § Resampling). Save Tiff_1_half.tif and print before vs after width and height.

  7. Optional clip — Use rasterio.mask.mask with the inline GeoJSON box from § Clip a raster (or your own polygon in WGS 84 that intersects the raster). Save Tiff_1_clipped_lab.tif.

  8. Visual check — Upload Tiff_1.tif (or your merged output) to the Pozyx Online GeoTIFF Viewer or open in QGIS. Note one thing you see that imshow alone did not emphasize (e.g. basemap context, legend, scale).

Submit your output GeoTIFFs (or paths in a shared drive), your .py or .ipynb, and short answers for any “explain” prompts your instructor assigns.


Advance Analytics

The sections below use a longer teaching workflow (environment setup through practice problems). If you have not worked through the Basics assignment yet, consider doing that first so rasterio, numpy, and matplotlib patterns are fresh.

Setting Up the Environment

import rasterio
import rasterio.plot
import rasterio.mask
import numpy as np
import matplotlib.pyplot as plt
import geopandas as gpd
from rasterio.warp import calculate_default_transform, reproject, Resampling
from rasterio.windows import from_bounds
import warnings
warnings.filterwarnings('ignore')

# Set up plotting
plt.style.use('default')

# Data directory path - change this if your TIFF files are in a different folder
DATA_DIR = 'data/'

# Load available raster data
dem_path = DATA_DIR + 'dem.tif'
b04_path = DATA_DIR + 'B04.tif'  # Red band
b08_path = DATA_DIR + 'B08.tif'  # NIR band
true_color_path = DATA_DIR + 'true_color.tiff'

print("=== LOADING RASTER DATA ===")
print(f"DEM: {dem_path}")
print(f"Red Band (B04): {b04_path}")
print(f"NIR Band (B08): {b08_path}")
print(f"True Color: {true_color_path}")

1. Understanding Raster Structure

Raster Components Deep Dive

graph LR
    A[Raster Dataset] --> B[Metadata]
    A --> C[Data Array]
    B --> D[CRS]
    B --> E[Transform]
    B --> F[Dimensions]
    B --> G[NoData Value]
    C --> H[Pixel Values]
    C --> I[Data Type]

Metadata contains crucial information about the raster:

  • CRS (Coordinate Reference System): Defines the spatial reference
  • Transform: Mathematical relationship between pixel coordinates and geographic coordinates
  • Dimensions: Number of rows, columns, and bands
  • NoData Value: Represents missing or invalid data

Data Array holds the actual pixel values:

  • Data Type: Integer (int16, int32) or floating-point (float32, float64)
  • Bit Depth: Determines value range (8-bit: 0-255, 16-bit: 0-65535)
  • Signed vs Unsigned: Whether negative values are allowed

Understanding the Affine Transform

The affine transform is a 6-parameter mathematical transformation that converts pixel coordinates to geographic coordinates:

| x |   | a  b  c | | col |
| y | = | d  e  f | |  row |
| 1 |   | 0  0  1 | |  1  |

Where:

  • a: Pixel width (x-direction)
  • b: Row rotation (usually 0)
  • c: X-coordinate of upper-left corner
  • d: Column rotation (usually 0)
  • e: Pixel height (y-direction, usually negative)
  • f: Y-coordinate of upper-left corner

Pixel Indexing and Coordinates

Pixel Coordinates: Array indices (row, column) starting from (0,0) at top-left

Geographic Coordinates: Real-world coordinates (X, Y) in the raster's CRS

Center vs Corner: Pixels can be referenced by their center point or corner

Data Types and Memory Considerations

Data Type Range Memory per Pixel Best Use
uint8 0-255 1 byte Classified data, RGB images
int16 -32,768 to 32,767 2 bytes Elevation, temperature
uint16 0-65,535 2 bytes Satellite imagery
float32 ±3.4 × 10³⁸ 4 bytes Calculated values, ratios
float64 ±1.8 × 10³⁰⁸ 8 bytes High-precision calculations

Loading Real Raster Data

# Load the DEM (Digital Elevation Model)
with rasterio.open(dem_path) as dem_src:
    elevation_data = dem_src.read(1)  # Read first band
    dem_transform = dem_src.transform
    dem_crs = dem_src.crs
    dem_bounds = dem_src.bounds
    dem_nodata = dem_src.nodata

print("=== DEM DATA PROPERTIES ===")
print(f"Shape: {elevation_data.shape}")
print(f"Data type: {elevation_data.dtype}")
print(f"CRS: {dem_crs}")
print(f"Bounds: {dem_bounds}")
print(f"NoData value: {dem_nodata}")

# Handle NoData values
if dem_nodata is not None:
    elevation_masked = np.ma.masked_equal(elevation_data, dem_nodata)
    print(f"Valid pixels: {elevation_masked.count():,}")
    print(f"NoData pixels: {elevation_masked.mask.sum():,}")
    # Use masked array for statistics
    print(f"Min elevation: {elevation_masked.min():.1f}m")
    print(f"Max elevation: {elevation_masked.max():.1f}m")
    print(f"Mean elevation: {elevation_masked.mean():.1f}m")
else:
    print(f"Min elevation: {elevation_data.min():.1f}m")
    print(f"Max elevation: {elevation_data.max():.1f}m")
    print(f"Mean elevation: {elevation_data.mean():.1f}m")

Raster Metadata and Transform

# Use the actual transform from the DEM
transform = dem_transform
bounds = dem_bounds
height, width = elevation_data.shape

print("=== RASTER SPATIAL PROPERTIES ===")
print(f"Bounds: {bounds}")
print(f"Transform: {transform}")
print(f"Pixel size X: {transform[0]:.4f}")
print(f"Pixel size Y: {abs(transform[4]):.4f}")
print(f"Width: {width} pixels")
print(f"Height: {height} pixels")

# Calculate pixel coordinates
def pixel_to_coord(row, col, transform):
    """Convert pixel coordinates to geographic coordinates"""
    x = transform[0] * col + transform[1] * row + transform[2]
    y = transform[3] * col + transform[4] * row + transform[5]
    return x, y

# Example: center pixel coordinates
center_row, center_col = height // 2, width // 2
center_x, center_y = pixel_to_coord(center_row, center_col, transform)
print(f"Center pixel ({center_row}, {center_col}) = ({center_x:.2f}, {center_y:.2f})")

# Calculate extent for plotting
extent = [bounds.left, bounds.right, bounds.bottom, bounds.top]
print(f"Plot extent: {extent}")

2. Visualizing Raster Data

Basic Raster Visualization

# Create comprehensive visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Basic elevation map
im1 = axes[0,0].imshow(elevation_data, cmap='terrain', extent=bounds)
axes[0,0].set_title('Elevation Map')
axes[0,0].set_xlabel('Longitude')
axes[0,0].set_ylabel('Latitude')
plt.colorbar(im1, ax=axes[0,0], label='Elevation (m)')

# 2. Hillshade effect
from matplotlib.colors import LightSource
ls = LightSource(azdeg=315, altdeg=45)
hillshade = ls.hillshade(elevation_data, vert_exag=2)
axes[0,1].imshow(hillshade, cmap='gray', extent=bounds)
axes[0,1].set_title('Hillshade')
axes[0,1].set_xlabel('Longitude')
axes[0,1].set_ylabel('Latitude')

# 3. Elevation histogram
axes[1,0].hist(elevation_data.flatten(), bins=50, alpha=0.7, color='brown')
axes[1,0].set_title('Elevation Distribution')
axes[1,0].set_xlabel('Elevation (m)')
axes[1,0].set_ylabel('Frequency')
axes[1,0].grid(True, alpha=0.3)

# 4. Contour lines
contour = axes[1,1].contour(elevation_data, levels=10, extent=bounds, colors='black', alpha=0.6)
axes[1,1].clabel(contour, inline=True, fontsize=8)
im4 = axes[1,1].imshow(elevation_data, cmap='terrain', extent=bounds, alpha=0.7)
axes[1,1].set_title('Elevation with Contours')
axes[1,1].set_xlabel('Longitude')
axes[1,1].set_ylabel('Latitude')

plt.tight_layout()
plt.show()

Advanced Visualization Techniques

# Create elevation classes
def classify_elevation(elevation, breaks):
    """Classify elevation into categories"""
    classified = np.zeros_like(elevation, dtype=np.int32)
    for i, break_val in enumerate(breaks[1:], 1):
        classified[elevation <= break_val] = i
    return classified

# Define elevation breaks
elevation_breaks = [0, 200, 500, 800, 1200, 2000]
elevation_classes = classify_elevation(elevation_data, elevation_breaks)

# Create custom colormap
colors = ['#2E8B57', '#9ACD32', '#DAA520', '#CD853F', '#A0522D', '#FFFFFF']
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(colors[:len(elevation_breaks)-1])

# Plot classified elevation
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Original continuous elevation
im1 = ax1.imshow(elevation_data, cmap='terrain', extent=bounds)
ax1.set_title('Continuous Elevation')
plt.colorbar(im1, ax=ax1, label='Elevation (m)')

# Classified elevation
im2 = ax2.imshow(elevation_classes, cmap=custom_cmap, extent=bounds)
ax2.set_title('Classified Elevation')
cbar = plt.colorbar(im2, ax=ax2, ticks=range(len(elevation_breaks)-1))
cbar.set_ticklabels([f'{elevation_breaks[i]}-{elevation_breaks[i+1]}m' 
                     for i in range(len(elevation_breaks)-1)])

plt.tight_layout()
plt.show()

3. Raster Calculations and Statistics

Basic Statistics

# Calculate comprehensive statistics
def raster_statistics(data, name="Raster"):
    """Calculate comprehensive raster statistics"""
    # Remove any potential NoData values (assuming -9999)
    valid_data = data[data != -9999]

    stats = {
        'count': valid_data.size,
        'min': valid_data.min(),
        'max': valid_data.max(),
        'mean': valid_data.mean(),
        'median': np.median(valid_data),
        'std': valid_data.std(),
        'range': valid_data.max() - valid_data.min()
    }

    print(f"=== {name.upper()} STATISTICS ===")
    for key, value in stats.items():
        if key == 'count':
            print(f"{key.capitalize()}: {value:,}")
        else:
            print(f"{key.capitalize()}: {value:.2f}")

    return stats

# Calculate statistics for our elevation data
elev_stats = raster_statistics(elevation_data, "Elevation")

# Percentiles
percentiles = [10, 25, 50, 75, 90, 95, 99]
elev_percentiles = np.percentile(elevation_data, percentiles)

print(f"\n=== ELEVATION PERCENTILES ===")
for p, val in zip(percentiles, elev_percentiles):
    print(f"{p}th percentile: {val:.1f}m")

Raster Math Operations

# Demonstrate various raster calculations
print("=== RASTER CALCULATIONS ===")

# 1. Convert elevation to feet
elevation_feet = elevation_data * 3.28084
print(f"Elevation in feet - Min: {elevation_feet.min():.1f}, Max: {elevation_feet.max():.1f}")

# 2. Calculate slope (simplified)
def calculate_slope(elevation, pixel_size):
    """Calculate slope using gradient"""
    grad_y, grad_x = np.gradient(elevation, pixel_size)
    slope_radians = np.arctan(np.sqrt(grad_x**2 + grad_y**2))
    slope_degrees = np.degrees(slope_radians)
    return slope_degrees

pixel_size = abs(transform[0])  # Assuming square pixels
slope_data = calculate_slope(elevation_data, pixel_size)

print(f"Slope - Min: {slope_data.min():.1f}°, Max: {slope_data.max():.1f}°, Mean: {slope_data.mean():.1f}°")

# 3. Create binary mask (high elevation areas)
high_elevation_mask = elevation_data > np.percentile(elevation_data, 75)
high_elevation_area = np.sum(high_elevation_mask) * (pixel_size ** 2)

print(f"High elevation areas (>75th percentile): {high_elevation_area:.2f} square units")

# 4. Normalize elevation (0-1 scale)
elevation_normalized = (elevation_data - elevation_data.min()) / (elevation_data.max() - elevation_data.min())
print(f"Normalized elevation - Min: {elevation_normalized.min():.3f}, Max: {elevation_normalized.max():.3f}")

Visualizing Calculated Rasters

# Visualize calculated rasters
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Original elevation
im1 = axes[0,0].imshow(elevation_data, cmap='terrain', extent=bounds)
axes[0,0].set_title('Original Elevation')
plt.colorbar(im1, ax=axes[0,0], label='Elevation (m)')

# Slope
im2 = axes[0,1].imshow(slope_data, cmap='Reds', extent=bounds)
axes[0,1].set_title('Slope')
plt.colorbar(im2, ax=axes[0,1], label='Slope (degrees)')

# High elevation mask
im3 = axes[1,0].imshow(high_elevation_mask, cmap='RdYlBu_r', extent=bounds)
axes[1,0].set_title('High Elevation Areas (>75th percentile)')
plt.colorbar(im3, ax=axes[1,0], label='High Elevation')

# Normalized elevation
im4 = axes[1,1].imshow(elevation_normalized, cmap='viridis', extent=bounds)
axes[1,1].set_title('Normalized Elevation (0-1)')
plt.colorbar(im4, ax=axes[1,1], label='Normalized Value')

plt.tight_layout()
plt.show()

4. Working with Vector Boundaries

Creating Sample Vector Data

# Create sample vector boundaries for clipping
from shapely.geometry import Polygon, Point
import geopandas as gpd

# Create sample study areas
study_areas = [
    {'name': 'Area A', 'geometry': Polygon([(-5, -5), (0, -5), (0, 0), (-5, 0), (-5, -5)])},
    {'name': 'Area B', 'geometry': Polygon([(2, 2), (8, 2), (8, 8), (2, 8), (2, 2)])},
    {'name': 'Circular Area', 'geometry': Point(3, -3).buffer(3)}
]

study_gdf = gpd.GeoDataFrame(study_areas, crs='EPSG:4326')

print("=== STUDY AREAS ===")
print(study_gdf)

# Visualize study areas with elevation
fig, ax = plt.subplots(1, 1, figsize=(12, 10))

# Plot elevation as background
im = ax.imshow(elevation_data, cmap='terrain', extent=bounds, alpha=0.7)
plt.colorbar(im, ax=ax, label='Elevation (m)')

# Plot study areas
study_gdf.plot(ax=ax, facecolor='none', edgecolor='red', linewidth=3, alpha=0.8)

# Add labels
for idx, row in study_gdf.iterrows():
    centroid = row.geometry.centroid
    ax.annotate(row['name'], (centroid.x, centroid.y), 
                ha='center', va='center', fontsize=12, fontweight='bold',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8))

ax.set_title('Study Areas on Elevation Map')
ax.set_xlabel('Longitude')
ax.set_ylabel('Latitude')
plt.tight_layout()
plt.show()

Clipping Rasters with Vector Boundaries

def clip_raster_with_polygon(raster_data, transform, polygon, bounds):
    """Clip raster data using a polygon boundary"""
    from rasterio.features import geometry_mask

    # Create mask from polygon
    mask = geometry_mask([polygon], transform=transform, 
                        invert=True, out_shape=raster_data.shape)

    # Apply mask
    clipped_data = raster_data.copy()
    clipped_data[~mask] = np.nan  # Set areas outside polygon to NaN

    return clipped_data, mask

# Clip elevation data for each study area
clipped_results = {}

for idx, area in study_gdf.iterrows():
    clipped_elev, mask = clip_raster_with_polygon(
        elevation_data, transform, area.geometry, bounds
    )
    clipped_results[area['name']] = {
        'data': clipped_elev,
        'mask': mask,
        'geometry': area.geometry
    }

# Visualize clipped results
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
axes = axes.flatten()

# Original elevation
im0 = axes[0].imshow(elevation_data, cmap='terrain', extent=bounds)
study_gdf.plot(ax=axes[0], facecolor='none', edgecolor='red', linewidth=2)
axes[0].set_title('Original Elevation with Study Areas')
plt.colorbar(im0, ax=axes[0], label='Elevation (m)')

# Clipped areas
for i, (name, result) in enumerate(clipped_results.items(), 1):
    if i < 4:  # Only plot first 3 clipped areas
        im = axes[i].imshow(result['data'], cmap='terrain', extent=bounds)
        axes[i].set_title(f'Clipped Elevation - {name}')
        plt.colorbar(im, ax=axes[i], label='Elevation (m)')

plt.tight_layout()
plt.show()

Statistics for Clipped Areas

# Calculate statistics for each clipped area
print("=== CLIPPED AREA STATISTICS ===")

for name, result in clipped_results.items():
    clipped_data = result['data']
    valid_data = clipped_data[~np.isnan(clipped_data)]

    if len(valid_data) > 0:
        print(f"\n{name}:")
        print(f"  Valid pixels: {len(valid_data):,}")
        print(f"  Min elevation: {valid_data.min():.1f}m")
        print(f"  Max elevation: {valid_data.max():.1f}m")
        print(f"  Mean elevation: {valid_data.mean():.1f}m")
        print(f"  Std deviation: {valid_data.std():.1f}m")

        # Calculate area (assuming each pixel represents area)
        pixel_area = (pixel_size ** 2)
        total_area = len(valid_data) * pixel_area
        print(f"  Total area: {total_area:.2f} square units")
    else:
        print(f"\n{name}: No valid data")

5. Raster-Vector Integration

Extracting Values at Points

# Create sample point locations
sample_points = [
    {'name': 'Peak A', 'geometry': Point(-2, 3)},
    {'name': 'Valley B', 'geometry': Point(4, -2)},
    {'name': 'Ridge C', 'geometry': Point(-6, -1)},
    {'name': 'Plain D', 'geometry': Point(7, 5)}
]

points_gdf = gpd.GeoDataFrame(sample_points, crs='EPSG:4326')

def extract_raster_values_at_points(raster_data, transform, points_gdf):
    """Extract raster values at point locations"""
    from rasterio.transform import rowcol

    extracted_values = []

    for idx, point in points_gdf.iterrows():
        # Convert geographic coordinates to pixel coordinates
        row, col = rowcol(transform, point.geometry.x, point.geometry.y)

        # Check if point is within raster bounds
        if 0 <= row < raster_data.shape[0] and 0 <= col < raster_data.shape[1]:
            value = raster_data[row, col]
            extracted_values.append(value)
        else:
            extracted_values.append(np.nan)

    return extracted_values

# Extract elevation values at points
elevation_at_points = extract_raster_values_at_points(elevation_data, transform, points_gdf)
slope_at_points = extract_raster_values_at_points(slope_data, transform, points_gdf)

# Add values to GeoDataFrame
points_gdf['elevation'] = elevation_at_points
points_gdf['slope'] = slope_at_points

print("=== ELEVATION AND SLOPE AT SAMPLE POINTS ===")
print(points_gdf[['name', 'elevation', 'slope']])

# Visualize points on raster
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# Elevation with points
im1 = ax1.imshow(elevation_data, cmap='terrain', extent=bounds)
points_gdf.plot(ax=ax1, color='red', markersize=100, marker='*', edgecolor='black', linewidth=1)
for idx, point in points_gdf.iterrows():
    ax1.annotate(f"{point['name']}\n{point['elevation']:.0f}m", 
                (point.geometry.x, point.geometry.y),
                xytext=(10, 10), textcoords='offset points',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8),
                fontsize=8)
ax1.set_title('Elevation Values at Sample Points')
plt.colorbar(im1, ax=ax1, label='Elevation (m)')

# Slope with points
im2 = ax2.imshow(slope_data, cmap='Reds', extent=bounds)
points_gdf.plot(ax=ax2, color='blue', markersize=100, marker='*', edgecolor='black', linewidth=1)
for idx, point in points_gdf.iterrows():
    ax2.annotate(f"{point['name']}\n{point['slope']:.1f}°", 
                (point.geometry.x, point.geometry.y),
                xytext=(10, 10), textcoords='offset points',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8),
                fontsize=8)
ax2.set_title('Slope Values at Sample Points')
plt.colorbar(im2, ax=ax2, label='Slope (degrees)')

plt.tight_layout()
plt.show()

Zonal Statistics

def calculate_zonal_statistics(raster_data, zones_gdf, transform):
    """Calculate statistics for each zone (polygon)"""
    from rasterio.features import geometry_mask

    results = []

    for idx, zone in zones_gdf.iterrows():
        # Create mask for this zone
        mask = geometry_mask([zone.geometry], transform=transform,
                           invert=True, out_shape=raster_data.shape)

        # Extract values within zone
        zone_values = raster_data[mask]
        zone_values = zone_values[~np.isnan(zone_values)]  # Remove NaN values

        if len(zone_values) > 0:
            stats = {
                'zone_name': zone.get('name', f'Zone_{idx}'),
                'count': len(zone_values),
                'min': zone_values.min(),
                'max': zone_values.max(),
                'mean': zone_values.mean(),
                'median': np.median(zone_values),
                'std': zone_values.std(),
                'sum': zone_values.sum()
            }
        else:
            stats = {
                'zone_name': zone.get('name', f'Zone_{idx}'),
                'count': 0,
                'min': np.nan,
                'max': np.nan,
                'mean': np.nan,
                'median': np.nan,
                'std': np.nan,
                'sum': np.nan
            }

        results.append(stats)

    return pd.DataFrame(results)

# Calculate zonal statistics for elevation
zonal_stats = calculate_zonal_statistics(elevation_data, study_gdf, transform)

print("=== ZONAL STATISTICS FOR ELEVATION ===")
print(zonal_stats.round(2))

# Calculate zonal statistics for slope
slope_zonal_stats = calculate_zonal_statistics(slope_data, study_gdf, transform)

print("\n=== ZONAL STATISTICS FOR SLOPE ===")
print(slope_zonal_stats.round(2))

7. Multi-band Raster Analysis

NDVI Calculation from Two Bands

NDVI (Normalized Difference Vegetation Index) is calculated using Near-Infrared (NIR) and Red bands to assess vegetation health.

# Load the satellite bands
with rasterio.open(b08_path) as nir_src:
    nir_data = nir_src.read(1).astype(float)  # NIR band (B08)
    nir_transform = nir_src.transform
    nir_bounds = nir_src.bounds
    nir_nodata = nir_src.nodata

with rasterio.open(b04_path) as red_src:
    red_data = red_src.read(1).astype(float)  # Red band (B04)
    red_transform = red_src.transform
    red_bounds = red_src.bounds
    red_nodata = red_src.nodata

print("=== SATELLITE BAND DATA ===")
print(f"NIR Band shape: {nir_data.shape}")
print(f"Red Band shape: {red_data.shape}")
print(f"NIR range: {nir_data.min():.0f} - {nir_data.max():.0f}")
print(f"Red range: {red_data.min():.0f} - {red_data.max():.0f}")
print(f"NIR NoData: {nir_nodata}")
print(f"Red NoData: {red_nodata}")

# Handle NoData values and convert to reflectance (0-1 scale)
if nir_nodata is not None:
    nir_data = np.where(nir_data == nir_nodata, np.nan, nir_data)
if red_nodata is not None:
    red_data = np.where(red_data == red_nodata, np.nan, red_data)

# Convert to reflectance if needed (assuming values are in 0-10000 range)
if nir_data.max() > 1:
    nir_data = nir_data / 10000.0
if red_data.max() > 1:
    red_data = red_data / 10000.0

print(f"NIR reflectance range: {np.nanmin(nir_data):.3f} - {np.nanmax(nir_data):.3f}")
print(f"Red reflectance range: {np.nanmin(red_data):.3f} - {np.nanmax(red_data):.3f}")

# Calculate NDVI
def calculate_ndvi(nir, red):
    """Calculate NDVI from NIR and Red bands"""
    # Avoid division by zero
    denominator = nir + red
    ndvi = np.where(denominator != 0, (nir - red) / denominator, np.nan)
    return ndvi

ndvi = calculate_ndvi(nir_data, red_data)

print(f"\nNDVI range: {np.nanmin(ndvi):.3f} - {np.nanmax(ndvi):.3f}")
print(f"Mean NDVI: {np.nanmean(ndvi):.3f}")

# Calculate extent for plotting
band_extent = [nir_bounds.left, nir_bounds.right, nir_bounds.bottom, nir_bounds.top]

# Visualize bands and NDVI
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# NIR Band
im1 = axes[0,0].imshow(nir_data, cmap='Reds', extent=band_extent)
axes[0,0].set_title('NIR Band (B08)')
plt.colorbar(im1, ax=axes[0,0], label='Reflectance')

# Red Band
im2 = axes[0,1].imshow(red_data, cmap='Reds', extent=band_extent)
axes[0,1].set_title('Red Band (B04)')
plt.colorbar(im2, ax=axes[0,1], label='Reflectance')

# NDVI
im3 = axes[1,0].imshow(ndvi, cmap='RdYlGn', vmin=-1, vmax=1, extent=band_extent)
axes[1,0].set_title('NDVI (Vegetation Index)')
plt.colorbar(im3, ax=axes[1,0], label='NDVI Value')

# NDVI Classification
ndvi_classes = np.where(ndvi < 0, 0,      # Water/bare soil
                np.where(ndvi < 0.2, 1,   # Low vegetation
                np.where(ndvi < 0.5, 2,   # Moderate vegetation
                         3)))             # High vegetation

class_colors = ['blue', 'brown', 'yellow', 'green']
class_labels = ['Water/Bare', 'Low Veg', 'Moderate Veg', 'High Veg']
from matplotlib.colors import ListedColormap
class_cmap = ListedColormap(class_colors)

im4 = axes[1,1].imshow(ndvi_classes, cmap=class_cmap, extent=band_extent)
axes[1,1].set_title('NDVI Classification')
cbar = plt.colorbar(im4, ax=axes[1,1], ticks=[0, 1, 2, 3])
cbar.set_ticklabels(class_labels)

plt.tight_layout()
plt.show()

# NDVI Statistics
print("\n=== NDVI ANALYSIS ===")
for i, label in enumerate(class_labels):
    pixel_count = np.sum(ndvi_classes == i)
    percentage = (pixel_count / ndvi_classes.size) * 100
    print(f"{label}: {pixel_count:,} pixels ({percentage:.1f}%)")

RGB Visualization from Three Bands

True Color Composite uses Red, Green, and Blue bands to create natural-looking images.

# Load the true color composite
with rasterio.open(true_color_path) as rgb_src:
    rgb_data = rgb_src.read()  # Read all bands
    rgb_transform = rgb_src.transform
    rgb_bounds = rgb_src.bounds
    rgb_nodata = rgb_src.nodata

print("=== TRUE COLOR COMPOSITE DATA ===")
print(f"RGB data shape: {rgb_data.shape}")
print(f"Number of bands: {rgb_data.shape[0]}")
print(f"Image dimensions: {rgb_data.shape[1]} x {rgb_data.shape[2]}")
print(f"Data type: {rgb_data.dtype}")
print(f"NoData value: {rgb_nodata}")

# Handle the data for visualization
if rgb_data.shape[0] >= 3:
    # Extract RGB bands (assuming order is R, G, B)
    red_band = rgb_data[0].astype(float)
    green_band = rgb_data[1].astype(float)
    blue_band = rgb_data[2].astype(float)

    # Handle NoData values
    if rgb_nodata is not None:
        red_band = np.where(red_band == rgb_nodata, np.nan, red_band)
        green_band = np.where(green_band == rgb_nodata, np.nan, green_band)
        blue_band = np.where(blue_band == rgb_nodata, np.nan, blue_band)

    print(f"Red band range: {np.nanmin(red_band):.0f} - {np.nanmax(red_band):.0f}")
    print(f"Green band range: {np.nanmin(green_band):.0f} - {np.nanmax(green_band):.0f}")
    print(f"Blue band range: {np.nanmin(blue_band):.0f} - {np.nanmax(blue_band):.0f}")

    # Create RGB composite
    def create_rgb_composite(red, green, blue, enhance=True):
        """Create RGB composite from three bands"""
        # Normalize to 0-1 range if needed
        if red.max() > 1:
            red = red / np.nanmax(red)
        if green.max() > 1:
            green = green / np.nanmax(green)
        if blue.max() > 1:
            blue = blue / np.nanmax(blue)

        # Stack bands into 3D array
        rgb = np.dstack((red, green, blue))

        if enhance:
            # Enhance contrast (stretch to full 0-1 range)
            for i in range(3):
                band = rgb[:, :, i]
                band_min, band_max = np.nanmin(band), np.nanmax(band)
                if band_max > band_min:
                    rgb[:, :, i] = (band - band_min) / (band_max - band_min)

        return rgb

    # Create composites
    rgb_natural = create_rgb_composite(red_band, green_band, blue_band, enhance=False)
    rgb_enhanced = create_rgb_composite(red_band, green_band, blue_band, enhance=True)

    # Calculate extent for plotting
    rgb_extent = [rgb_bounds.left, rgb_bounds.right, rgb_bounds.bottom, rgb_bounds.top]

    # Visualize individual bands and composites
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))

    # Individual bands
    im1 = axes[0,0].imshow(red_band, cmap='Reds', extent=rgb_extent)
    axes[0,0].set_title('Red Band')
    plt.colorbar(im1, ax=axes[0,0], label='Digital Number')

    im2 = axes[0,1].imshow(green_band, cmap='Greens', extent=rgb_extent)
    axes[0,1].set_title('Green Band')
    plt.colorbar(im2, ax=axes[0,1], label='Digital Number')

    im3 = axes[0,2].imshow(blue_band, cmap='Blues', extent=rgb_extent)
    axes[0,2].set_title('Blue Band')
    plt.colorbar(im3, ax=axes[0,2], label='Digital Number')

    # RGB composites
    axes[1,0].imshow(rgb_natural, extent=rgb_extent)
    axes[1,0].set_title('Natural Color Composite')

    axes[1,1].imshow(rgb_enhanced, extent=rgb_extent)
    axes[1,1].set_title('Enhanced Color Composite')

    # False color composite (NIR-Red-Green) if NIR data is available
    if 'nir_data' in locals() and nir_data.shape == red_band.shape:
        false_color = create_rgb_composite(nir_data, red_band, green_band)
        axes[1,2].imshow(false_color, extent=rgb_extent)
        axes[1,2].set_title('False Color Composite\n(NIR-Red-Green)')
    else:
        axes[1,2].text(0.5, 0.5, 'NIR data not available\nfor false color composite', 
                      ha='center', va='center', transform=axes[1,2].transAxes)
        axes[1,2].set_title('False Color Composite\n(Not Available)')

    plt.tight_layout()
    plt.show()

    # Band statistics
    print("\n=== BAND STATISTICS ===")
    bands = {'Red': red_band, 'Green': green_band, 'Blue': blue_band}
    if 'nir_data' in locals():
        bands['NIR'] = nir_data

    for band_name, band_data in bands.items():
        print(f"{band_name} Band:")
        print(f"  Mean: {np.nanmean(band_data):.1f}")
        print(f"  Std:  {np.nanstd(band_data):.1f}")
        print(f"  Min:  {np.nanmin(band_data):.1f}")
        print(f"  Max:  {np.nanmax(band_data):.1f}")
else:
    print("Error: True color composite does not have enough bands for RGB visualization")

8. Handling NoData Values

# Create sample data with NoData values
elevation_with_nodata = elevation_data.copy()

# Introduce some NoData areas (simulate missing data)
# Set random patches to NoData value (-9999)
nodata_value = -9999
np.random.seed(42)  # For reproducible results

# Create random NoData patches
for _ in range(5):
    center_row = np.random.randint(20, elevation_data.shape[0] - 20)
    center_col = np.random.randint(20, elevation_data.shape[1] - 20)
    size = np.random.randint(5, 15)

    elevation_with_nodata[
        center_row-size:center_row+size,
        center_col-size:center_col+size
    ] = nodata_value

print("=== NODATA HANDLING ===")
print(f"Original data shape: {elevation_data.shape}")
print(f"NoData value: {nodata_value}")
print(f"Number of NoData pixels: {np.sum(elevation_with_nodata == nodata_value)}")
print(f"Percentage NoData: {np.sum(elevation_with_nodata == nodata_value) / elevation_with_nodata.size * 100:.1f}%")

# Create masked array for proper handling
elevation_masked = np.ma.masked_equal(elevation_with_nodata, nodata_value)

print(f"Valid data points: {elevation_masked.count()}")
print(f"Masked data points: {elevation_masked.mask.sum()}")

NoData Visualization and Statistics

# Visualize data with NoData
fig, axes = plt.subplots(1, 3, figsize=(18, 6))

# Original data
im1 = axes[0].imshow(elevation_data, cmap='terrain', extent=bounds)
axes[0].set_title('Original Elevation Data')
plt.colorbar(im1, ax=axes[0], label='Elevation (m)')

# Data with NoData (showing NoData as white)
elevation_display = elevation_with_nodata.copy()
elevation_display[elevation_display == nodata_value] = np.nan
im2 = axes[1].imshow(elevation_display, cmap='terrain', extent=bounds)
axes[1].set_title('Elevation Data with NoData (white areas)')
plt.colorbar(im2, ax=axes[1], label='Elevation (m)')

# NoData mask only
nodata_mask = elevation_with_nodata == nodata_value
im3 = axes[2].imshow(nodata_mask, cmap='RdYlBu_r', extent=bounds)
axes[2].set_title('NoData Mask (red = NoData)')
plt.colorbar(im3, ax=axes[2], label='NoData')

plt.tight_layout()
plt.show()

# Statistics comparison
print("\n=== STATISTICS COMPARISON ===")
print("Original data:")
raster_statistics(elevation_data, "Original")

print("\nData with NoData (excluding NoData values):")
valid_data = elevation_with_nodata[elevation_with_nodata != nodata_value]
raster_statistics(valid_data, "Valid Data Only")

Filling NoData Values

# Demonstrate different NoData filling methods
from scipy import ndimage

def fill_nodata_methods(data, nodata_value):
    """Demonstrate different methods to fill NoData values"""
    # Create mask
    mask = data == nodata_value
    filled_data = data.copy().astype(float)
    filled_data[mask] = np.nan

    methods = {}

    # Method 1: Fill with mean
    mean_filled = filled_data.copy()
    mean_value = np.nanmean(filled_data)
    mean_filled[np.isnan(mean_filled)] = mean_value
    methods['Mean Fill'] = mean_filled

    # Method 2: Fill with median
    median_filled = filled_data.copy()
    median_value = np.nanmedian(filled_data)
    median_filled[np.isnan(median_filled)] = median_value
    methods['Median Fill'] = median_filled

    # Method 3: Interpolation (simple nearest neighbor)
    from scipy.interpolate import griddata

    # Get coordinates of valid and invalid points
    rows, cols = np.mgrid[0:data.shape[0], 0:data.shape[1]]
    valid_mask = ~np.isnan(filled_data)

    if np.sum(valid_mask) > 0:
        # Interpolate
        interpolated = griddata(
            (rows[valid_mask], cols[valid_mask]),
            filled_data[valid_mask],
            (rows, cols),
            method='nearest'
        )
        methods['Nearest Neighbor'] = interpolated

    return methods

# Apply different filling methods
filled_methods = fill_nodata_methods(elevation_with_nodata, nodata_value)

# Visualize different filling methods
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
axes = axes.flatten()

# Original with NoData
elevation_display = elevation_with_nodata.copy()
elevation_display[elevation_display == nodata_value] = np.nan
im0 = axes[0].imshow(elevation_display, cmap='terrain', extent=bounds)
axes[0].set_title('Original with NoData')
plt.colorbar(im0, ax=axes[0], label='Elevation (m)')

# Different filling methods
for i, (method_name, filled_data) in enumerate(filled_methods.items(), 1):
    if i < 4:
        im = axes[i].imshow(filled_data, cmap='terrain', extent=bounds)
        axes[i].set_title(f'{method_name}')
        plt.colorbar(im, ax=axes[i], label='Elevation (m)')

plt.tight_layout()
plt.show()

Practice Problems

Problem 1: Elevation Analysis

Perform comprehensive elevation analysis:

# TODO:
# 1. Create elevation zones (low, medium, high)
# 2. Calculate the area of each elevation zone
# 3. Find the steepest areas (top 10% slope)
# 4. Create a suitability map combining elevation and slope
# 5. Generate summary statistics

# Your code here
Solution
# 1. Create elevation zones
elevation_percentiles = np.percentile(elevation_data, [33, 67])
elevation_zones = np.zeros_like(elevation_data, dtype=int)
elevation_zones[elevation_data <= elevation_percentiles[0]] = 1  # Low
elevation_zones[(elevation_data > elevation_percentiles[0]) & 
               (elevation_data <= elevation_percentiles[1])] = 2  # Medium
elevation_zones[elevation_data > elevation_percentiles[1]] = 3  # High

# 2. Calculate area of each zone
pixel_area = (pixel_size ** 2)
zone_areas = {}
zone_names = {1: 'Low', 2: 'Medium', 3: 'High'}

for zone_id, zone_name in zone_names.items():
    zone_pixels = np.sum(elevation_zones == zone_id)
    zone_area = zone_pixels * pixel_area
    zone_areas[zone_name] = zone_area
    print(f"{zone_name} elevation zone: {zone_pixels:,} pixels, {zone_area:.2f} sq units")

# 3. Find steepest areas (top 10% slope)
slope_threshold = np.percentile(slope_data, 90)
steep_areas = slope_data > slope_threshold
steep_area_total = np.sum(steep_areas) * pixel_area

print(f"\nSteepest areas (>{slope_threshold:.1f}°): {steep_area_total:.2f} sq units")

# 4. Create suitability map (example: suitable = medium elevation + gentle slope)
gentle_slope = slope_data < np.percentile(slope_data, 50)  # Bottom 50% slope
medium_elevation = elevation_zones == 2
suitability = (medium_elevation & gentle_slope).astype(int)

suitable_area = np.sum(suitability) * pixel_area
print(f"Suitable areas (medium elevation + gentle slope): {suitable_area:.2f} sq units")

# 5. Visualize results
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Elevation zones
zone_colors = ['white', 'green', 'yellow', 'red']
zone_cmap = ListedColormap(zone_colors)
im1 = axes[0,0].imshow(elevation_zones, cmap=zone_cmap, extent=bounds)
axes[0,0].set_title('Elevation Zones')

# Steep areas
im2 = axes[0,1].imshow(steep_areas, cmap='Reds', extent=bounds)
axes[0,1].set_title('Steepest Areas (Top 10%)')

# Suitability map
im3 = axes[1,0].imshow(suitability, cmap='RdYlGn', extent=bounds)
axes[1,0].set_title('Suitability Map')

# Combined visualization
axes[1,1].imshow(elevation_data, cmap='terrain', extent=bounds, alpha=0.7)
axes[1,1].contour(slope_data, levels=[slope_threshold], colors='red', linewidths=2)
axes[1,1].set_title('Elevation with Steep Area Contours')

plt.tight_layout()
plt.show()

Problem 2: Multi-Criteria Analysis

Combine multiple raster layers for analysis:

# TODO:
# 1. Create a temperature raster (simulate climate data)
# 2. Create a distance-to-water raster (simulate accessibility)
# 3. Combine elevation, slope, temperature, and distance for habitat suitability
# 4. Apply different weights to each factor
# 5. Identify the top 20% most suitable areas

# Your code here
Solution
# 1. Create temperature raster (simulate)
x = np.linspace(-10, 10, elevation_data.shape[1])
y = np.linspace(-10, 10, elevation_data.shape[0])
X, Y = np.meshgrid(x, y)

# Temperature decreases with elevation and varies with latitude
temperature_data = (
    25 - (elevation_data / 100) +  # Temperature lapse rate
    5 * np.sin(Y / 5) +            # Latitudinal variation
    3 * np.random.random(elevation_data.shape)  # Random variation
)

# 2. Create distance-to-water raster (simulate rivers/lakes)
# Create some "water bodies" as low elevation areas
water_bodies = elevation_data < np.percentile(elevation_data, 20)

# Calculate distance to nearest water
from scipy.ndimage import distance_transform_edt
distance_to_water = distance_transform_edt(~water_bodies) * pixel_size

print("=== SIMULATED DATA CREATED ===")
print(f"Temperature range: {temperature_data.min():.1f}°C to {temperature_data.max():.1f}°C")
print(f"Distance to water range: {distance_to_water.min():.1f} to {distance_to_water.max():.1f} units")

# 3. Normalize all factors to 0-1 scale for combination
def normalize_raster(data):
    return (data - data.min()) / (data.max() - data.min())

# Normalize factors (some need to be inverted for suitability)
elev_norm = 1 - normalize_raster(np.abs(elevation_data - np.median(elevation_data)))  # Prefer medium elevation
slope_norm = 1 - normalize_raster(slope_data)  # Prefer gentle slopes
temp_norm = 1 - normalize_raster(np.abs(temperature_data - 20))  # Prefer ~20°C
water_norm = 1 - normalize_raster(distance_to_water)  # Prefer close to water

# 4. Apply weights and combine
weights = {
    'elevation': 0.3,
    'slope': 0.3,
    'temperature': 0.2,
    'water_distance': 0.2
}

habitat_suitability = (
    weights['elevation'] * elev_norm +
    weights['slope'] * slope_norm +
    weights['temperature'] * temp_norm +
    weights['water_distance'] * water_norm
)

# 5. Identify top 20% most suitable areas
suitability_threshold = np.percentile(habitat_suitability, 80)
top_suitable = habitat_suitability > suitability_threshold

suitable_area = np.sum(top_suitable) * pixel_area
total_area = habitat_suitability.size * pixel_area

print(f"\n=== HABITAT SUITABILITY RESULTS ===")
print(f"Suitability threshold (80th percentile): {suitability_threshold:.3f}")
print(f"Top suitable area: {suitable_area:.2f} sq units ({suitable_area/total_area*100:.1f}% of total)")

# Visualize all factors and result
fig, axes = plt.subplots(3, 2, figsize=(15, 18))

# Individual factors
im1 = axes[0,0].imshow(elevation_data, cmap='terrain', extent=bounds)
axes[0,0].set_title('Elevation')
plt.colorbar(im1, ax=axes[0,0])

im2 = axes[0,1].imshow(slope_data, cmap='Reds', extent=bounds)
axes[0,1].set_title('Slope')
plt.colorbar(im2, ax=axes[0,1])

im3 = axes[1,0].imshow(temperature_data, cmap='coolwarm', extent=bounds)
axes[1,0].set_title('Temperature')
plt.colorbar(im3, ax=axes[1,0])

im4 = axes[1,1].imshow(distance_to_water, cmap='Blues_r', extent=bounds)
axes[1,1].set_title('Distance to Water')
plt.colorbar(im4, ax=axes[1,1])

# Combined suitability
im5 = axes[2,0].imshow(habitat_suitability, cmap='RdYlGn', extent=bounds)
axes[2,0].set_title('Habitat Suitability (Combined)')
plt.colorbar(im5, ax=axes[2,0])

# Top suitable areas
im6 = axes[2,1].imshow(top_suitable, cmap='RdYlGn', extent=bounds)
axes[2,1].set_title('Top 20% Most Suitable Areas')
plt.colorbar(im6, ax=axes[2,1])

plt.tight_layout()
plt.show()

Problem 3: Raster Time Series Analysis

Simulate and analyze change over time:

# TODO:
# 1. Create 3 time periods of elevation data (simulate erosion/deposition)
# 2. Calculate elevation change between periods
# 3. Identify areas of significant change
# 4. Calculate change statistics
# 5. Visualize the temporal changes

# Your code here
Solution
# 1. Create time series data (simulate erosion/deposition)
np.random.seed(42)

# Base elevation (time 1)
elevation_t1 = elevation_data.copy()

# Time 2: Add some erosion and deposition
erosion_areas = slope_data > np.percentile(slope_data, 70)  # Steep areas erode
deposition_areas = slope_data < np.percentile(slope_data, 30)  # Flat areas accumulate

elevation_t2 = elevation_t1.copy()
elevation_t2[erosion_areas] -= np.random.uniform(5, 20, np.sum(erosion_areas))
elevation_t2[deposition_areas] += np.random.uniform(2, 10, np.sum(deposition_areas))

# Time 3: Continue the process
elevation_t3 = elevation_t2.copy()
elevation_t3[erosion_areas] -= np.random.uniform(3, 15, np.sum(erosion_areas))
elevation_t3[deposition_areas] += np.random.uniform(1, 8, np.sum(deposition_areas))

# 2. Calculate elevation changes
change_t1_t2 = elevation_t2 - elevation_t1
change_t2_t3 = elevation_t3 - elevation_t2
change_total = elevation_t3 - elevation_t1

# 3. Identify significant changes (>10m change)
significant_change = np.abs(change_total) > 10
erosion_areas_final = change_total < -10
deposition_areas_final = change_total > 10

# 4. Calculate statistics
print("=== ELEVATION CHANGE ANALYSIS ===")
print(f"Period 1-2 change: {change_t1_t2.min():.1f} to {change_t1_t2.max():.1f}m")
print(f"Period 2-3 change: {change_t2_t3.min():.1f} to {change_t2_t3.max():.1f}m")
print(f"Total change: {change_total.min():.1f} to {change_total.max():.1f}m")

print(f"\nAreas with significant change (>10m): {np.sum(significant_change):,} pixels")
print(f"Erosion areas (>10m loss): {np.sum(erosion_areas_final):,} pixels")
print(f"Deposition areas (>10m gain): {np.sum(deposition_areas_final):,} pixels")

# Calculate volumes
pixel_area = pixel_size ** 2
erosion_volume = np.sum(change_total[erosion_areas_final]) * pixel_area
deposition_volume = np.sum(change_total[deposition_areas_final]) * pixel_area

print(f"\nErosion volume: {abs(erosion_volume):,.0f} cubic units")
print(f"Deposition volume: {deposition_volume:,.0f} cubic units")
print(f"Net change: {erosion_volume + deposition_volume:,.0f} cubic units")

# 5. Visualize temporal changes
fig, axes = plt.subplots(3, 3, figsize=(18, 15))

# Time series elevations
for i, (elev_data, title) in enumerate([(elevation_t1, 'Time 1'), 
                                       (elevation_t2, 'Time 2'), 
                                       (elevation_t3, 'Time 3')]):
    im = axes[0,i].imshow(elev_data, cmap='terrain', extent=bounds)
    axes[0,i].set_title(title)
    plt.colorbar(im, ax=axes[0,i], label='Elevation (m)')

# Change maps
change_data = [change_t1_t2, change_t2_t3, change_total]
change_titles = ['Change T1-T2', 'Change T2-T3', 'Total Change']

for i, (change, title) in enumerate(zip(change_data, change_titles)):
    im = axes[1,i].imshow(change, cmap='RdBu_r', extent=bounds, 
                         vmin=-30, vmax=30)
    axes[1,i].set_title(title)
    plt.colorbar(im, ax=axes[1,i], label='Elevation Change (m)')

# Analysis results
im7 = axes[2,0].imshow(significant_change, cmap='Reds', extent=bounds)
axes[2,0].set_title('Significant Change Areas (>10m)')
plt.colorbar(im7, ax=axes[2,0])

im8 = axes[2,1].imshow(erosion_areas_final, cmap='Reds', extent=bounds)
axes[2,1].set_title('Major Erosion Areas')
plt.colorbar(im8, ax=axes[2,1])

im9 = axes[2,2].imshow(deposition_areas_final, cmap='Blues', extent=bounds)
axes[2,2].set_title('Major Deposition Areas')
plt.colorbar(im9, ax=axes[2,2])

plt.tight_layout()
plt.show()

# Create change summary
change_summary = pd.DataFrame({
    'Metric': ['Mean Change', 'Max Erosion', 'Max Deposition', 
               'Std Deviation', 'Erosion Pixels', 'Deposition Pixels'],
    'Value': [change_total.mean(), change_total.min(), change_total.max(),
              change_total.std(), np.sum(erosion_areas_final), np.sum(deposition_areas_final)],
    'Unit': ['m', 'm', 'm', 'm', 'pixels', 'pixels']
})

print("\n=== CHANGE SUMMARY TABLE ===")
print(change_summary.round(2))

Key Takeaways

mindmap
  root((Raster Analysis))
    Data Structure
      Grid Cells
      Resolution
      Extent
      NoData Values
    Operations
      Math Operations
      Statistics
      Classification
      Filtering
    Vector Integration
      Clipping
      Zonal Statistics
      Point Extraction
      Overlay Analysis
    Quality Control
      NoData Handling
      Data Validation
      Error Checking

What You've Learned

  • Raster Structure: Understanding grids, resolution, and spatial reference
  • Data Analysis: Statistics, calculations, and transformations
  • Vector Integration: Combining raster and vector data effectively
  • Quality Control: Handling missing data and validation
  • Visualization: Creating meaningful maps and charts
  • Practical Applications: Real-world analysis workflows

Best Practices

  • Always check raster metadata before analysis
  • Handle NoData values appropriately
  • Use appropriate data types to save memory
  • Validate results with visualizations
  • Document your analysis parameters
  • Consider computational efficiency for large datasets

Next Steps

In the next module, we'll create interactive visualizations and web applications: - Interactive maps with Folium - Dashboard creation with Streamlit - Combining raster and vector visualizations - User interface design for geospatial applications

graph LR
    A[Raster Analysis] --> B[Visualization]
    B --> C[Interactive Maps]
    B --> D[Web Applications]
    B --> E[Dashboards]