# Set the workspace to the current working directory
# Uncomment and replace the path below with your own working directory if needed:
#setwd("/Users/zhaoxiang/Desktop/raster_preparation")
<- getwd() # Get the current working directory and store it in 'workspace'
workspace
# Increase the plot size by adjusting the options for plot dimensions in the notebook output
options(repr.plot.width = 16, repr.plot.height = 8) # Sets width to 16 and height to 8 for larger plots
Species Distribution Analysis - Environmental Data Preparation (Raster)
Author details: Xiang Zhao
Editor details: Dr Sebastian Lopez Marcano
Contact details: support@ecocommons.org.au
Copyright statement: This script is the product of the EcoCommons platform. Please refer to the EcoCommons website for more details: https://www.ecocommons.org.au/
Date: December 2024
Script and data info:
This notebook, developed by the EcoCommons team, showcases how to download, and process both continuous and categorical environmental datasets. The datasets are in the raster format and these are generally used to build Species Distribution Models (SDMs)
Introduction
Raster data is a type of geo-spatial data represented by a grid of cells or pixels. Each cell contains a value representing information, such as temperature, elevation, land cover, or other spatially continuous variables. Raster data is commonly used in Geographic Information Systems (GIS) for analysing and visualizing spatial phenomena.
Objectives
Workflow Overview:
Set the working directory and load the necessary R packages (
dismo
,ggplot2
,raster
,googledrive
,sp
,dplyr
,terra
). Create directories to store raw data files.Data Download: Download continuous environmental and categorical environmental dataset (raster files) from our Google Drive.
Data Processing: Reproject the environmental dataset to the CRS (Coordinate Reference System) you desire / require. Crop and mask the environmental datasets to the extent of your study area. Resample the environmental datasets to a same resolution.
In the near future, this material may form part of comprehensive support materials available to EcoCommons users.
If you have any corrections or suggestions to improve the effeciengy, please contact the EcoCommons team.
Introduction
1. Environmental variables
In Species Distribution Modeling (SDM), environmental variables are factors that influence where a species can live. These include:
Climatic Variables: Temperature, precipitation.
Topographic Variables: Elevation, slope.
Soil Variables: Soil, pH, texture.
Vegetation Variables: Land cover type, canopy cover.
Hydrological Variables: Distance to water bodies, soil moisture.
Biotic Variables: Presence of prey, competitors.
Anthropogenic Variables: Land use, human impact.
These variables help explain and predict a species’ habitat suitability based on environmental conditions.
2. Raster
Raster data is a type of spatial data used to represent continuous surfaces, like environmental layers (e.g., elevation, temperature) in grids or cells. Each cell (pixel) has a value that represents information about that area.
Important Characteristics:
Resolution: Size of each cell, determining data detail (e.g., 10m x 10m cells).
Extent: Geographic area covered by the raster.
Coordinate Reference System (CRS): Defines the spatial reference of the raster for location accuracy.
Data Type: Can be continuous (e.g., temperature) or categorical (e.g., land cover type).
These characteristics impact how raster data is interpreted and used in analyses like SDM.
To extract values from raster data or perform spatial analyses, it is crucial that all rasters have the same resolution, extent, and coordinate reference system (CRS). This ensures that the data aligns correctly and avoids mismatches or errors during analysis.
Ideally, you should define a single CRS to be used consistently across all spatial files in a project, including shapefiles and raster files. This uniformity simplifies processing and ensures that all spatial data layers are accurately overlaid and compared.
Read “What is raster data?” by ArcMap for more detailed explanation.
3. Categorical and Continuous data
In environmental modeling, environmental variables can be classified as categorical or continuous:
Categorical Data
Represents discrete classes or categories.
Examples:
Land Cover Type: Forest, grassland, urban.
Soil Type: Sandy, clay, loam.
Vegetation Type: Different plant communities.
Continuous Data
Represents data with a smooth gradient, measurable values.
Examples:
Temperature: Annual mean temperature.
Precipitation: Monthly or annual rainfall.
Elevation: Height above sea level in meters.
Categorical data is useful for distinct classifications, while continuous data is used for variables that change gradually across the landscape. Both types are important for predicting species distributions.
Set-up: R Environment and Packages
Some housekeeping before we start. This process might take some time as many packages needed to be installed.
S.1 Set the working directory and create a folder for data.
Save the Quarto Markdown file (.QMD) to a folder of your choice, and then set the path to your folder as your working directory.
Ideally, you would use the renv
package to create an isolated environment for installing all the required R packages used in this notebook. However, since installing renv
and its dependencies can be time-consuming, we recommend trying this after the workshop.
# # Ensure "renv" package is installed
# if (!requireNamespace("renv", quietly = TRUE)) {
# install.packages("renv")
# }
#
# # Check if renv has been initialized in the project
# if (!file.exists("renv/activate.R")) {
# message("renv has not been initiated in this project. Initializing now...")
# renv::init() # Initialize renv if not already set up
# } else {
# source("renv/activate.R") # Activate the renv environment
# message("renv is activated.")
# }
#
# # Check for the existence of renv.lock and restore the environment
# if (file.exists("renv.lock")) {
# message("Restoring renv environment from renv.lock...")
# renv::restore()
# } else {
# message("No renv.lock file found in the current directory. Skipping restore.")
# }
S.2 Install and load essential libraries.
Install and load R packages. Terra and sf are essential for working with raster and vector data, respectively. Other packages like dplyr, ggplot2, and leaflet are useful for data manipulation, visualization, and interactive mapping.
# Set CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com/"))
# List of packages to check, install if needed, and load
<- c("dplyr", "terra", "sf", "googledrive", "ggplot2", "leaflet", "shiny", "htmltools")
packages
# Function to display a cat message
<- function(pkg, message_type) {
cat_message if (message_type == "installed") {
cat(paste0(pkg, " has been installed successfully!\n"))
else if (message_type == "loading") {
} cat(paste0(pkg, " is already installed and has been loaded!\n"))
}
}
# Install missing packages and load them
for (pkg in packages) {
if (!requireNamespace(pkg, quietly = TRUE)) {
install.packages(pkg)
cat_message(pkg, "installed")
else {
} cat_message(pkg, "loading")
}library(pkg, character.only = TRUE)
}
dplyr is already installed and has been loaded!
terra is already installed and has been loaded!
sf is already installed and has been loaded!
googledrive is already installed and has been loaded!
ggplot2 is already installed and has been loaded!
leaflet is already installed and has been loaded!
shiny is already installed and has been loaded!
htmltools is already installed and has been loaded!
# De-authenticate Google Drive to access public files
drive_deauth()
# If you are using renv, you can snapshot the renv after loading all the packages.
#renv::snapshot()
S.3 Download the Case Study Datasets
Environmental variables are downloaded from both WorldClim and Google Drive, and stored in GeoTIFF format.
# Define Google Drive file ID and the path for downloading
<- "1TD4WBLo7uzNjtWS2DXvS1PVCUzetxfqn" # Replace with the actual file ID of the zipped file
zip_file_id
<- file.path(workspace)
datafolder_path
# Create a local path for the zipped file
<- file.path(datafolder_path, "raster_preparation.zip")
zip_file_path
# Function to download a file with progress messages
<- function(file_id, file_path) {
download_zip_file cat("Downloading zipped file...\n")
drive_download(as_id(file_id), path = file_path, overwrite = TRUE)
cat("Downloaded zipped file to:", file_path, "\n")
}
# Create local directory if it doesn't exist
if (!dir.exists(datafolder_path)) {
dir.create(datafolder_path, recursive = TRUE)
}
# Download the zipped file
cat("Starting to download the zipped file...\n")
Starting to download the zipped file...
download_zip_file(zip_file_id, zip_file_path)
Downloading zipped file...
File downloaded:
• 'rasters_files.zip' <id: 1TD4WBLo7uzNjtWS2DXvS1PVCUzetxfqn>
Saved locally as:
• '/Users/xiangzhaoqcif/Documents/QCIF_MacBook_Pro/github/notebook-blog/notebooks/data_prep/raster_preparation.zip'
Downloaded zipped file to: /Users/xiangzhaoqcif/Documents/QCIF_MacBook_Pro/github/notebook-blog/notebooks/data_prep/raster_preparation.zip
# Unzip the downloaded file
cat("Unzipping the file...\n")
Unzipping the file...
unzip(zip_file_path, exdir = datafolder_path)
cat("Unzipped files to folder:", datafolder_path, "\n")
Unzipped files to folder: /Users/xiangzhaoqcif/Documents/QCIF_MacBook_Pro/github/notebook-blog/notebooks/data_prep
1. Load and Check Raster Files
1.1 Study area
Before processing raster or any other spatial data, it’s important to understand your study area. This helps you determine the spatial extent of the environmental variables to collect, whether it’s nationwide data (e.g., all of Australia) or localised data for a smaller area.
# Load your shapefile
<- st_read("rasters_files/central_highlands.shp") central_highlands
Reading layer `central_highlands' from data source
`/Users/xiangzhaoqcif/Documents/QCIF_MacBook_Pro/github/notebook-blog/notebooks/data_prep/rasters_files/central_highlands.shp'
using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 144.9398 ymin: -38.20964 xmax: 146.4563 ymax: -36.97746
Geodetic CRS: GDA94
#check the CRS of the vect of the study area
print(crs(central_highlands))
[1] "GEOGCRS[\"GDA94\",\n DATUM[\"Geocentric Datum of Australia 1994\",\n ELLIPSOID[\"GRS 1980\",6378137,298.257222101,\n LENGTHUNIT[\"metre\",1]]],\n PRIMEM[\"Greenwich\",0,\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n CS[ellipsoidal,2],\n AXIS[\"geodetic latitude (Lat)\",north,\n ORDER[1],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n AXIS[\"geodetic longitude (Lon)\",east,\n ORDER[2],\n ANGLEUNIT[\"degree\",0.0174532925199433]],\n USAGE[\n SCOPE[\"Horizontal component of 3D system.\"],\n AREA[\"Australia including Lord Howe Island, Macquarie Island, Ashmore and Cartier Islands, Christmas Island, Cocos (Keeling) Islands, Norfolk Island. All onshore and offshore.\"],\n BBOX[-60.55,93.41,-8.47,173.34]],\n ID[\"EPSG\",4283]]"
From above, we can find the CRS of the study area shapefile is EPSG:4283, the unit of it is ‘degree’. To know more about this CRS, visit https://epsg.io/4283.
# Custom CSS for a smaller legend box
<- tags$style(HTML("
custom_css .leaflet-control .legend {
font-size: 8px !important; /* Reduce font size */
padding: 4px; /* Reduce padding inside the legend box */
line-height: 1; /* Reduce spacing between lines */
width: auto; /* Automatically size the legend box */
height: auto; /* Automatically size the legend box */
}
.leaflet-control .legend i {
width: 10px; /* Smaller legend icons */
height: 10px;
}
"))
# Render the map
leaflet() %>%
addProviderTiles(providers$Esri.WorldImagery) %>%
# Add the Central Highlands layer with a distinct color
addPolygons(
data = central_highlands,
color = "lightblue", # Border color of Central Highlands polygon
weight = 1, # Border width
fillColor = "lightblue", # Fill color of Central Highlands
fillOpacity = 0.3, # Transparency for fill
group = "Central Highlands"
%>%
) setView(lng = 145.7, lat = -37.5, zoom = 7) %>% # Set the view to desired location
addLegend(
position = "bottomright",
colors = c("lightblue"),
labels = c("Central Highlands"),
opacity = 0.7
%>%
) # Add the custom CSS to modify the legend font size
::prependContent(custom_css) htmlwidgets
Warning: sf layer has inconsistent datum (+proj=longlat +ellps=GRS80 +no_defs).
Need '+proj=longlat +datum=WGS84'
This light blue area called Central Highlands in Victoria, Australia is our study area.
1.2 Load Rasters
We prepared 15 environmental raster files for this practice.
Variable name | Variables | Data Type | Resolution | Source |
---|---|---|---|---|
1 - AusClim_bioclim_01_9s_1976-2005_vic | Bioclim 01: Annual mean temperature | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
2 - AusClim_bioclim_04_9s_1976-2005_vic | Bioclim 04: Temperature Seasonality (standard deviation *100) | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
3 - AusClim_bioclim_06_9s_1976-2005_vic | Bioclim 06: Min Temperature of Coldest Month | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
4 - AusClim_bioclim_10_9s_1976-2005_vic | Bioclim 10: Mean Temperature of Warmest Quarter | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
5 - AusClim_bioclim_12_9s_1976-2005_vic | Bioclim 12: Annual Precipitation | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
6 - AusClim_bioclim_18_9s_1976-2005_vic | Bioclim 18: Precipitation of Warmest Quarter | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
7 - AusClim_bioclim_19_9s_1976-2005_vic | Bioclim 19: Precipitation of Coldest Quarter | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
8 - AusClim_bioclim_20_9s_1976-2005_vic | Bioclim 20: Annual mean radiation | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
9 - AusClim_bioclim_28_9s_1976-2005_vic | Bioclim 28: Annual mean moisture index | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
10 - AusClim_bioclim_34_9s_1976-2005_vic | Bioclim 34: Mean moisture index of warmest quarter | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
11 - AusClim_bioclim_35_9s_1976-2005_vic | Bioclim 35: Mean moisture index of coldest quarter | Continuous | 9 seconds (9s) | 1976-2005, CSIRO via EcoCommons |
12 - digital_elevation_model_3s_2009_3s_vic | Digital Elevation Model | Continuous | 3 seconds (3s) | Geoscience Australia via EcoCommons |
13 - Fire_events_projection_area_vic | Fires | Continuous | 1 km | Vic DEECA spatial data and resources |
14 - Logging_1km_vic | Logging | Continuous | 1 km | Vic DEECA spatial data and resources |
15 - nsg-2011-250m_asc_vic | Australian Soil Classification | Categorical | 250 m | Tern via EcoCommons |
Usually, you are the person who collects raster files from different data platform or other researchers. Apart from the content of the raster, you should know the data type (categorical or continuous), resolution (units in degree or meter/kilometers), and Coordinate Reference System (CRS, which also determines the unit of the raster).
# Define the folder path
<- "rasters_files"
datafolder_path
# List all .tif files with full paths
<- list.files(path = datafolder_path, pattern = "\\.tif$", full.names = TRUE)
file_paths
# Load all rasters and store in a list
<- lapply(file_paths, rast)
rasters
# Assign names to the raster list using file names without extensions
names(rasters) <- tools::file_path_sans_ext(basename(file_paths))
# Print numbers before each raster name
for (i in seq_along(names(rasters))) {
cat(i, "-", names(rasters)[i], "\n")
}
1 - AusClim_bioclim_01_9s_1976-2005_vic
2 - AusClim_bioclim_04_9s_1976-2005_vic
3 - AusClim_bioclim_06_9s_1976-2005_vic
4 - AusClim_bioclim_10_9s_1976-2005_vic
5 - AusClim_bioclim_12_9s_1976-2005_vic
6 - AusClim_bioclim_18_9s_1976-2005_vic
7 - AusClim_bioclim_19_9s_1976-2005_vic
8 - AusClim_bioclim_20_9s_1976-2005_vic
9 - AusClim_bioclim_28_9s_1976-2005_vic
10 - AusClim_bioclim_34_9s_1976-2005_vic
11 - AusClim_bioclim_35_9s_1976-2005_vic
12 - digital_elevation_model_3s_2009_3s_vic
13 - Fire_events_projection_area_vic
14 - Logging_1km_vic
15 - nsg-2011-250m_asc_vic
Now, let’s check the data type of each raster.
# Check if each raster is categorical or continuous
for (i in seq_along(rasters)) {
if (is.factor(rasters[[i]])) {
cat(names(rasters[[i]]), "is categorical.\n")
else {
} cat(names(rasters[[i]]), "is continuous.\n")
} }
AusClim_bioclim_01_9s_1976-2005 is continuous.
AusClim_bioclim_04_9s_1976-2005 is continuous.
AusClim_bioclim_06_9s_1976-2005 is continuous.
AusClim_bioclim_10_9s_1976-2005 is continuous.
AusClim_bioclim_12_9s_1976-2005 is continuous.
AusClim_bioclim_18_9s_1976-2005 is continuous.
AusClim_bioclim_19_9s_1976-2005 is continuous.
AusClim_bioclim_20_9s_1976-2005 is continuous.
AusClim_bioclim_28_9s_1976-2005 is continuous.
AusClim_bioclim_34_9s_1976-2005 is continuous.
AusClim_bioclim_35_9s_1976-2005 is continuous.
digital_elevation_model_3s_2009_3s is continuous.
Fire_events_projection_area is continuous.
Logging_1km is continuous.
nsg-2011-250m_asc is continuous.
Australian Soil Classification (nsg-2011-250m_asc_vic) should be categorical, however, it is shown as continuous. Why is that? Here are some possible reasons:
Incorrect Data Type Interpretation:
- The software or library you are using may be interpreting the soil classification values as continuous numbers instead of categories (factors or levels).
Raster Encoding:
- Categorical data in raster format is typically stored with integer values representing categories. If the raster metadata does not correctly define these as categories, the software may interpret them as continuous.
Missing Attribute Table:
- Categorical rasters often have an attribute table linking integer values to class names. If this table is missing or not recognized, the values may appear as continuous.
Data Export or Conversion Issue:
- If the raster was exported or converted incorrectly (e.g., saved as a floating-point raster), it could cause the values to be treated as continuous.
Sometimes, after manipulation, a raster may lose its categorical (factor) properties due to how terra (the R package) or other spatial libraries interpret the file upon reloading. Rasters saved as categorical may be stored with integer values rather than explicit factor levels in the file, meaning they need to be redefined as factors when reloaded.
# we can factorize Australian Soil Classification (nsg-2011-250m_asc_vic) to make sure it is categorical.
15]] <- as.factor(rasters[[15]])
rasters[[
# Check if each raster is categorical or continuous
for (i in seq_along(rasters)) {
if (is.factor(rasters[[i]])) {
cat(names(rasters[[i]]), "is categorical.\n")
else {
} cat(names(rasters[[i]]), "is continuous.\n")
} }
AusClim_bioclim_01_9s_1976-2005 is continuous.
AusClim_bioclim_04_9s_1976-2005 is continuous.
AusClim_bioclim_06_9s_1976-2005 is continuous.
AusClim_bioclim_10_9s_1976-2005 is continuous.
AusClim_bioclim_12_9s_1976-2005 is continuous.
AusClim_bioclim_18_9s_1976-2005 is continuous.
AusClim_bioclim_19_9s_1976-2005 is continuous.
AusClim_bioclim_20_9s_1976-2005 is continuous.
AusClim_bioclim_28_9s_1976-2005 is continuous.
AusClim_bioclim_34_9s_1976-2005 is continuous.
AusClim_bioclim_35_9s_1976-2005 is continuous.
digital_elevation_model_3s_2009_3s is continuous.
Fire_events_projection_area is continuous.
Logging_1km is continuous.
nsg-2011-250m_asc is categorical.
Now, it looks correct.
1.3 Plot Rasters to Understand Your Data
<- c(
layer_titles "Bioclim 01: Annual Mean Temperature",
"Bioclim 04: Temperature Seasonality (standard deviation *100)",
"Bioclim 06: Min Temperature of Coldest Month",
"Bioclim 10: Mean Temperature of Warmest Quarter",
"Bioclim 20: Annual mean radiation",
"Bioclim 12: Annual Precipitation",
"Bioclim 18: Precipitation of Warmest Quarter",
"Bioclim 19: Precipitation of Coldest Quarter",
"Bioclim 28: Annual Mean Moisture Index",
"Bioclim 34: Mean Moisture Index of Warmest Quarter",
"Bioclim 35: Mean Moisture Index of Coldest Quarter",
"Digital Elevation Model",
"Logging",
"Fires",
"Australian Soil Classification"
)
for (i in seq_along(rasters)) {
plot(rasters[[i]], main = layer_titles[i], col = terrain.colors(20)) # Plot raster
plot(central_highlands, add = TRUE, border = "red", weight = 1,
col = NA, lwd = 1.5) # Overlay shapefile
}
It is great that all raster layers are sufficiently larger (but not too large) than our study area. This means we won’t have any data gaps or overly large raster layers to manage.
1.4 Check CRS, Extent, and Resolution of Each Raster
# Iterate over the rasters to show resolution, extent, and CRS
for (i in seq_along(rasters)) {
cat("\nRaster", i, ":\n")
# Show the resolution
<- res(rasters[[i]])
res_val cat("Resolution (x, y):", res_val[1], ",", res_val[2], "\n")
# Show the extent - Modified to handle S4 object
<- ext(rasters[[i]])
ext_val cat("Extent:", as.character(ext_val), "\n") # Convert ext_val to character
# Show the CRS
<- crs(rasters[[i]])
crs_val
# Extract all occurrences of ID["EPSG",...] using gregexpr
<- regmatches(crs_val, gregexpr("ID\\[\"EPSG\",[0-9]+\\]", crs_val))
id_matches
# Select the last match
<- tail(unlist(id_matches), 1)
last_id
# Print the result
cat("CRS:", last_id, "\n")
}
Raster 1 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 2 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 3 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 4 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 5 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 6 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 7 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 8 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 9 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 10 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 11 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
Raster 12 :
Resolution (x, y): 0.0008333333 , 0.0008333333
Extent: ext(140.962083333851, 149.976250000663, -39.1595833336029, -33.9804166668527)
CRS: ID["EPSG",4283]
Raster 13 :
Resolution (x, y): 0.00899273 , 0.008983703
Extent: ext(142.799993603565, 149.976192429999, -39.1591895299991, -36.500013311383)
CRS: ID["EPSG",4283]
Raster 14 :
Resolution (x, y): 0.009 , 0.009
Extent: ext(142.79785887, 149.97985887, -39.1591895299999, -36.4951895299999)
CRS: ID["EPSG",4283]
Raster 15 :
Resolution (x, y): 0.0025 , 0.0025
Extent: ext(140.9625, 149.975, -39.16, -33.98)
CRS: ID["EPSG",4283]
CRS: All the raster layers have the same CRS, which is EPSG: 4283.
Extent: Raster 1-11 and 15 have the same extent, the rest raster layers have different extents. All the extents of these raster files are sufficiently larger then the extent of the study area. So we don’t have to worry too much on this.
Resolution: Raster 1-11 and 15 have the same resolution, which is 0.0025 degree (9 seconds), the rest raster layers have different resolutions.
We need to unify extent and resolutions.
1.5 Unify CRS and Resolution
As we know, the CRS of the shapefile of the study area and all the raster files is EPSG:4283 (unit in degree). What if I want to change the CRS of these files to EPSG: 28355 (GDA94 / MAG Zone 55, unit in meter), a more precise CRS for our study area (why? read this article by Paul Wintour).
# Function to check CRS and reproject to EPSG: 28355
<- function(raster, is_categorical = FALSE) {
reproject_if_needed <- "EPSG:28355" # Define EPSG: 28355
EPSG_28255
# Check if the CRS is already EPSG:28355
if (!identical(crs(raster), EPSG_28255)) {
# Choose the appropriate method based on raster type
<- if (is_categorical) "near" else "bilinear" # see note below this chunk
method message("Reprojecting raster to EPSG:28355 using method: ", method)
# Reproject raster to EPSG:28355
<- project(raster, EPSG_28255, method = method)
raster
# Convert to factor again if categorical to ensure category levels are preserved
if (is_categorical) {
<- as.factor(raster)
raster
}
}
return(raster)
}
Different interpolation methods are appropriate for different types of data:
“near” (nearest neighbor interpolation): Used for categorical data (e.g., land cover types or classifications) because it preserves discrete class values.
“bilinear” (bilinear interpolation): Used for continuous data (e.g., temperature, elevation) because it smooths values by averaging nearby points.
# It is not a bad idea to check again whether your raster files are in the right data type.
# Check if each raster is categorical or continuous
for (i in seq_along(rasters)) {
if (is.factor(rasters[[i]])) {
cat(names(rasters[[i]]), "is categorical.\n")
else {
} cat(names(rasters[[i]]), "is continuous.\n")
} }
AusClim_bioclim_01_9s_1976-2005 is continuous.
AusClim_bioclim_04_9s_1976-2005 is continuous.
AusClim_bioclim_06_9s_1976-2005 is continuous.
AusClim_bioclim_10_9s_1976-2005 is continuous.
AusClim_bioclim_12_9s_1976-2005 is continuous.
AusClim_bioclim_18_9s_1976-2005 is continuous.
AusClim_bioclim_19_9s_1976-2005 is continuous.
AusClim_bioclim_20_9s_1976-2005 is continuous.
AusClim_bioclim_28_9s_1976-2005 is continuous.
AusClim_bioclim_34_9s_1976-2005 is continuous.
AusClim_bioclim_35_9s_1976-2005 is continuous.
digital_elevation_model_3s_2009_3s is continuous.
Fire_events_projection_area is continuous.
Logging_1km is continuous.
nsg-2011-250m_asc is categorical.
# if not, factorize the categirical raster file manually
# rasters[[15]] <- as.factor(rasters[[15]])
# Define the categorical raster indices
<- c(15) # Adjust if there are other categorical rasters
categorical_indices
# Apply the function to each raster in the list
<- lapply(seq_along(rasters), function(i) {
rasters_reprojected <- i %in% categorical_indices
is_categorical reproject_if_needed(rasters[[i]], is_categorical = is_categorical)
})
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
|---------|---------|---------|---------|
=========================================
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: bilinear
Reprojecting raster to EPSG:28355 using method: near
Let’s check the properties of all raster files.
# Iterate over the reprojected rasters to show resolution, extent, and CRS
for (i in seq_along(rasters_reprojected)) {
cat("\nRaster", i, ":\n")
# Show the resolution
<- res(rasters_reprojected[[i]])
res_val cat("Resolution (x, y):", res_val[1], ",", res_val[2], "\n")
# Show the extent - Modified to handle S4 object
<- ext(rasters_reprojected[[i]])
ext_val cat("Extent:", as.character(ext_val), "\n") # Convert ext_val to character
# Show the CRS
<- crs(rasters_reprojected[[i]])
crs_val
# Extract all occurrences of ID["EPSG",...] using gregexpr
<- regmatches(crs_val, gregexpr("ID\\[\"EPSG\",[0-9]+\\]", crs_val))
id_matches
# Select the last match
<- tail(unlist(id_matches), 1)
last_id
# Print the result
cat("CRS:", last_id, "\n")
}
Raster 1 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 2 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 3 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 4 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 5 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 6 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 7 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 8 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 9 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 10 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 11 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458765, 774777.333003385, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
Raster 12 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 13 :
Resolution (x, y): 818.9801 , 818.9801
Extent: ext(123768.122970817, 766667.490018216, 5656743.63433381, 5960585.24602118)
CRS: ID["EPSG",28355]
Raster 14 :
Resolution (x, y): 819.8114 , 819.8114
Extent: ext(123553.393091346, 767105.315260186, 5656970.16891723, 5961120.18563652)
CRS: ID["EPSG",28355]
Raster 15 :
Resolution (x, y): 238.1606 , 238.1606
Extent: ext(-58070.4168458717, 774777.333003388, 5647993.85079617, 6240061.19598497)
CRS: ID["EPSG",28355]
As we can see, we unified the CRS of all raster files to EPSG:28355. We can also notice that the figures in resolution and extent have changed. That is because the unit of the CRS has changed from degree to meter.
1.6 Resample Rasters to Match the Finest Resolution
There are two methods for unifying the resolutions of raster files: (1) resampling to the finest resolution or (2) resampling to the coarsest resolution. Resampling to the finest resolution preserves as much information as possible from the original raster files but increases the file size and processing time. On the other hand, resampling to the coarsest resolution significantly reduces the file size and processing time, though it may result in some loss of detail.
In EcoCommons, we recommend using the finest resolution when the objective of the study is well known, and coarser resolutions for preliminary / pilot study. The decision of fine vs coarse also depends on your study area, species and data availability. For example, when studying large, mobile animals such as whales or elephants, resampling all your raster files—from, say, 1 km, 5 km, 250 m, and 5 m to 5 m—would result in an unnecessary computational burden and waste of resources.
# Determine the finest resolution among the reprojected rasters
<- sapply(rasters_reprojected, function(r) res(r)[1] * res(r)[2])
resolutions <- which.min(resolutions)
finest_index <- rasters_reprojected[[finest_index]]
finest_raster
# Resample each raster to match the finest resolution
<- lapply(seq_along(rasters_reprojected), function(i) {
rasters_resampled <- rasters_reprojected[[i]]
raster
# Determine resampling method based on the type of raster (categorical vs continuous)
if (i == 15) { # the 15th raster is the categorical one
message("Resampling categorical raster to match the finest resolution using 'nearest' method.")
<- resample(raster, finest_raster, method = "near") # Use 'near' for categorical data
raster else {
} message("Resampling continuous raster to match the finest resolution using 'bilinear' method.")
<- resample(raster, finest_raster, method = "bilinear") # Use 'bilinear' for continuous data
raster
}
return(raster)
})
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling continuous raster to match the finest resolution using 'bilinear' method.
|---------|---------|---------|---------|
=========================================
Resampling categorical raster to match the finest resolution using 'nearest' method.
|---------|---------|---------|---------|
=========================================
Warning: [resample] detected values outside of the limits of datatype INT4S
# Iterate over the reprojected rasters to show resolution, extent, and CRS
for (i in seq_along(rasters_resampled)) {
cat("\nRaster", i, ":\n")
# Show the resolution
<- res(rasters_resampled[[i]])
res_val cat("Resolution (x, y):", res_val[1], ",", res_val[2], "\n")
# Show the extent - Modified to handle S4 object
<- ext(rasters_resampled[[i]])
ext_val cat("Extent:", as.character(ext_val), "\n") # Convert ext_val to character
# Show the CRS
<- crs(rasters_resampled[[i]])
crs_val
# Extract all occurrences of ID["EPSG",...] using gregexpr
<- regmatches(crs_val, gregexpr("ID\\[\"EPSG\",[0-9]+\\]", crs_val))
id_matches
# Select the last match
<- tail(unlist(id_matches), 1)
last_id
# Print the result
cat("CRS:", last_id, "\n")
}
Raster 1 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 2 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 3 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 4 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 5 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 6 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 7 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 8 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 9 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 10 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 11 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 12 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 13 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 14 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Raster 15 :
Resolution (x, y): 79.38445 , 79.38445
Extent: ext(-58106.2476755253, 774954.201024326, 5648124.89827636, 6240015.37907549)
CRS: ID["EPSG",28355]
Now, the resolution has been unified to 79.4 meters.
2. Crop and Mask Raster to the Extent of your Study Area
Now, we want to crop all raster files to the boundary of our study area. There are two steps: crop and mask.
Crop can be used to reduce the extent of the original raster to match the extent of the shapefile (here is the central highland study area). This helps with reducing the size of the raster before applying the mask, which speeds up processing and reduces file size.
Then, mask can be used to trim raster files to the exact shape of the study area.
Aspect | Crop | Mask |
---|---|---|
Output Shape | Always rectangular (bounding box) | Irregular (follows the shape of the mask) |
Outside Values | Values outside the extent are removed | Values outside the shape are set to NA |
Retained Area | Everything within the specified extent | Only cells within the masking polygon |
Extent | Changed to match the cropping extent | Remains the same as the original raster |
Use Case | When you need a rectangular subset | When you need to retain an irregular area |
# Transform the CRS of central_highlands to EPSG:28355
<- st_transform(central_highlands, crs = 28355)
central_highlands_28355
# Crop and mask each resampled raster to the extent of the cenral highlands vector
<- lapply(rasters_resampled, function(raster) {
cropped_masked_rasters # Crop the raster to the extent of Australia
<- crop(raster, central_highlands_28355)
cropped_raster # Mask the raster with the Australia boundary to keep only data within Australia
<- mask(cropped_raster, central_highlands_28355)
masked_raster return(masked_raster)
})
3. Stack and Save Rasters
We can only stack rasters that have the same extent, CRS, and resolution. If stacking fails, it means that at least one of these properties do not match.
# Stack the cropped and masked rasters together
<- do.call(c, cropped_masked_rasters)
raster_stack
# Print summary to verify the stack
print(raster_stack)
class : SpatRaster
size : 1717, 1689, 15 (nrow, ncol, nlyr)
resolution : 79.38445, 79.38445 (x, y)
extent : 318096.7, 452177, 5770536, 5906839 (xmin, xmax, ymin, ymax)
coord. ref. : GDA94 / MGA zone 55 (EPSG:28355)
source(s) : memory
names : AusCl~-2005, AusCl~-2005, AusCl~-2005, AusCl~-2005, AusCl~-2005, AusCl~-2005, ...
min values : 6.019307, 1.245917, -1.754711, 11.52329, 569.4792, 106.5151, ...
max values : 14.811925, 1.674897, 5.044275, 20.74784, 1827.3513, 296.9134, ...
# Plot each layer to visually verify the raster stack
plot(raster_stack)
# Save the raster stack to a GeoTIFF file
writeRaster(raster_stack, filename = "stacked_central_higlands_raster.tif", overwrite = TRUE)
Congratulations, now you have learnt how to:
- Reproject CRS of raster files and shapefiles.
- Check and unify resolution of rasters by resampling.
- Also, you know the resampling methods for both categorical and continuous rasters.
- Crop and mask rasters
- Stack rasters and save stacked rasters.