This study is a reproduction of unpublished research by Lyimo et al (2023), titled: Makuyuni Wildlife Corridor: Analysis of the Effects of Socioecological Interactions and Changing Land Use on Movement Patterns of Large Mammal Species. The original study evaluated the connectivity of Makuyuni Wildlife Corridor, a stretch of unprotected land between Tarangire National Park and Essmingor National Forest Reserve in Tanzania. We reproduce the first part of the study, a multifactor analysis model that approximates a movement cost surface for the corridor. The study was originally conducted using the QGIS Model Builder, and we reproduce the workflow in R. The goal of the study is to reproduce the authors’ map displaying the minimum cost of movement through each point in the study area, compare the reproduced map to the original map, and evaluate why the reproduction differs from the original. Our reproduction is relatively successful, with some minor discrepancies attributed to differences in computational environment.
Key words
: multifactor analysis, least cost pathway,
wildlife corridor, movement cost, Makayuni, Tarangire, Essmingor,
TanzaniaSubject
: Social and Behavioral Sciences: Geography:
Nature and Society RelationsDate created
: 11-30-2023Date modified
: 12-18-2023Spatial Coverage
: Makayuni Wildlife Corridor,
TanzaniaSpatial Resolution
: 10mSpatial Reference System
: EPSG 32736Temporal Coverage
: 2023This study is a reproduction of an original observational study that created a cost surface of wildlife movement in Makayuni Wildlife Corridor. We aim to reproduce the original study’s cost surface map in the R computational environment rather than the QGIS Model Builder environment used by the original authors. The null hypothesis is that there are no differences between the reproduced cost surface map and the original cost surface map. The alternative hypothesis is that there are differences between the reproduced cost surface map and the original cost surface map. We test this hypothesis by subtracting the reproduced map from the original to identify areas of difference.
The original study was conducted using QGIS and Saga. The reproduction study uses R, including the sf package for vector analysis, the stars and raster packages for raster analysis, and the tmap package for cartography.
The reproduction was conducted on a 2019 MacBook Pro, running under macOS Sonoma 14.1.2. We use R version 4.3.2 and the following versions of each package:
tidyverse_2.0.0
here_1.0.1
sf_1.0-14
stars_0.6-4
raster_3.6-26
groundhog_3.1.2
conflicted_1.2.0
We use the groundhog package to maintain a consistent computational environment for the sake of reproducibility.
Secondary data sources for the study are to include land cover, buffer region, merged buildings, initial clip, bounding box, secondary roads, bomas, and major roads.
Each of the next subsections describes one data source.
Note: This file size of this raster is too large to be uploaded to GitHub. It must be downloaded from the original authors, who can share the file via a Google Drive folder.
Abstract
: Raster of classified landcover for the entire
country of Tanzania.Label
: landcover.tifSpatial Coverage
: TanzaniaSpatial Resolution
: 4.78mSpatial Reference System
: EPSG 3857Temporal Coverage
: 2023Lineage
: Song et al
2023Distribution
: Shared by original authorsBands
: 1
Label
: LandcoverDefinition
: Classified landcover type using satellite
imageryType
: integerDomain
: 0-8Values
:
Abstract
: Shapefile of buffer region used to avoid edge
effectsLabel
: buffer_region.shpSpatial Coverage
: 833295.2141958434367552,
9585634.2357019279152155 : 860913.8547548535279930,
9617039.8727052193135023Spatial Reference System
: EPSG 32736Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas.Distribution
: Shared by original authorsAbstract
: Shapefile containing polygon features of
Microsoft and OSM buildings combinedLabel
: merged_buildings.shpSpatial Coverage
: 4007553.8440031828358769,
-414102.9735587749746628 : 4033053.2097658971324563,
-387288.0955167215433903Spatial Reference System
: EPSG 3857Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas.Distribution
: Shared by original authorsAbstract
: Shapefile used to initially clip landcover
map, larger than study area and bounding boxLabel
: initial clip.shpSpatial Coverage
: 830767.7591739012859762,
9581072.9865449033677578 : 862007.9419027733383700,
9619914.2523989714682102Spatial Reference System
: EPSG 32736Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas.Distribution
: Shared by original authorsAbstract
: Shapefile which was used to create a buffer
to prevent edge effectsLabel
: bounding_box.shpSpatial Coverage
: Land surrounding Makuyuni Wildlife
Corridor and the corridor itselfSpatial Extent
: 833295.2141958434367552,
9585634.2357019279152155 : 860913.8547548535279930,
9617039.8727052193135023Spatial Reference System
: EPSG:32736 - WGS 84 / UTM
zone 36STemporal Coverage
: Unknown, presumed 2023Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas.Distribution
: Shared by original authorsAbstract
: Shapefile containing polygon features of
bomas (rural settlements) within the wildlife corridorLabel
: bomas.shpSpatial Coverage
: Within the wildlife corridorSpatial Extent
: 4007605.1797135984525084,
-412580.2386067652842030 : 4032954.7106412472203374,
-387445.8448925596894696Spatial Reference System
: EPSG:3857 - WGS 84 /
Pseudo-MercatorTemporal Coverage
: Unknown, presumed 2023Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas.Distribution
: Shared by original authorsAbstract
: Shapefile containing line features of two
highways that run through the study areaLabel
: major_roads_vector.shpSpatial Coverage
: The entirety of the extent of the
highways, which extend far past the study site and outside of Tanzania
to the north and southSpatial Extent
: 24.9389406999999999,
-33.9686420999999967 : 36.7868238000000005, -1.4968789000000000Spatial Reference System
: EPSG:4326 - WGS 84Temporal Coverage
: Unknown, presumed 2023Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas. Lucas states that this
shapefile is sourced from OpenStreetMap.Distribution
: Shared by original authorsAbstract
: Shapefile containing all tracks and roads
that were within the study area aside from the two major highways; 176
total line features.Label
: secondary_roads.shpSpatial Coverage
: Land surrounding Makuyuni Wildlife
Corridor and the corridor itself.Spatial Extent
: 4002985.2809690786525607,
-417528.5057486270670779 : 4055363.5504161105491221,
-386600.4248493338818662Spatial Reference System
: EPSG:3857 - WGS 84 /
Pseudo-MercatorTemporal Coverage
: Unknown, presumed 2023Lineage
: Sourced from the shared data from the original
authors, presumed to be namely Justin Lucas. Lucas states that this
shapefile was sourced from OpenStreetMap.Distribution
: Shared by original authorsAt the time of the study preregistration, the authors had no prior knowledge of the geography of the study region with regards to the phenomena to be studied. This study is not related to prior studies by the authors. For each data source, the authors had engaged with the data only by inspecting the metadata and observing the layers in QGIS before beginning the reproduction.
There are several possible geographic threats to validity in a wildlife corridor analysis. The primary threat is boundary distortion, also known as edge effects. This issue may be addressed by using a buffer region around the study area, and assigning this buffer a high cost. This method will prevent unrealistic low-cost pathways from forming around the edge of the study area.
A second threat is space-time interactions. Wildlife movement patterns and landscape characteristics can change over time, especially over the course of a year, due to seasonal variations. Corridor analyses that do not take into account temporal considerations may not accurately capture these dynamics. This threat to validity will be difficult to address without additional temporal data that accounts for changing movement behaviors or landscape characteristics.
All necessary data transformations will be performed in the main analysis.
1a. Clip landcover by bounding box.
Inputs: landcover.tif, bounding_box.shp
Outputs: lc_study_clip (intermediate variable)
QGIS workflow step: Clip landcover by study site
Unplanned deviation: We decided not to clip landcover to initial_clip.shp before clipping to bounding_box, as this seemed to be an unnecessary step in the workflow. We also chose to clip landcover before reprojecting, as the original raster was too large to efficiently reproject.
## Reading layer `bounding_box' from data source
## `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/bounding_box.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 833295.2 ymin: 9585634 xmax: 860913.9 ymax: 9617040
## Projected CRS: WGS 84 / UTM zone 36S
1b. Reproject landcover to EPSG 32736, and resample to 10m resolution.
Inputs: lc_study_clip
Outputs: lc_study_clip_reproj (intermediate variable)
QGIS workflow step: Warp (reproject) landcover
We chose a resolution of 10m for all inputs to the cost model, as specified in the original study manuscript. This resolution was not specified in the QGIS model, but instead set as a modifiable parameter.
1c. Reclassify landcover to cost values.
Inputs: lc_study_clip_reproj
Outputs: lc_reclass (intermediate variable), landcover_cost.grd (derived data output)
QGIS workflow step: Reclassify landcover by table
Unplanned deviation: We changed the reclassification table to intervals to account for the decimal values in the landcover raster. The original reclassification table in the QGIS model only reclassifed integers.
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : landcover_cost.grd
## names : wetlands
## values : 0.01, 1 (min, max)
## tmap mode set to interactive viewing
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 1. Land cover cost raster.
2a. Reproject secondary roads to EPSG 32736.
Inputs: secondary_roads.shp
Outputs: secondary_roads_reproj (intermediate variable)
QGIS workflow step: Reproject secondary roads
## Reading layer `secondary_roads' from data source
## `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/secondary_roads.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 176 features and 19 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: 4002985 ymin: -417528.5 xmax: 4055364 ymax: -386600.4
## Projected CRS: WGS 84 / Pseudo-Mercator
2b. Rasterize secondary roads and resample to 10m resolution.
Inputs: secondary_roads_reproj
Outputs: secondary_roads_raster (intermediate variable), secondary_roads_raster.grd (derived data output)
QGIS workflow step: Rasterize secondary roads
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : secondary_roads_cost.grd
## names : layer
## values : 0, 1 (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 2. Secondary roads cost raster.
3a. Reproject major roads to EPSG 32736.
Inputs: major_roads_vector.shp
Outputs: major_roads_reproj (intermediate variable)
QGIS workflow step: Reproject major roads
## Reading layer `major_roads_vector' from data source
## `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/major_roads_vector.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 11 fields
## Geometry type: MULTILINESTRING
## Dimension: XY
## Bounding box: xmin: 24.93894 ymin: -33.96864 xmax: 36.78682 ymax: -1.496879
## Geodetic CRS: WGS 84
3b. Rasterize major roads.
Inputs: major_roads_reproj
Outputs: major_roads_raster (intermediate variable)
QGIS workflow step: Rasterize major roads
Unplanned deviation: We rasterized major roads before buffering, due to easy of functionality within the stars and raster packages. The original QGIS model buffered roads before rasterizing.
3c. Buffer major roads by 300m and resample to 10m resolution.
Inputs: major_roads_raster
Outputs: major_roads_buff (intermediate variable), buffered_roads_cost.grd (derived data output)
QGIS workflow step: Buffer major roads
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : buffered_roads_cost.grd
## names : layer
## values : 0, 1 (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 3. Major roads cost raster.
4a. Reproject rural settlements to EPSG 32736.
Inputs: bomas.shp
Outputs: rural_reproj (intermediate variable)
QGIS workflow step: Reproject rural
## Reading layer `Bomas' from data source
## `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/Bomas.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1525 features and 1 field
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 4007605 ymin: -412580.2 xmax: 4032955 ymax: -387445.8
## Projected CRS: WGS 84 / Pseudo-Mercator
4b. Reproject buildings to EPSG 32736.
Inputs: merged_buildings.shp
Outputs: buildings_reproj (intermediate variable)
QGIS workflow step: Reproject buildings
## Reading layer `merged_buildings' from data source
## `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/merged_buildings.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 9775 features and 12 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 4007554 ymin: -414103 xmax: 4033053 ymax: -387288.1
## Projected CRS: WGS 84 / Pseudo-Mercator
4c. Merge rural settlements and buildings.
Inputs: rural_reproj, buildings_reproj
Outputs: merged (intermediate variable)
QGIS workflow step: Merge vector layers
4d. Rasterize merged buildings and resample to 10m resolution.
Inputs: merged
Outputs: merged_raster (intermediate variable)
QGIS workflow step: Rasterize buildings
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 4. Visualize buildings.
4e. Create building proximity raster.
Inputs: merged_raster
Outputs: proximity (intermediate variable), building_proximity.grd (derived data output)
QGIS workflow step: Proximity raster
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : building_proximity.grd
## names : layer
## values : 0, 6935.142 (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 5. Raster of proximity to buildings.
4f. Reclassify building proximity raster into cost values.
Inputs: proximity
Outputs: prox_reclass (intermediate variable), building_proximity_cost.grd (derived data output)
QGIS workflow step: Reclassify proximity raster by table
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : building_proximity_cost.grd
## names : layer
## values : 0, 1 (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
## Warning: number of legend labels should be 5
Figure 6. Building proximity cost raster.
5. Reproject, rasterize, and resample buffer zone, and assign a high cost value.
Inputs: buffer_region.shp
Outputs: buffer_raster (intermediate variable), buffer_cost.grd (derived data output)
QGIS workflow step: Rasterize buffer zone
## Reading layer `buffer_region' from data source
## `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/buffer_region.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 833295.2 ymin: 9585634 xmax: 860913.9 ymax: 9617040
## Projected CRS: WGS 84 / UTM zone 36S
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : buffer_cost.grd
## names : layer
## values : 0, 999 (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 7. Buffer zone cost.
6. Calculate cost surface.
Inputs: major_roads_buff, secondary_roads_raster, prox_reclass, lc_reclass, buffer_raster
Outputs: cost (intermediate variable), reproduced_cost_surface.grd (results figure)
QGIS workflow step: Raster calculator cost
Cost is calculated using the following formula: (A+(0.25*B)+C+D+E)/4
Where:
A = Major roads
B = Secondary roads
C = Proximity to buildings
D = Landcover
E = Buffer zone
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : reproduced_cost_surface.grd
## names : layer
## values : 0.0025, 250.2625 (min, max)
7. Compare to cost surface produced by the original study.
Inputs: cost, DD_cost_6.tif
Outputs: cost_difference.grd (results figure)
## class : RasterLayer
## dimensions : 3164, 2788, 8821232 (nrow, ncol, ncell)
## resolution : 10, 10 (x, y)
## extent : 833170.1, 861050.1, 9585513, 9617153 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs
## source : cost_difference.grd
## names : layer
## values : -249.8111, 182.5847 (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
Figure 8. Reproduced cost surface.
As expected, the highest costs (other than the buffer region) occur in areas around roads and settlements. Landcover does not appear to play as important of a role in movement cost.
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
## Variable(s) "NA" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.
Figure 9. Differences between reproduced and original cost rasters.
For the most part, the reproduced cost surface is very similar to the original. Most of the error is less than 0.1 cost units, or quite close to zero. The main areas with error of a greater magnitude occur around small, isolated buildings. These buildings were likely too small to rasterize properly at the resolution of analysis.
In this study, we reproduced the multi-criteria movement cost surface from Lyimo et al’s Makayuni Wildlife Corridor analysis. We did this by replicating the author-provided QGIS model workflow as closely as possible in an R script, employing the ‘stars’ and ‘raster’ packages. However, some unplanned deviations from the QGIS model were necessary.
First of all, we decided not to clip the nationwide landcover raster to the “initial_clip” extent before clipping to “bounding_box”, as this seemed to be an unnecessary extra step in the workflow. This modification did mean that we needed to clip the landcover raster before reprojecting, as the original nationwide raster was too large to efficiently reproject. However, these steps should not have resulted in any changes in the final result.
Second, when reclassifying landcover into movement cost values, we modified the reclassification table into intervals to account for decimal values in the landcover raster. We are unsure as to why the landcover raster contains decimal values, but it was critical to incorporate them into the analysis. The original reclassification table in the QGIS model only reclassified integers. This modification resulted in some widespread “noise” in the landcover cost raster that is visible as minor error in the difference map.
Third, we rasterized major roads before buffering, due to easy of functionality within the stars and raster packages. The original QGIS model buffered roads before rasterizing. This modification created a small amount of error, the width of one pixel or so, on the edges of the major road buffers.
Overall, our reproduced cost surface is quite similar to the original, as illustrated in Figure 9. The main areas with error of a significant magnitude occur as proximity “bubbles” around small, isolated buildings, which are likely too small to rasterize properly at the 10m resolution of analysis. Interestingly, all of these buildings rasterized properly for the original authors in the QGIS model, which also used a 10m resolution. This discrepancy can only be explained by a difference in computational environment between QGIS and the ‘stars’ and ‘raster’ packages in R. This error is an example of a partition distortion, one type of geographic threat to validity in this field of research. In this instance of the Modifiable Areal Unit Problem (MAUP), a coarse spatial resolution obscures finer-scale phenomena.
We found the original study to be reasonably understandable and reproducible. Some additional documentation and metadata would have been useful (for example, it was difficult to discern that the resolution of analysis was 10m without reading the manuscript), but overall, the QGIS model made the inputs, outputs, operations, parameters, and sequence quite clear. The primary challenge in this reproduction study was performing the necessary operations in the R computational environment. After struggling to perform all operations within a single package, we found it necessary to convert between the ‘stars’ and ‘raster’ packages, which introduced additional challenges. A more streamlined and reproducible study would maintain all data within a single package format.
We reproduced the movement cost surface from Lyimo et al’s Makayuni Wildlife Corridor analysis with some degree of success, converting the original QGIS model into an R script. The primary source of error was an unresolvable difference in computational environment that resulted in some buildings failing to rasterize properly at the resolution of analysis, creating isolated “bubbles” of error in proximity to these buildings. This discrepancy demonstrates the impacts of partition distortion, in particular the Modifiable Areal Unit Problem, on reproduction analyses in different computational environments. Otherwise, we found the original study to be relatively simple to follow, illustrating the reproducibility of a QGIS model given the proper documentation and metadata.
The authors completed the preregistration for this report before attempting the reproduction study.
This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](DOI:%5B10.17605/OSF.IO/W29MQ){.uri}