Authors

1 College of African Wildlife Management, Mweka. P.O Box 3031, Moshi, Tanzania

2 Tanzania People and Wildlife, P.O Box 11306, Arusha, Tanzania

3 Tanzania Wildlife Research Institute, P.O Box 661, Arusha, Tanzania

4 Middlebury College, Middlebury, Vermont 05753, USA

* Corresponding author and creator

Abstract

This study is a reproduction of unpublished research by Lyimo et al (2023), titled: Makuyuni Wildlife Corridor: Analysis of the Effects of Socioecological Interactions and Changing Land Use on Movement Patterns of Large Mammal Species. The original study evaluated the connectivity of Makuyuni Wildlife Corridor, a stretch of unprotected land between Tarangire National Park and Essmingor National Forest Reserve in Tanzania. We reproduce the first part of the study, a multifactor analysis model that approximates a movement cost surface for the corridor. The study was originally conducted using the QGIS Model Builder, and we reproduce the workflow in R. The goal of the study is to reproduce the authors’ map displaying the minimum cost of movement through each point in the study area, compare the reproduced map to the original map, and evaluate why the reproduction differs from the original. Our reproduction is relatively successful, with some minor discrepancies attributed to differences in computational environment.

Study metadata

Study design

This study is a reproduction of an original observational study that created a cost surface of wildlife movement in Makayuni Wildlife Corridor. We aim to reproduce the original study’s cost surface map in the R computational environment rather than the QGIS Model Builder environment used by the original authors. The null hypothesis is that there are no differences between the reproduced cost surface map and the original cost surface map. The alternative hypothesis is that there are differences between the reproduced cost surface map and the original cost surface map. We test this hypothesis by subtracting the reproduced map from the original to identify areas of difference.

Materials and procedure

Computational environment

The original study was conducted using QGIS and Saga. The reproduction study uses R, including the sf package for vector analysis, the stars and raster packages for raster analysis, and the tmap package for cartography.

The reproduction was conducted on a 2019 MacBook Pro, running under macOS Sonoma 14.1.2. We use R version 4.3.2 and the following versions of each package:

tidyverse_2.0.0
here_1.0.1
sf_1.0-14
stars_0.6-4
raster_3.6-26
groundhog_3.1.2
conflicted_1.2.0

We use the groundhog package to maintain a consistent computational environment for the sake of reproducibility.

Data and variables

Secondary data sources for the study are to include land cover, buffer region, merged buildings, initial clip, bounding box, secondary roads, bomas, and major roads.

Each of the next subsections describes one data source.

Land Cover

Note: This file size of this raster is too large to be uploaded to GitHub. It must be downloaded from the original authors, who can share the file via a Google Drive folder.

  • Abstract: Raster of classified landcover for the entire country of Tanzania.
  • Label: landcover.tif
  • Spatial Coverage: Tanzania
  • Spatial Resolution: 4.78m
  • Spatial Reference System: EPSG 3857
  • Temporal Coverage: 2023
  • Lineage: Song et al 2023
  • Distribution: Shared by original authors
  • Bands: 1
    • Label: Landcover
    • Definition: Classified landcover type using satellite imagery
    • Type: integer
    • Domain: 0-8
    • Values:
      • 1 (orange): cropland
      • 2 (dark green): forest/dense tree
      • 3 (light green): grassland
      • 4 (green): shrubland
      • 5 (blue): water
      • 6 (gray): built-up
      • 7 (yellow): bareland
      • 8 (teal): wetlands

Buffer Region

  • Abstract: Shapefile of buffer region used to avoid edge effects
  • Label: buffer_region.shp
  • Spatial Coverage: 833295.2141958434367552, 9585634.2357019279152155 : 860913.8547548535279930, 9617039.8727052193135023
  • Spatial Reference System: EPSG 32736
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: Shared by original authors

Merged Buildings

  • Abstract: Shapefile containing polygon features of Microsoft and OSM buildings combined
  • Label: merged_buildings.shp
  • Spatial Coverage: 4007553.8440031828358769, -414102.9735587749746628 : 4033053.2097658971324563, -387288.0955167215433903
  • Spatial Reference System: EPSG 3857
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: Shared by original authors

Initial Clip

  • Abstract: Shapefile used to initially clip landcover map, larger than study area and bounding box
  • Label: initial clip.shp
  • Spatial Coverage: 830767.7591739012859762, 9581072.9865449033677578 : 862007.9419027733383700, 9619914.2523989714682102
  • Spatial Reference System: EPSG 32736
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: Shared by original authors

Bounding Box

  • Abstract: Shapefile which was used to create a buffer to prevent edge effects
  • Label: bounding_box.shp
  • Spatial Coverage: Land surrounding Makuyuni Wildlife Corridor and the corridor itself
  • Spatial Extent: 833295.2141958434367552, 9585634.2357019279152155 : 860913.8547548535279930, 9617039.8727052193135023
  • Spatial Reference System: EPSG:32736 - WGS 84 / UTM zone 36S
  • Temporal Coverage: Unknown, presumed 2023
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: Shared by original authors

Bomas

  • Abstract: Shapefile containing polygon features of bomas (rural settlements) within the wildlife corridor
  • Label: bomas.shp
  • Spatial Coverage: Within the wildlife corridor
  • Spatial Extent: 4007605.1797135984525084, -412580.2386067652842030 : 4032954.7106412472203374, -387445.8448925596894696
  • Spatial Reference System: EPSG:3857 - WGS 84 / Pseudo-Mercator
  • Temporal Coverage: Unknown, presumed 2023
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas.
  • Distribution: Shared by original authors

Major Roads

  • Abstract: Shapefile containing line features of two highways that run through the study area
  • Label: major_roads_vector.shp
  • Spatial Coverage: The entirety of the extent of the highways, which extend far past the study site and outside of Tanzania to the north and south
  • Spatial Extent: 24.9389406999999999, -33.9686420999999967 : 36.7868238000000005, -1.4968789000000000
  • Spatial Reference System: EPSG:4326 - WGS 84
  • Temporal Coverage: Unknown, presumed 2023
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas. Lucas states that this shapefile is sourced from OpenStreetMap.
  • Distribution: Shared by original authors

Secondary Roads

  • Abstract: Shapefile containing all tracks and roads that were within the study area aside from the two major highways; 176 total line features.
  • Label: secondary_roads.shp
  • Spatial Coverage: Land surrounding Makuyuni Wildlife Corridor and the corridor itself.
  • Spatial Extent: 4002985.2809690786525607, -417528.5057486270670779 : 4055363.5504161105491221, -386600.4248493338818662
  • Spatial Reference System: EPSG:3857 - WGS 84 / Pseudo-Mercator
  • Temporal Coverage: Unknown, presumed 2023
  • Lineage: Sourced from the shared data from the original authors, presumed to be namely Justin Lucas. Lucas states that this shapefile was sourced from OpenStreetMap.
  • Distribution: Shared by original authors

Prior observations

At the time of the study preregistration, the authors had no prior knowledge of the geography of the study region with regards to the phenomena to be studied. This study is not related to prior studies by the authors. For each data source, the authors had engaged with the data only by inspecting the metadata and observing the layers in QGIS before beginning the reproduction.

Bias and threats to validity

There are several possible geographic threats to validity in a wildlife corridor analysis. The primary threat is boundary distortion, also known as edge effects. This issue may be addressed by using a buffer region around the study area, and assigning this buffer a high cost. This method will prevent unrealistic low-cost pathways from forming around the edge of the study area.

A second threat is space-time interactions. Wildlife movement patterns and landscape characteristics can change over time, especially over the course of a year, due to seasonal variations. Corridor analyses that do not take into account temporal considerations may not accurately capture these dynamics. This threat to validity will be difficult to address without additional temporal data that accounts for changing movement behaviors or landscape characteristics.

Data transformations

All necessary data transformations will be performed in the main analysis.

Analysis

1a. Clip landcover by bounding box.

Inputs: landcover.tif, bounding_box.shp

Outputs: lc_study_clip (intermediate variable)

QGIS workflow step: Clip landcover by study site

Unplanned deviation: We decided not to clip landcover to initial_clip.shp before clipping to bounding_box, as this seemed to be an unnecessary step in the workflow. We also chose to clip landcover before reprojecting, as the original raster was too large to efficiently reproject.

## Reading layer `bounding_box' from data source 
##   `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/bounding_box.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 833295.2 ymin: 9585634 xmax: 860913.9 ymax: 9617040
## Projected CRS: WGS 84 / UTM zone 36S

1b. Reproject landcover to EPSG 32736, and resample to 10m resolution.

Inputs: lc_study_clip

Outputs: lc_study_clip_reproj (intermediate variable)

QGIS workflow step: Warp (reproject) landcover

We chose a resolution of 10m for all inputs to the cost model, as specified in the original study manuscript. This resolution was not specified in the QGIS model, but instead set as a modifiable parameter.

1c. Reclassify landcover to cost values.

Inputs: lc_study_clip_reproj

Outputs: lc_reclass (intermediate variable), landcover_cost.grd (derived data output)

QGIS workflow step: Reclassify landcover by table

Unplanned deviation: We changed the reclassification table to intervals to account for the decimal values in the landcover raster. The original reclassification table in the QGIS model only reclassifed integers.

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : landcover_cost.grd 
## names      : wetlands 
## values     : 0.01, 1  (min, max)
## tmap mode set to interactive viewing
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 1. Land cover cost raster.

2a. Reproject secondary roads to EPSG 32736.

Inputs: secondary_roads.shp

Outputs: secondary_roads_reproj (intermediate variable)

QGIS workflow step: Reproject secondary roads

## Reading layer `secondary_roads' from data source 
##   `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/secondary_roads.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 176 features and 19 fields
## Geometry type: LINESTRING
## Dimension:     XY
## Bounding box:  xmin: 4002985 ymin: -417528.5 xmax: 4055364 ymax: -386600.4
## Projected CRS: WGS 84 / Pseudo-Mercator

2b. Rasterize secondary roads and resample to 10m resolution.

Inputs: secondary_roads_reproj

Outputs: secondary_roads_raster (intermediate variable), secondary_roads_raster.grd (derived data output)

QGIS workflow step: Rasterize secondary roads

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : secondary_roads_cost.grd 
## names      : layer 
## values     : 0, 1  (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 2. Secondary roads cost raster.

3a. Reproject major roads to EPSG 32736.

Inputs: major_roads_vector.shp

Outputs: major_roads_reproj (intermediate variable)

QGIS workflow step: Reproject major roads

## Reading layer `major_roads_vector' from data source 
##   `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/major_roads_vector.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 4 features and 11 fields
## Geometry type: MULTILINESTRING
## Dimension:     XY
## Bounding box:  xmin: 24.93894 ymin: -33.96864 xmax: 36.78682 ymax: -1.496879
## Geodetic CRS:  WGS 84

3b. Rasterize major roads.

Inputs: major_roads_reproj

Outputs: major_roads_raster (intermediate variable)

QGIS workflow step: Rasterize major roads

Unplanned deviation: We rasterized major roads before buffering, due to easy of functionality within the stars and raster packages. The original QGIS model buffered roads before rasterizing.

3c. Buffer major roads by 300m and resample to 10m resolution.

Inputs: major_roads_raster

Outputs: major_roads_buff (intermediate variable), buffered_roads_cost.grd (derived data output)

QGIS workflow step: Buffer major roads

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : buffered_roads_cost.grd 
## names      : layer 
## values     : 0, 1  (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 3. Major roads cost raster.

4a. Reproject rural settlements to EPSG 32736.

Inputs: bomas.shp

Outputs: rural_reproj (intermediate variable)

QGIS workflow step: Reproject rural

## Reading layer `Bomas' from data source 
##   `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/Bomas.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1525 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 4007605 ymin: -412580.2 xmax: 4032955 ymax: -387445.8
## Projected CRS: WGS 84 / Pseudo-Mercator

4b. Reproject buildings to EPSG 32736.

Inputs: merged_buildings.shp

Outputs: buildings_reproj (intermediate variable)

QGIS workflow step: Reproject buildings

## Reading layer `merged_buildings' from data source 
##   `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/merged_buildings.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 9775 features and 12 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 4007554 ymin: -414103 xmax: 4033053 ymax: -387288.1
## Projected CRS: WGS 84 / Pseudo-Mercator

4c. Merge rural settlements and buildings.

Inputs: rural_reproj, buildings_reproj

Outputs: merged (intermediate variable)

QGIS workflow step: Merge vector layers

4d. Rasterize merged buildings and resample to 10m resolution.

Inputs: merged

Outputs: merged_raster (intermediate variable)

QGIS workflow step: Rasterize buildings

## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 4. Visualize buildings.

4e. Create building proximity raster.

Inputs: merged_raster

Outputs: proximity (intermediate variable), building_proximity.grd (derived data output)

QGIS workflow step: Proximity raster

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : building_proximity.grd 
## names      : layer 
## values     : 0, 6935.142  (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 5. Raster of proximity to buildings.

4f. Reclassify building proximity raster into cost values.

Inputs: proximity

Outputs: prox_reclass (intermediate variable), building_proximity_cost.grd (derived data output)

QGIS workflow step: Reclassify proximity raster by table

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : building_proximity_cost.grd 
## names      : layer 
## values     : 0, 1  (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
## Warning: number of legend labels should be 5

Figure 6. Building proximity cost raster.

5. Reproject, rasterize, and resample buffer zone, and assign a high cost value.

Inputs: buffer_region.shp

Outputs: buffer_raster (intermediate variable), buffer_cost.grd (derived data output)

QGIS workflow step: Rasterize buffer zone

## Reading layer `buffer_region' from data source 
##   `/Users/alanalutz/Documents/GitHub/Makayuni-Corridor/data/raw/public/buffer_region.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 833295.2 ymin: 9585634 xmax: 860913.9 ymax: 9617040
## Projected CRS: WGS 84 / UTM zone 36S
## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : buffer_cost.grd 
## names      : layer 
## values     : 0, 999  (min, max)
## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 7. Buffer zone cost.

6. Calculate cost surface.

Inputs: major_roads_buff, secondary_roads_raster, prox_reclass, lc_reclass, buffer_raster

Outputs: cost (intermediate variable), reproduced_cost_surface.grd (results figure)

QGIS workflow step: Raster calculator cost

Cost is calculated using the following formula: (A+(0.25*B)+C+D+E)/4

Where:

A = Major roads

B = Secondary roads

C = Proximity to buildings

D = Landcover

E = Buffer zone

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : reproduced_cost_surface.grd 
## names      : layer 
## values     : 0.0025, 250.2625  (min, max)

7. Compare to cost surface produced by the original study.

Inputs: cost, DD_cost_6.tif

Outputs: cost_difference.grd (results figure)

## class      : RasterLayer 
## dimensions : 3164, 2788, 8821232  (nrow, ncol, ncell)
## resolution : 10, 10  (x, y)
## extent     : 833170.1, 861050.1, 9585513, 9617153  (xmin, xmax, ymin, ymax)
## crs        : +proj=utm +zone=36 +south +datum=WGS84 +units=m +no_defs 
## source     : cost_difference.grd 
## names      : layer 
## values     : -249.8111, 182.5847  (min, max)

Results

## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)

Figure 8. Reproduced cost surface.

As expected, the highest costs (other than the buffer region) occur in areas around roads and settlements. Landcover does not appear to play as important of a role in movement cost.

## stars object downsampled to 939 by 1065 cells. See tm_shape manual (argument raster.downsample)
## Variable(s) "NA" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

Figure 9. Differences between reproduced and original cost rasters.

For the most part, the reproduced cost surface is very similar to the original. Most of the error is less than 0.1 cost units, or quite close to zero. The main areas with error of a greater magnitude occur around small, isolated buildings. These buildings were likely too small to rasterize properly at the resolution of analysis.

Discussion

In this study, we reproduced the multi-criteria movement cost surface from Lyimo et al’s Makayuni Wildlife Corridor analysis. We did this by replicating the author-provided QGIS model workflow as closely as possible in an R script, employing the ‘stars’ and ‘raster’ packages. However, some unplanned deviations from the QGIS model were necessary.

First of all, we decided not to clip the nationwide landcover raster to the “initial_clip” extent before clipping to “bounding_box”, as this seemed to be an unnecessary extra step in the workflow. This modification did mean that we needed to clip the landcover raster before reprojecting, as the original nationwide raster was too large to efficiently reproject. However, these steps should not have resulted in any changes in the final result.

Second, when reclassifying landcover into movement cost values, we modified the reclassification table into intervals to account for decimal values in the landcover raster. We are unsure as to why the landcover raster contains decimal values, but it was critical to incorporate them into the analysis. The original reclassification table in the QGIS model only reclassified integers. This modification resulted in some widespread “noise” in the landcover cost raster that is visible as minor error in the difference map.

Third, we rasterized major roads before buffering, due to easy of functionality within the stars and raster packages. The original QGIS model buffered roads before rasterizing. This modification created a small amount of error, the width of one pixel or so, on the edges of the major road buffers.

Overall, our reproduced cost surface is quite similar to the original, as illustrated in Figure 9. The main areas with error of a significant magnitude occur as proximity “bubbles” around small, isolated buildings, which are likely too small to rasterize properly at the 10m resolution of analysis. Interestingly, all of these buildings rasterized properly for the original authors in the QGIS model, which also used a 10m resolution. This discrepancy can only be explained by a difference in computational environment between QGIS and the ‘stars’ and ‘raster’ packages in R. This error is an example of a partition distortion, one type of geographic threat to validity in this field of research. In this instance of the Modifiable Areal Unit Problem (MAUP), a coarse spatial resolution obscures finer-scale phenomena.

We found the original study to be reasonably understandable and reproducible. Some additional documentation and metadata would have been useful (for example, it was difficult to discern that the resolution of analysis was 10m without reading the manuscript), but overall, the QGIS model made the inputs, outputs, operations, parameters, and sequence quite clear. The primary challenge in this reproduction study was performing the necessary operations in the R computational environment. After struggling to perform all operations within a single package, we found it necessary to convert between the ‘stars’ and ‘raster’ packages, which introduced additional challenges. A more streamlined and reproducible study would maintain all data within a single package format.

Conclusion

We reproduced the movement cost surface from Lyimo et al’s Makayuni Wildlife Corridor analysis with some degree of success, converting the original QGIS model into an R script. The primary source of error was an unresolvable difference in computational environment that resulted in some buildings failing to rasterize properly at the resolution of analysis, creating isolated “bubbles” of error in proximity to these buildings. This discrepancy demonstrates the impacts of partition distortion, in particular the Modifiable Areal Unit Problem, on reproduction analyses in different computational environments. Otherwise, we found the original study to be relatively simple to follow, illustrating the reproducibility of a QGIS model given the proper documentation and metadata.

Integrity Statement

The authors completed the preregistration for this report before attempting the reproduction study.

Acknowledgements

This report is based upon the template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences, DOI:[10.17605/OSF.IO/W29MQ](DOI:%5B10.17605/OSF.IO/W29MQ){.uri}