Use exactextractr as single engine for zonal statistics #334
We should discuss this.
Happy to discuss pros and cons! In these kinds of discussions, from my point of view, it is always a good idea to try to provide reproducible evidence for one's claims (I try to do something similar for the question on the effect of tiling here). Do you have access to the findings you mentioned? What do they show and, more importantly, can we reproduce them, and do they reflect our use case? If we were to rework the package internals to use a single engine, that engine could easily be exchanged for another implementation later, if necessary. My main issues with the current approach are that it seems inefficient (this still needs to be proven), it is confusing to both users and developers, and it produces different numerical results depending on the "engine" used. To put some of the statements you made about maintenance into perspective:
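The point about exchanging the implementation later can be illustrated with a minimal sketch, assuming all indicators route their extraction through one internal helper. The helper name `.zonal_stats()`, its signature and the `backend` argument are hypothetical and not the package's actual code; only `exactextractr::exact_extract()` with its `fun` and `progress` arguments is a real API.

```r
# Hypothetical internal helper (name and signature are illustrative only).
# Every indicator would call this one function, so the extraction engine is
# defined in exactly one place.
.zonal_stats <- function(x, zones, stats, backend = NULL) {
  if (is.null(backend)) {
    # Default backend: exactextractr, only used when no backend is supplied.
    backend <- function(x, zones, stats) {
      exactextractr::exact_extract(x, zones, fun = stats, progress = FALSE)
    }
  }
  backend(x, zones, stats)
}

# Usage with a toy backend that summarises a plain numeric vector; replacing
# the engine later would only mean changing the default above.
toy_backend <- function(x, zones, stats) {
  setNames(lapply(stats, function(s) do.call(s, list(x))), stats)
}
res <- .zonal_stats(c(1, 5, 3), zones = NULL, stats = c("min", "max"),
                    backend = toy_backend)
```

Keeping the engine behind a single call site means a future swap touches one function instead of every indicator.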
To provide some initial evidence (we can always make the examples more complex), here is a reprex where I see consistently faster performance of `exactextractr`:
```r
library(sf)
#> Linking to GEOS 3.12.2, GDAL 3.9.1, PROJ 9.4.1; sf_use_s2() is TRUE
library(terra)
#> terra 1.7.78
library(microbenchmark)
library(exactextractr)
stats <- c("min", "mean", "sum", "max")
nlayer <- 10
r_low_res <- rast(resolution = c(0.25, 0.25))
r_low_res[] <- runif(ncell(r_low_res))
r_low_res <- do.call(c, lapply(1:nlayer, function(x) r_low_res))
names(r_low_res) <- paste0("var_", 1:nlayer)
(r_low_res)
#> class : SpatRaster
#> dimensions : 720, 1440, 10 (nrow, ncol, nlyr)
#> resolution : 0.25, 0.25 (x, y)
#> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
#> source(s) : memory
#> names : var_1, var_2, var_3, var_4, var_5, var_6, ...
#> min values : 4.440080e-07, 4.440080e-07, 4.440080e-07, 4.440080e-07, 4.440080e-07, 4.440080e-07, ...
#> max values : 9.999999e-01, 9.999999e-01, 9.999999e-01, 9.999999e-01, 9.999999e-01, 9.999999e-01, ...
r_high_res <- rast(resolution = c(0.025, 0.025))
r_high_res[] <- runif(ncell(r_high_res))
r_high_res <- do.call(c, lapply(1:nlayer, function(x) r_high_res))
names(r_high_res) <- paste0("var_", 1:nlayer)
(r_high_res)
#> class : SpatRaster
#> dimensions : 7200, 14400, 10 (nrow, ncol, nlyr)
#> resolution : 0.025, 0.025 (x, y)
#> extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (CRS84) (OGC:CRS84)
#> source(s) : memory
#> names : var_1, var_2, var_3, var_4, var_5, var_6, ...
#> min values : 1.164153e-09, 1.164153e-09, 1.164153e-09, 1.164153e-09, 1.164153e-09, 1.164153e-09, ...
#> max values : 1.000000e+00, 1.000000e+00, 1.000000e+00, 1.000000e+00, 1.000000e+00, 1.000000e+00, ...
pts <- st_sample(st_bbox(r_low_res), 100)
pts <- st_as_sf(pts)
pol <- st_buffer(pts, dist = 100000)
# low res
microbenchmark(
  "zonal" = lapply(stats, function(stat) zonal(r_low_res, vect(pol), fun = stat)),
  "extract" = extract(r_low_res, vect(pol), stats),
  "exact" = exact_extract(r_low_res, pol, stats, progress = FALSE),
  times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> zonal 434.20002 436.16781 438.7373 438.3603 441.76197 443.8704 10 a
#> extract 108.40040 109.34919 112.8417 109.6444 110.24886 142.0795 10 b
#> exact 41.61539 41.89287 55.4624 42.0558 47.85883 162.9634 10 c
# high res
microbenchmark(
  "zonal" = lapply(stats, function(stat) zonal(r_high_res, vect(pol), fun = stat)),
  "extract" = extract(r_high_res, vect(pol), stats),
  "exact" = exact_extract(r_high_res, pol, stats, progress = FALSE),
  times = 10
)
#> Unit: milliseconds
#> expr min lq mean median uq max neval cld
#> zonal 6739.114 7457.8483 7418.5185 7517.2377 7580.2243 7608.3937 10 a
#> extract 1660.335 1680.2645 1803.5384 1817.4242 1883.2219 1935.5932 10 b
#> exact 320.236 385.8399 427.8119 439.6597 498.2279 522.5295 10 c
```

Created on 2024-09-03 with reprex v2.1.0
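To make the benchmark easier to read, the median timings reported above can be turned into relative speed-ups. This is plain arithmetic on the numbers from this single run (random data, one machine). Note that the "zonal" timing covers four separate `zonal()` calls (one per statistic), while `extract()` and `exact_extract()` compute all four statistics in a single call, so part of the gap reflects that call pattern.

```r
# Median timings in milliseconds, copied from the microbenchmark output above.
medians <- list(
  low_res  = c(zonal = 438.3603,  extract = 109.6444,  exact = 42.0558),
  high_res = c(zonal = 7517.2377, extract = 1817.4242, exact = 439.6597)
)

# How many times slower each terra approach is than exact_extract() (medians).
speedup <- sapply(medians, function(m) m[c("zonal", "extract")] / m[["exact"]])
round(speedup, 1)
```

In this run the advantage of `exact_extract()` grows with resolution: roughly 10x over the `zonal()` loop and 2.6x over `extract()` at 0.25°, and roughly 17x and 4x at 0.025°.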
@Jo-Schie: I just remembered that our implementation of the GFW-based indicators does not work without the snippet at mapme.biodiversity/R/calc_treecover_area.R, lines 51 to 55 (commit f4ff388).
Currently, we supply and maintain three different engines for the extraction of zonal statistics. This situation is not ideal because they differ in performance, produce different numerical results, and support different sets of summary statistics.
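One reason the engines disagree numerically is how they decide which cells belong to a polygon. As we understand the defaults, `terra::extract()` on polygons uses only cells whose centre falls inside the polygon, whereas `exactextractr`'s `mean` weights every intersected cell by the fraction of it covered by the polygon. The sketch below compares the two rules on one hypothetical polygon; the cell values, coverage fractions and centre flags are made up for illustration.

```r
# Hypothetical cells touched by one polygon (all numbers are illustrative).
values        <- c(1, 2, 5, 9)
coverage      <- c(1, 1, 0.5, 0.25)          # assumed fraction of each cell inside
centre_inside <- c(TRUE, TRUE, TRUE, FALSE)  # assumed: does the cell centre fall inside?

# Cell-centre rule: a cell counts fully or not at all.
centre_mean <- mean(values[centre_inside])

# Coverage-fraction rule: every touched cell counts, weighted by its coverage.
coverage_mean <- weighted.mean(values, w = coverage)
```

Here the centre rule gives 8/3 (about 2.67) and the coverage-weighted rule gives 7.75/2.75 (about 2.82), so the same polygon and raster yield different "means" depending on the engine.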
`exactextractr` uses a very performant algorithm to calculate exact pixel coverage, supports a large set of summary statistics that are easily weighted by area or by custom weight rasters, and is already a suggested dependency. I therefore suggest re-implementing the engine behavior to use `exactextractr` as the sole engine and moving it to Imports. The `engine` argument will be deprecated, with informative warning messages included in the next release, before it is eventually removed. Work on this has already started and is available on the `rework-engine` branch.
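The deprecation path described above could look roughly like this minimal base-R sketch. The helper `.warn_engine_deprecated()`, the stand-in function `summarise_zones()` and the message text are hypothetical; the idea is that public functions keep accepting `engine` for backward compatibility but ignore it and warn when it is supplied.

```r
# Hypothetical helper: warn once per call if a caller still passes `engine`.
.warn_engine_deprecated <- function(engine = NULL) {
  if (!is.null(engine)) {
    warning(
      "The `engine` argument is deprecated and ignored; zonal statistics ",
      "are always computed with exactextractr. It will be removed in a ",
      "future release.",
      call. = FALSE
    )
  }
  invisible(NULL)
}

# Hypothetical stand-in for a public function that still accepts `engine`.
summarise_zones <- function(values, engine = NULL) {
  .warn_engine_deprecated(engine)
  mean(values)
}

# Supplying `engine` triggers the warning; omitting it stays silent.
msg <- tryCatch(
  summarise_zones(1:3, engine = "zonal"),
  warning = function(w) conditionMessage(w)
)
silent <- withCallingHandlers(
  summarise_zones(1:3),
  warning = function(w) stop("unexpected warning")
)
```

Defaulting `engine` to `NULL` instead of checking `missing()` also catches callers that forward the argument explicitly from their own wrappers.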