
envfetch: Fetch environmental data over spatio-temporal geometries
envfetch extracts environmental data for spatio-temporal inputs from local
raster datasets or Google Earth Engine.
The time or time range for the extraction can vary between geometries.
The function includes features for caching, memory management, and data
summarisation. When extracting from multiple data sources, ensure any custom
parameters for r, bands, temporal_fun or spatial_fun are specified
appropriately.
Usage
envfetch(
  x,
  r = NULL,
  bands = NULL,
  temporal_fun = "mean",
  spatial_fun = "mean",
  scale = NULL,
  max_feature_collection_size = 5000,
  use_cache = TRUE,
  out_dir = file.path("./output/"),
  out_filename = NA,
  overwrite = TRUE,
  cache_dir = file.path(out_dir, "cache/"),
  cache_files = NA,
  time_column_name = NULL,
  .time_rep = NA,
  init_gee = TRUE,
  ...
)
Arguments
- x
An sf collection with a geometry column and a time column (a Date, datetime or a range of datetimes as a lubridate::interval).
- r
Specifies the data source: either a local raster file path (which can include subdatasets) or a Google Earth Engine collection name. For multiple sources, provide a list and also specify the bands and temporal_fun, and optionally time_column_name, parameters accordingly.
- bands
Numeric or character vector specifying band numbers or names to extract. Use NULL to extract all bands. For multiple sources, provide a list of vectors.
- temporal_fun
Function or string used to summarize data for each time interval. Is ignored if time is a date or datetime. Default is mean(x, na.rm=TRUE). The string 'last' returns the value before the start of the time interval, 'next' returns the value after the start of the time interval and 'closest' finds the closest value to the start of the time interval. For multiple sources, provide a list of functions or strings.
- spatial_fun
Function or string used to summarize data spatially (if x is a polygon). Default ('mean') for local files is mean(x, na.rm=TRUE) and for Google Earth Engine is rgee::ee$Reducer$mean(). For local files, use NULL to not summarise spatially before summarising temporally. If you are extracting from Google Earth Engine, you must specify a Google Earth Engine reducer, that is, an rgee::ee$Reducer function (e.g. rgee::ee$Reducer$sum()). See https://r-spatial.github.io/rgee/reference/ee_extract.html. For different behaviour with multiple sources, provide a list of functions or strings.
- scale
Numeric vector specifying scales to aggregate rasters to before extraction. Use NULL for no aggregation. For multiple sources, provide a list of vectors.
- max_feature_collection_size
An integer representing the maximum number of features (rows) to include in each chunk when splitting the dataset for efficient memory use on Google Earth Engine's end. Default is 5000.
- use_cache
Logical flag indicating whether to use caching. Default is TRUE.
- out_dir
Output directory for files. Default is ./output/.
- out_filename
Name for the output file, defaulting to a timestamped .gpkg file.
- overwrite
Logical flag to overwrite existing output files. Default is TRUE.
- cache_dir
Directory for caching files. Default is ./output/cache/.
- cache_files
Paths to cached files. Specify these if the cache system hasn't automatically detected your cache. Is ignored if use_cache = FALSE.
- time_column_name
Name of the time column in x. Use NULL to auto-select a time column of type lubridate::interval. Default is NULL.
- .time_rep
Specifies repeating time intervals for extraction. Default is NA.
- init_gee
A logical indicating whether to initialise Google Earth Engine within the function. Default is TRUE.
- ...
Arguments passed on to extract_over_time, fetch, extract_over_space, extract_gee
subds
positive integer or character to select a sub-dataset to extract from. If zero or "", all sub-datasets are extracted.
spatial_extraction_fun
A function used to extract points spatially for each time slice of the raster. Default is the default implementation of extract_over_space (extracts the mean of geometries within rasters, removing NAs).
time_buffer
Time buffer used to adjust the time interval for data extraction. The function always uses the time before and after the interval to prevent errors when summarising the earliest and latest times. Default is 0 days.
debug
If TRUE, pauses the function and displays a plot for each extracted point. This is useful for debugging unexpected extracted values. Default is FALSE.
override_terraOptions
If TRUE, overrides terra's default terraOptions with those specified in the envfetch package. Default is TRUE.
is_vectorised_summarisation_function
Whether the summarisation function is vectorised row-wise (like rowSums or rowMeans). Only needs to be set to TRUE if a row-wise vectorised summarisation function has not been detected automatically (that is, it does not use rowSums or rowMeans).
verbose
Whether to print messages to the console. Defaults to TRUE.
trim_raster
Whether to trim the raster to time bounds as a performance optimisation. Defaults to TRUE.
subset_raster_indices
Whether to subset raster by time indices as a performance optimisation. Defaults to TRUE.
batch_size
The maximum number of rows or geometries to extract and summarise at a time. Each batch will be cached to continue extraction in case of interruptions. Larger batch sizes may result in overuse of rgee on the server-side and hangs. Set batch_size to 1, NA or <1 for no batching. Use funs_to_use_batch_size to define what functions batch_size will be used with.
funs_to_use_batch_size
A vector with the names of functions you want to use batch_size for. Batch size is useful for some functions (rgee: 'extract_gee') but not others (local: 'extract_over_time'). Defaults to c('extract_gee').
do_initial_sort
Whether to initially sort the unique input data x by space (if use_space_in_initial_sort is TRUE) and time for efficiency during later extraction processes. Defaults to TRUE.
use_space_in_initial_sort
Whether to initially sort the unique input data x by space in addition to time for efficiency during later extraction processes. Defaults to FALSE.
chunk
Logical. If TRUE, the raster will be split into chunks based on available RAM and processed chunk by chunk. If FALSE, the raster will be processed as a whole. Default is TRUE.
na.rm
Whether to remove NA values when summarising with the spatial_fun function.
extraction_fun
The extraction function to use. Default is terra::extract.
max_ram_frac_per_chunk
The maximum fraction of available memory to use for each extraction chunk.
collection_name
A character string representing the Google Earth Engine image collection from which to extract data.
lazy
A logical indicating whether to download Google Earth Engine data lazily with future::sequential objects to evaluate the task in the future. Defaults to FALSE.
initialise_gee
A logical indicating whether to initialise Google Earth Engine within the function. Default is TRUE.
use_gcs
A logical indicating whether to use Google Cloud Storage for larger requests. Default is FALSE.
use_drive
A logical indicating whether to use Google Drive for larger requests. Default is FALSE.
max_chunk_time_day_range
A string representing the maximum number of time units to include in each time chunk when splitting the dataset for efficient memory use on Google Earth Engine's end. Default is '3 months'.
ee_reducer_fun
A Google Earth Engine reducer function representing the function used to aggregate the data extracted from each image. Default is rgee::ee$Reducer$mean().
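The 'last', 'next' and 'closest' options of temporal_fun can be illustrated with a small base R sketch. This is not envfetch's internal code: pick_value, slice_times and slice_values are hypothetical names, and reading "before" and "after" as strictly before and strictly after the interval start is an assumption.

```r
# Hypothetical raster time slices and the value extracted at one point per slice
slice_times <- as.Date(c("2020-01-01", "2020-01-17", "2020-02-02", "2020-02-18"))
slice_values <- c(0.21, 0.35, 0.42, 0.30)

# Start of the time interval attached to the geometry
interval_start <- as.Date("2020-01-20")

# Pick one value relative to the interval start (illustrative helper only)
pick_value <- function(times, values, start, how = c("last", "next", "closest")) {
  how <- match.arg(how)
  offset <- as.numeric(times - start)  # days relative to the interval start
  if (how == "last") {
    candidates <- which(offset < 0)    # slices strictly before the start (assumption)
  } else if (how == "next") {
    candidates <- which(offset > 0)    # slices strictly after the start (assumption)
  } else {
    candidates <- seq_along(times)     # any slice may be the closest one
  }
  if (length(candidates) == 0) {
    return(NA_real_)                   # nothing available on that side of the start
  }
  # Among the candidates, take the slice nearest in time to the start
  values[candidates[which.min(abs(offset[candidates]))]]
}

last_value <- pick_value(slice_times, slice_values, interval_start, "last")
next_value <- pick_value(slice_times, slice_values, interval_start, "next")
closest_value <- pick_value(slice_times, slice_values, interval_start, "closest")
```

With these inputs, 'last' and 'closest' both select the 2020-01-17 slice, while 'next' selects the 2020-02-02 slice.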
Value
An enhanced version of the input sf collection, x, augmented with the
extracted environmental data.
Details
envfetch serves as a high-level wrapper for specific data extraction
methods:
- For local raster files, it employs extract_over_time with datetime ranges and stars::st_extract with single datetimes.
- For Google Earth Engine collections, it uses extract_gee.
It also supports caching, allowing you to avoid repeated calculations and resume work after interruptions.
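To make the two local cases concrete (datetime ranges versus single datetimes), the sketch below builds a minimal input x for each. It assumes the sf and lubridate packages are installed; the coordinates, dates and the column name time are hypothetical, and the envfetch call is left commented out because the raster path is a placeholder.

```r
library(sf)
library(lubridate)

# Hypothetical point locations (longitude/latitude, WGS84)
pts <- st_as_sf(
  data.frame(id = 1:2, lon = c(115.8, 116.1), lat = c(-31.9, -32.2)),
  coords = c("lon", "lat"),
  crs = 4326
)

# A different time range per geometry, stored as a lubridate interval column
pts_range <- pts
pts_range$time <- interval(
  ymd(c("2020-01-01", "2020-03-01")),
  ymd(c("2020-01-31", "2020-03-15"))
)

# A single date per geometry (temporal_fun is documented as ignored in this case)
pts_single <- pts
pts_single$time <- as.Date(c("2020-01-15", "2020-03-08"))

# Placeholder call, not run here:
# extracted <- envfetch(x = pts_range, r = "/path/to/local/raster/file.tif")
```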
See also
Other relevant functions, used internally by envfetch: fetch, extract_gee, extract_over_time
Examples
if (FALSE) { # \dontrun{
# my_data is a placeholder for an sf collection with a time column

# local raster file path example
extracted_data <- envfetch(x = my_data, r = "/path/to/local/raster/file.tif")

# loaded raster object example
library(terra)
r <- rast("/path/to/local/raster/file.tif")
extracted_data <- envfetch(x = my_data, r = r)

# Google Earth Engine example
extracted_gee_data <- envfetch(
  x = my_data,
  r = "GEE_COLLECTION_NAME",
  bands = c('BAND_NAME_1', 'BAND_NAME_2'),
  time_column_name = "time"
)

# multiple data sources example (both local raster and Google Earth Engine)
extracted_multi_data <- envfetch(
  x = my_data,
  r = list(
    "/path/to/local/raster/file1.tif",
    "GEE_COLLECTION_NAME1",
    "/path/to/local/raster/file2.tif"
  ),
  bands = list(c(1, 2), c('BAND_NAME_1', 'BAND_NAME_2'), c(3, 4)),
  temporal_fun = list(mean, 'last', median),
  time_column_name = "time"
)
} # }
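In the multiple data sources example, r, bands and temporal_fun appear to be matched by position. Assuming that reading is right, a small pre-flight check such as the one below can catch mismatched list lengths before a long extraction starts. check_source_args is a hypothetical helper, not part of envfetch, and treating a non-list argument as applying to every source is also an assumption.

```r
# Hypothetical helper: verify that per-source list arguments line up with r
check_source_args <- function(r, ...) {
  per_source <- list(...)
  for (name in names(per_source)) {
    arg <- per_source[[name]]
    # Only list arguments are treated as per-source; a single value is
    # assumed to apply to every source
    if (is.list(arg) && length(arg) != length(r)) {
      stop(sprintf("'%s' has %d entries but 'r' has %d sources",
                   name, length(arg), length(r)))
    }
  }
  invisible(TRUE)
}

sources <- list(
  "/path/to/local/raster/file1.tif",
  "GEE_COLLECTION_NAME1",
  "/path/to/local/raster/file2.tif"
)
bands <- list(c(1, 2), c("BAND_NAME_1", "BAND_NAME_2"), c(3, 4))
temporal_fun <- list(mean, "last", median)

# Passes silently because every list has one entry per source
args_ok <- check_source_args(sources, bands = bands, temporal_fun = temporal_fun)
```

Passing a bands list with only two entries for these three sources would stop with an error naming the offending argument.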