
Fetch data from each row using anonymous functions
This function passes your data through your supplied extraction functions and caches progress so that, if a function crashes partway through, you can continue where you left off. It also shows progress and the estimated time to completion, and lets you repeat sampling across different times.
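The resume-after-failure idea described above can be pictured with a minimal base R sketch. It is not envfetch's implementation: `run_with_cache`, `double_value` and the `batch_<id>.rds` file naming are hypothetical names chosen for illustration, and the "extraction" is a trivial stand-in.

```r
# Hypothetical helper: process `x` in batches and cache each finished batch
# as an .rds file, so that a re-run skips work that already completed.
run_with_cache <- function(x, fun, batch_size, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  # Assign each row index to a batch: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_len(nrow(x)) / batch_size)
  batches <- split(seq_len(nrow(x)), batch_id)
  results <- lapply(names(batches), function(id) {
    cache_file <- file.path(cache_dir, paste0("batch_", id, ".rds"))
    if (file.exists(cache_file)) {
      return(readRDS(cache_file))  # reuse cached progress
    }
    out <- fun(x[batches[[id]], , drop = FALSE])
    saveRDS(out, cache_file)       # record progress before moving on
    out
  })
  do.call(rbind, results)
}

# Stand-in for an extraction function; counts how often it is really called.
d <- data.frame(id = 1:10, value = (1:10)^2)
n_calls <- 0
double_value <- function(batch) {
  n_calls <<- n_calls + 1
  transform(batch, value = value * 2)
}

cache_dir <- file.path(tempdir(), "fetch_cache_sketch")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache
first <- run_with_cache(d, double_value, batch_size = 4, cache_dir = cache_dir)
second <- run_with_cache(d, double_value, batch_size = 4, cache_dir = cache_dir)
```

In this sketch the second call reads every batch back from disk, so `double_value` runs only during the first call.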
Usage
fetch(
  x,
  ...,
  use_cache = TRUE,
  out_dir = file.path("./output/"),
  out_filename = NA,
  overwrite = TRUE,
  cache_dir = file.path(out_dir, "cache/"),
  cache_files = NA,
  time_column_name = NULL,
  .time_rep = NA,
  batch_size = 20000,
  funs_to_use_batch_size = c("extract_gee"),
  do_initial_sort = TRUE,
  use_space_in_initial_sort = TRUE
)
Arguments
- x
A tibble with an sf "geometry" column and a column with time (a lubridate interval or date), detected automatically or specified by the time_column_name parameter.
- ...
Arguments passed on to extract_over_time, extract_gee
r
A file path to a raster file or a SpatRaster object from the terra package. This is the raster data source from which the data will be extracted.
subds
Positive integer or character used to select a sub-dataset to extract from. If zero or "", all sub-datasets are extracted.
temporal_fun
A function used to summarise multiple data points found within a time interval. Default is rowMeans(x, na.rm=TRUE). The user can supply vectorised summarisation functions (using rowMeans or rowSums) or non-vectorised summarisation functions (e.g., sum, mean, min, max). If supplying a custom vectorised temporal_fun, set is_vectorised_temporal_fun to TRUE to ensure the vectorised approach is used for performance. Note, vectorised summarisation functions are not possible when fun=NULL and you are extracting with polygon or line geometries (i.e. temporal_fun is used to summarise, treating each time and space value independently).
spatial_extraction_fun
A function used to extract points spatially for each time slice of the raster. Default is the default implementation of extract_over_space (extracts the mean of geometries within rasters, removing NAs).
scale
The scale to aggregate your raster to (in units of the original raster). Note this will be rounded to fit the nearest aggregation factor (number of cells in each direction). Leave as NULL (the default) if you do not want any aggregation. See aggregate.
time_buffer
Time buffer used to adjust the time interval for data extraction. The function always uses the time before and after the interval to prevent errors when summarising the earliest and latest times. Default is 0 days.
debug
If TRUE, pauses the function and displays a plot for each extracted point. This is useful for debugging unexpected extracted values. Default is FALSE.
override_terraOptions
If TRUE, overrides terra's default terraOptions with those specified by the envfetch package. Default is TRUE.
is_vectorised_summarisation_function
Whether the summarisation is vectorised (like rowSums or rowMeans). Only needs to be set to TRUE if a row-wise vectorised summarisation function has not been detected automatically (i.e. it does not use rowSums or rowMeans).
trim_raster
Whether to trim the raster to time bounds as a performance optimisation. Defaults to TRUE.
subset_raster_indices
Whether to subset raster by time indices as a performance optimisation. Defaults to TRUE.
collection_name
A character string representing the Google Earth Engine image collection from which to extract data.
bands
A vector of character strings representing the band names to extract from the image collection.
lazy
A logical indicating whether to download Google Earth Engine data lazily with future::sequential objects to evaluate the task in the future. Defaults to FALSE.
initialise_gee
A logical indicating whether to initialise Google Earth Engine within the function. Default is TRUE.
use_gcs
A logical indicating whether to use Google Cloud Storage for larger requests. Default is FALSE.
use_drive
A logical indicating whether to use Google Drive for larger requests. Default is FALSE.
max_chunk_time_day_range
A string representing the maximum number of time units to include in each time chunk when splitting the dataset for efficient memory use on Google Earth Engine's end. Default is '3 months'.
max_feature_collection_size
An integer representing the maximum number of features (rows) to include in each chunk when splitting the dataset for efficient memory use on Google Earth Engine's end. Default is 5000.
ee_reducer_fun
A Google Earth Engine reducer function representing the function used to aggregate the data extracted from each image. Default is rgee::ee$Reducer$mean().
- use_cache
Whether to cache your progress. Allows you to continue where you left off if an error occurs or the process is interrupted. Also avoids recomputing extractions between R sessions.
- out_dir
A directory to output your result. Is ignored if out_filename = NA.
- out_filename
The path to output the result. Set to NA (the default) to not save the result and only return the result.
- overwrite
Whether to overwrite the output file if it already exists.
- cache_dir
A directory to output cached progress. Is ignored if use_cache = FALSE.
- cache_files
Paths to cached files. Specify these if the cache system hasn't automatically detected your cache. Is ignored if use_cache = FALSE.
- time_column_name
Name of the time column in the dataset. If NULL (the default), a column of type lubridate::interval is automatically selected.
- .time_rep
A time_rep object. Used to repeat data extraction along repeating time intervals before and after the original datetime. This can be relative to the start or the end of the input time interval (specified by the relative_to_start argument of time_rep). Defaults to the start.
- batch_size
The maximum number of rows or geometries to extract and summarise at a time. Each batch will be cached to continue extraction in case of interruptions. Larger batch sizes may result in overuse of rgee on the server-side and hangs. Set batch_size to 1, NA or <1 for no batching. Use funs_to_use_batch_size to define what functions batch_size will be used with.
- funs_to_use_batch_size
A vector with the names of functions you want to use batch_size for. Batch size is useful for some functions (rgee: 'extract_gee') but not others (local: 'extract_over_time'). Defaults to c('extract_gee').
- do_initial_sort
Whether to initially sort the unique input data x by space (if use_space_in_initial_sort is TRUE) and time for efficiency during later extraction processes. Defaults to TRUE.
- use_space_in_initial_sort
Whether to initially sort the unique input data x by space in addition to time for efficiency during later extraction processes. Defaults to TRUE.
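The distinction between vectorised and non-vectorised summarisation functions drawn under temporal_fun can be illustrated with base R alone. The sketch below assumes, purely for illustration, that extracted values are arranged as a matrix with one row per point and one column per time slice; the object names are hypothetical and this layout is not taken from envfetch's internals.

```r
# Hypothetical extracted values: one row per point, one column per time slice.
values <- matrix(
  c(0.2, 0.4, NA,
    0.5, 0.7, 0.9),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("point_1", "point_2"), c("t1", "t2", "t3"))
)

# Vectorised: a single call summarises every row at once.
vectorised <- rowMeans(values, na.rm = TRUE)

# Non-vectorised: a scalar summary function is called once per row.
non_vectorised <- apply(values, 1, mean, na.rm = TRUE)
non_vectorised_max <- apply(values, 1, max, na.rm = TRUE)
```

Both mean-based approaches give the same per-point result here; the vectorised form simply avoids one function call per row, which matters when there are many rows.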
Examples
if (FALSE) { # \dontrun{
  extracted <- d %>%
    fetch(
      ~extract_over_time(.x, r = '/path/to/netcdf.nc'),
      ~extract_gee(
        .x,
        collection_name = 'MODIS/061/MOD13Q1',
        bands = c('NDVI', 'DetailedQA'),
        time_buffer = 16
      )
    )

  # extract and summarise data every fortnight for the last six months
  # relative to the start of the time column in `d`
  rep_extracted <- d %>%
    fetch(
      ~extract_over_time(.x, r = '/path/to/netcdf.nc'),
      ~extract_gee(
        .x,
        collection_name = 'MODIS/061/MOD13Q1',
        bands = c('NDVI', 'DetailedQA'),
        time_buffer = 16
      ),
      .time_rep = time_rep(interval = lubridate::days(14), n_start = -12)
    )
} # }
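The arithmetic behind "every fortnight for the last six months" in the second example can be sketched in base R. This assumes that n_start = -12 with a 14-day interval means twelve consecutive 14-day windows ending at the original start date; that reading, the example date, and all object names are assumptions for illustration, not a description of how time_rep is implemented.

```r
# Assumption: twelve 14-day windows that end at the original start date.
original_start <- as.Date("2023-07-01")  # hypothetical start of a time interval
interval_days <- 14
n_start <- -12

# Window start dates, from 12 intervals back up to 1 interval back.
window_starts <- original_start + interval_days * seq(n_start, -1)
window_ends <- window_starts + interval_days
windows <- data.frame(start = window_starts, end = window_ends)

# Total span covered before the original start date, in days.
span_days <- as.numeric(original_start - min(window_starts))
```

Under this assumption the windows cover 12 x 14 = 168 days, which is roughly the six months mentioned in the example comment.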