
The uptake of technologies such as airborne laser scanning (ALS) and, more recently, digital aerial photogrammetry (DAP) enables the characterization of 3-dimensional (3D) forest structure. These forest structural attributes are widely applied in the development of modern enhanced forest inventories. As an alternative to extensive ALS- or DAP-based forest inventories, regional forest attribute maps can be built from relationships between ALS or DAP and wall-to-wall satellite data products. To date, a number of different approaches exist, with varying code implementations using different programming environments and tailored to specific needs. With the motivation for open, simple and modern software, we present ^{th} percentile of first returns height (

The package was implemented in R (

Up-to-date and extensive assessment of forest structure and resources are crucial to support sustainable forest management practices and wall-to-wall monitoring programs [

To address the need for wall-to-wall estimates, imagery acquired from spaceborne platforms can be used as the base upon which to extrapolate 3D forest attributes. Recent advances in pre-processing workflows such as cloud masking [

Existing research provides a number of modeling approaches to extrapolate ALS-derived 3D forest representations using wall-to-wall satellite data. Random forest regression models, for example, were used by Wilkes et al. [

The approaches above use a variety of forest structural metrics, satellite data from a range of sensors (including time series data), and a variety of statistical methods for forest attribute extrapolation, including k-NN imputation, random forest, and ordinary least squares regression. Moreover, it is crucial to ensure that the sample of data used for model development is representative of the entire range of forest structure occurring across the area where the model is applied. Structurally guided sampling can be carried out in different ways and a standardized approach is necessary. With the motivation for open, simple and modern software, we present

This paper describes the general structure of

The main processing tasks implemented in

The package is organized around functions for data preprocessing, stratified random sample selection, spectral index calculation, time series summary metrics calculation, k-NN predictive model development and their accuracy assessment, and finally response variable (i.e. forest attributes) imputation.

Each colored box represents a processing step in the workflow and the names of the functions are displayed in green.

Processing step | Function name | Description |
---|---|---|
Data preprocessing | | Successively project and resample a raster layer coordinate system and spatial resolution to match a reference using bilinear or nearest neighbor methods |
Data preprocessing | | Match the extent of a reference raster and optionally mask cells of the reference that have a specific value |
Data preprocessing | | Apply a function in a moving window on each band of the input raster (e.g. for smoothing) |
Data preprocessing | | Create a buffer (i.e. assign cells to NA values) around cells having NA values in the input raster (e.g. can be used to remove cells located on boundaries) |
Multispectral data and temporal summary | calcIndices | Calculate a series of spectral indices from multispectral data |
Multispectral data and temporal summary | temporalMetrics | Calculate a set of pre-defined or user-defined temporal summary metrics of an annual time series of variables |
Sampling | getSample | Perform a stratified random sampling of the response variables. More specifically, response variables are clustered using the k-means algorithm and sample points are selected within the clusters. |
Sampling | | Get the values of raster cells at sample points |
Sampling | partition | Split a sample into training and testing sets |
Modeling | trainNN | Train a k-NN model from response and predictor variables of the training set |
Modeling | | Assess the accuracy of the trained k-NN model using a held-out testing set |
Modeling | | Return the variable importance of each predictor when random forest based k-NN imputation is used |
Modeling | predictTrgs | Impute the response variables at targets |

Cell-level forest attributes, satellite imagery, and other datasets used to calculate predictors, can come from different sources and therefore have different extents and resolutions. In order to integrate these datasets together, preprocessing steps are required. Common tasks that users can encounter are resampling, cropping or projecting different raster layers in order to match a given spatial resolution and extent. In addition, masking and focal operations (also known as spatial convolution filters) can be applied to constrain modeling to specific cells or to smooth data prior to model development. The

Given a reference raster layer, the function

Strunk et al. [

Based on the same principle, the function

The functions for data preprocessing implemented in

In two-stage modelling approaches where the set of reference observations is available as a gridded product (like gridded ALS metrics or forest attribute variables), it is necessary to reduce the number of reference observations for computational efficiency [

In

The sample points are returned in a

Many predictors used in forest structure imputation studies are derived from multispectral data. Tasseled Cap indices, Normalized Difference Vegetation Index (NDVI), Normalized Burn Ratio (NBR) are some of the spectral indices commonly used [

Spectral indices of vegetation usually follow seasonal patterns that repeat on an annual cycle. The trajectory of a spectral index (or any other annual time series-based predictor) over time can thus be examined from annual time series [

Predictor and response variables extracted at the sample locations are used to train a k-NN model, the modeling approach currently implemented in

The function

^{2}), root mean square error (RMSE) and bias, defined as follows:
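For reference, these three statistics are commonly written as shown here, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values and $n$ the number of observations. The notation, the form of $R^{2}$ and the sign convention for bias (predicted minus observed) are assumptions of this sketch rather than a statement of how the package computes them.

$$
\begin{aligned}
R^{2} &= 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^{2}}{\sum_{i=1}^{n} (y_i - \bar{y})^{2}} \\
\mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^{2}} \\
\mathrm{bias} &= \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)
\end{aligned}
$$

Under this convention, a positive bias indicates that predictions are on average larger than observations.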

If using random forest proximity matrix as a measure of nearness, each predictor importance in the imputation model can be returned and plotted using

The following will illustrate how

^{th} percentile of first return heights (^{2} [

Blocks in red represent the ALS coverage and the light grey box is the area where ALS metrics need to be extrapolated. Predictor variables are available across the entire study area, at both references and targets.

Variable | Variable name | Description | Min | Mean | Max | Standard deviation |
---|---|---|---|---|---|---|
95^{th} percentile of canopy height | p95 | 95^{th} percentile of returns height | 0.3 m | 19.9 m | 43.9 m | 8.5 m |
Canopy cover above mean height | cover | Proportion of first returns above mean returns height | 5.0% | 61.2% | 91.1% | 14.9% |

The imputation of ALS metrics was carried out following the steps of the diagram shown in

# Open all Landsat images in a list (path_BAP contains the path to each Landsat BAP image in chronological order)

BAP_ts <- lapply(path_BAP, raster::stack)

# Calculate spectral indices

calcIndices(BAP_ts, indices = c("NDVI", "TCB", "TCG", "TCW"), red = 3, nir = 4, sat = "Landsat5TM")
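As a rough illustration of what a spectral index such as NDVI involves, the following base R sketch applies the standard formula (NIR - red) / (NIR + red) to two toy reflectance matrices. The helper name `ndvi` and the input values are hypothetical and are not part of the package, which operates on raster objects rather than plain matrices.

```r
# Hypothetical helper (not part of the package): NDVI from red and NIR reflectance
ndvi <- function(red, nir) {
  # Standard definition: (NIR - red) / (NIR + red)
  denom <- nir + red
  out <- (nir - red) / denom
  # Cells where both bands are zero are undefined and set to NA
  out[denom == 0] <- NA
  out
}

# Toy 2 x 2 reflectance "bands" (made-up values, filled column by column)
red <- matrix(c(0.10, 0.20, 0.05, 0.00), nrow = 2)
nir <- matrix(c(0.30, 0.20, 0.45, 0.00), nrow = 2)

ndvi_vals <- ndvi(red, nir)
```

Values close to 1 indicate dense green vegetation, while values near 0 or below are typical of bare ground or water.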

As suggested by [

# Stratified random sampling

sample_strata <- getSample(ALS_metrics, layers = c("p95", "cover"), n = 230, strata = 5, mindist = 75)
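The idea behind this step, clustering the response variables with k-means and then drawing random points within each cluster, can be sketched in base R as follows. This is a simplified illustration on a made-up data frame and not the package's implementation: the minimum distance constraint (`mindist`) is ignored, the variables are standardized before clustering (an assumption), and the helper name `stratified_sample` is hypothetical.

```r
set.seed(42)  # reproducible toy example

# Made-up cell-level response variables (stand-ins for p95 and cover)
cells <- data.frame(
  p95   = runif(1000, min = 0.3, max = 43.9),
  cover = runif(1000, min = 5, max = 91.1)
)

# Hypothetical helper: cluster with k-means, then sample evenly within clusters
stratified_sample <- function(df, n, strata) {
  # Standardize so that both variables weigh equally in the clustering (assumption)
  km <- kmeans(scale(df), centers = strata, nstart = 10)
  per_stratum <- n %/% strata
  idx <- unlist(lapply(seq_len(strata), function(s) {
    members <- which(km$cluster == s)
    # Never request more points than the cluster contains
    members[sample.int(length(members), min(per_stratum, length(members)))]
  }))
  cbind(df[idx, ], cluster = km$cluster[idx])
}

sample_pts <- stratified_sample(cells, n = 230, strata = 5)
table(sample_pts$cluster)  # number of sampled points per stratum
```

Sampling the same number of points in every cluster gives rare structural conditions the same weight as common ones, which is the motivation for stratifying in the first place.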

The median, interquartile range (IQR) and Theil-Sen slope of each spectral index time series were calculated at both sample and target locations using

# Example of NDVI time series (NDVI_ts) temporal summary metrics calculation

temporalMetrics(NDVI_ts, metrics = "defaultTemporalSummary")
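To make the three summary metrics concrete, the sketch below computes the median, IQR and Theil-Sen slope for the time series of a single cell in base R. The Theil-Sen slope is taken as the median of all pairwise slopes, which is the usual definition; the helper names `theil_sen_slope` and `summarize_ts` and the toy series are hypothetical, and the package may compute these metrics differently.

```r
# Hypothetical helper: Theil-Sen slope, i.e. the median of all pairwise slopes
theil_sen_slope <- function(y, t = seq_along(y)) {
  pairs <- utils::combn(length(y), 2)  # all index pairs i < j, one per column
  slopes <- (y[pairs[2, ]] - y[pairs[1, ]]) / (t[pairs[2, ]] - t[pairs[1, ]])
  median(slopes, na.rm = TRUE)
}

# Hypothetical helper: the three summary metrics for one cell's annual series
summarize_ts <- function(y) {
  c(median = median(y, na.rm = TRUE),
    IQR    = IQR(y, na.rm = TRUE),
    slope  = theil_sen_slope(y))
}

# Made-up 25-year NDVI series: steady increase of 0.01 per year with one outlier
ndvi_series <- 0.5 + 0.01 * (0:24)
ndvi_series[10] <- 0.1

metrics <- summarize_ts(ndvi_series)
```

Because it relies on medians, the slope estimate is largely unaffected by the single outlier in the toy series, which is why this estimator suits noisy annual composites.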

A total of 14 predictor variables were gathered: the median, IQR and Theil-Sen slope of the 4 spectral indices, the DEM and the terrain slope (summarized in

# Partition sample_strata into 5 folds (sample_strata$cluster contains the strata of each sampled point)

train_folds <- partition(sample_strata$cluster, type = "kfold", kfold = 5)
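A stratified k-fold split of this kind can be sketched in base R as follows. The sketch assumes that each fold is represented by the indices of its training observations and that folds are balanced within each stratum; the helper name `stratified_folds` and the toy stratum labels are hypothetical.

```r
set.seed(1)  # reproducible toy example

# Made-up stratum labels for 230 sampled points (5 strata of 46 points)
strata <- sample(rep(1:5, each = 46))

# Hypothetical helper: assign each observation to one of k folds, stratum by stratum
stratified_folds <- function(strata, k) {
  fold <- integer(length(strata))
  for (s in unique(strata)) {
    members <- which(strata == s)
    # Fold labels 1..k recycled to the stratum size, then shuffled
    fold[members] <- sample(rep_len(seq_len(k), length(members)))
  }
  fold
}

fold_id <- stratified_folds(strata, k = 5)

# Training indices for each fold: every observation not held out in that fold
train_idx <- lapply(1:5, function(f) which(fold_id != f))
```

Each observation is held out exactly once, so it appears in the training set of the four other folds.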

Variable | Variable name | Units | Description |
---|---|---|---|
Temporal metrics of TCB, TCW, TCG and NDVI | | - | Median, Theil-Sen slope and IQR of the 25-year time series of Tasseled Cap indices and NDVI |
Elevation | | m | Terrain elevation above sea level |
Slope | | ° | Topographic slope in degrees |

Using

# Train a kNN model. X_vars_sample and Y_vars_sample are the predictor variables (X) and response variables (Y) extracted at sampled points

kNN_model <- trainNN(x = X_vars_sample, y = Y_vars_sample, k = 1, inTrain = train_folds, ntree = 200)
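The principle of nearest neighbor imputation with k = 1 can be illustrated with a few lines of base R. Note that this sketch measures nearness with plain Euclidean distance in predictor space, a deliberate simplification, whereas the call above relies on a random forest (hence the `ntree` argument). The helper name `impute_1nn` and the toy data are hypothetical.

```r
set.seed(7)  # reproducible toy example

# Made-up reference data: two predictors (X) and two responses (Y)
X_ref <- matrix(runif(200), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
Y_ref <- cbind(p95 = 40 * X_ref[, "x1"], cover = 90 * X_ref[, "x2"])

# Hypothetical helper: impute all responses of each target from its nearest reference
impute_1nn <- function(X_ref, Y_ref, X_trg) {
  nearest <- apply(X_trg, 1, function(trg) {
    # Squared Euclidean distance from this target to every reference observation
    d2 <- rowSums(sweep(X_ref, 2, trg)^2)
    which.min(d2)
  })
  # All response variables are copied together from the selected reference
  Y_ref[nearest, , drop = FALSE]
}

# A reference observation used as a target is imputed with its own responses
Y_hat <- impute_1nn(X_ref, Y_ref, X_ref[1:5, , drop = FALSE])
```

Because every response variable is copied from the same neighbor, the imputed values keep the covariance observed between responses at the reference locations.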

Finally, the function

# Impute response variables. X_vars is a RasterStack containing all predictor variables

Y_imputed <- predictTrgs(model = kNN_model$model, x = X_vars)

The computing time for raster processing functions varies depending on the size of the inputs (e.g. size of the study area), the number of outputs to process (i.e. number of indices, length of time series) and the parameters used for calculation (e.g. type of temporal summary metrics, number of threads in parallel processing). Functions that are likely to take the longest time to run are ^{2}, 900 km^{2}, 3600 km^{2}, 8100 km^{2} and 14400 km^{2} respectively) and calculated TCG, TCW, TCB and NDVI for 25 different years. In addition, temporal summary metrics spanning the 25 years were generated for each of the four spectral indices. The k-NN model trained for the example of the Alex Fraser Research Forest was also used to impute response variables on each dataset. The built-in capacity of the

Computing times measured for the function

Dashed lines for the 4000 x 4000 pixels in

Averaged over the 5 cross-validation folds, the model predicts ^{2} value of 0.72 and 0.55, RMSE% of 18.5% and 11.4% and a relative bias of -0.6% and 1.4% respectively. Scatterplots between predicted and observed ALS metrics for the 5 folds are displayed in ^{2} value of 0.60 and 0.47, RMSE% of 26.1% and 12.6% and a relative bias of 3.2% and 1.5% were obtained when comparing predicted and observed

The mapped differences between predicted and observed

As shown in

The framework implemented in

While the purpose of the example presented in this paper is to illustrate how to use

The extrapolation of forest attributes with Landsat derived predictors and k-NN imputation is one of the many possible approaches. For example, Chi et al. [

In

Computing time measurements reported in this paper show that imputation can be performed in a reasonable amount of time. Memory issues are well handled by the raster package on which the

The framework currently implemented in

Despite the increasing collection rate of ALS and DAP for forest inventory purposes, covering large areas remains expensive. In order to take advantage of this highly detailed data, imputation can be used to estimate 3D forest attributes on a large scale by making use of wall-to-wall satellite imagery acquired regularly over the past decades.

To illustrate how to use


We thank Alex Fraser Research Forest (AFRF) for allowing lidar data to be used in this paper and providing the test example in the