---
output: html_document
prefer-html: true
editor_options:
  chunk_output_type: console
execute:
  cache: true
---
# Calculating neighborhood densities
## Overview
This tutorial follows from Section 2. "Modeling considerations when estimating CDD" of the main text:
::: callout-note
## Section 2. "Modeling considerations when estimating CDD"
For survival and growth models, defining the density metric and the size of local neighborhoods also require consideration. The density of conspecific and heterospecific neighbors surrounding a focal individual (or plot) can be measured within different distances (i.e., radii). Within a radius, neighbors should be weighted in the density summation with regard to their size and distance to the focal individuals, with the rationale that neighbors at a greater distance and of smaller size have smaller effects. Typically, weighting involves dividing the basal area or diameter by a linear function or an exponential function of distance to allow competitive effects of an individual of a given size to decline with distance [@uriarte2010trait; @canham2006neighborhood]. Biologically meaningful radii should be tested based on the size/life stage, vital rate of interest, and the likely agents of density dependence in the system. For a system of interest, we suggest comparing models with multiple distances (e.g., using maximum likelihood or information criterion) because forests may differ in how neighbors influence performance.
:::
Here, we demonstrate how to calculate the density of conspecific, heterospecific, and all neighbors surrounding a focal individual or plot for a given distance when XY coordinates are known. As a result, this section requires that the user's dataset contains the location of mapped stems. If locations are not known (as is common in seedling studies or smaller plots), continue to part 2 of this appendix.
We then demonstrate how to weight the calculation of neighborhood density by individual neighbor size (*i.e.,* basal area) and distance using an exponential decay function (one of many possible decay functions), allowing the competitive effect of a neighbor to decline smoothly with increasing distance [@uriarte2010trait; @canham2006neighborhood].
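Concretely, for a focal tree $i$, the basal-area-weighted neighborhood density calculated below takes the form
$$
D_i = \sum_{j \neq i} \mathrm{BA}_j \, e^{-d_{ij}/\mu},
$$
where $d_{ij}$ is the distance between focal tree $i$ and neighbor $j$, $\mathrm{BA}_j$ is the basal area of neighbor $j$, and the decay constant $\mu$ controls how quickly a neighbor's influence declines with distance; the unweighted (count-based) version simply omits $\mathrm{BA}_j$.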
To assess which shape parameters of the exponential decay function are most appropriate for the data set, we fit models with multiple combinations of decay function values and compare models using log likelihood.
From a computational perspective, this approach can be relatively resource intensive both in terms of time and object size. It's possible to make this approach more efficient by subdividing the data (*e.g.,* by plot) or using more efficient data structures such as [data.table](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html).
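For instance, here is a minimal sketch (not run) of how the grouped neighbor summaries computed later in this tutorial could be expressed in data.table, using a toy pairwise table with the same column meanings ('i' = focal index, 'ba_j' = neighbor basal area, 'comparison_type' = conspecific/heterospecific):
```{r, eval=FALSE}
# Sketch (not run): data.table's grouped summaries avoid intermediate
# copies and are typically faster on very large neighbor tables
library(data.table)
toy_pairs <- data.table(i = c(1, 1, 2, 2, 2),
                        ba_j = c(0.02, 0.05, 0.01, 0.03, 0.04),
                        comparison_type = c("con", "het", "con", "con", "het"))
# Count neighbors and sum basal area per focal tree and comparison type
toy_pairs[, .(n_neighbors = .N, sum_ba = sum(ba_j)),
          by = .(i, comparison_type)]
```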
We also note that alternative approaches allow the estimation of the effective scale of neighborhood interactions directly from data [see @barber2022bayesian]. An excellent case study using the Stan programming language is [available here](https://mc-stan.org/users/documentation/case-studies/plantInteractions.html).
::: callout-note
The following code is heavily adapted from the [latitudinalCNDD repository](https://github.com/LisaHuelsmann/latitudinalCNDD) by [Lisa Hülsmann](https://demographicecology.com/) from the publication @hulsmann2024latitudinal.
:::
## Load libraries and data
```{r, message=FALSE}
# Load libraries
library(tidyr) # For data manipulation
library(dplyr) # For data manipulation
library(ggplot2) # For data plotting
library(spatstat.geom) # For spatial analysis
library(here) # For managing working directories
library(mgcv) # For fitting gams
library(lubridate) # For calculating census intervals
library(broom) # For processing fit models
library(purrr) # For data manipulation
library(kableExtra) # For table styling
```
## Data format explanation
For this tutorial, we will be using a small example subset of data from Barro Colorado Island (BCI), Panama ([available here](https://github.com/LisaHuelsmann/latitudinalCNDD/tree/main/data_prep/input/data_tree)) that includes 7,028 observations of 3,771 individuals of 16 tree species across two censuses from the larger 50 ha BCI Forest Dynamics Plot data set ([more information here](https://forestgeo.si.edu/sites/neotropics/barro-colorado-island)). Each stem is mapped, and its diameter at 1.3 m above ground (DBH) is measured, which allows us to calculate neighborhood densities across different distances and as a function of neighbor size (*e.g.*, basal area).
The code below assumes the data is in a format where each row is an observation for an individual from a census. For this data set, the column descriptions are as follows:
- **treeID**: unique identifier for each tree
- **sp**: species code
- **gx**: spatial coordinate on x axis
- **gy**: spatial coordinate on y axis
- **dbh**: diameter at breast height (mm)
- **ba**: basal area (m\^2)
- **status**: status at census, A = alive, D = dead
- **date**: date of observation
- **census**: census number
- **mort**: mortality status at census, 1 = dead, 0 = alive
- **mort_next**: mortality status at next census, 1 = dead, 0 = alive
- **interval**: time interval between censuses in years
```{r echo=FALSE}
# This code is not shown in report
# Load in and clean example BCI data
# Loads in a list object called 'tree' with one element per census
load(here("./data/site01_tree.Rdata"))
# Restructure data for mortality analysis
for (census in 1:(length(tree)-1)) {
tree[[census]]$surv = ifelse(grepl("A", tree[[census]]$status), 1, 0) # status at t=0
tree[[census]]$surv_next = ifelse(grepl("A", tree[[census+1]]$status), 1, 0) # status at t=1
tree[[census]]$mort = ifelse(grepl("A", tree[[census]]$status), 0, 1) # status at t=0
tree[[census]]$mort_next = ifelse(grepl("A", tree[[census+1]]$status), 0, 1) # status at t=1
tree[[census]]$interval = time_length(difftime(tree[[census+1]]$date
, tree[[census]]$date), "years") # length of interval
}
# Bind together into single tibble
dat <- tibble(bind_rows(tree[[1]] %>% mutate(census = 1),
tree[[2]] %>% mutate(census = 2)) %>%
dplyr::select(-c(homchange, nostems))) %>%
filter(status %in% c("A", "D"))
# Remove surv and surv next columns
dat <- dat %>%
dplyr::select(-c(surv, surv_next))
```
Let's take a quick look at the data set we'll be working with:
```{r}
head(dat, n = 5)
```
------------------------------------------------------------------------
We can produce a map (@fig-map) of the tree locations, where point size is scaled to basal area and color indicates species, with each census displayed as a panel:
```{r, fig.width = 8, fig.height=8}
#| label: fig-map
#| fig-cap: "Map of tree locations by census. Points are individual stems scaled by basal area, and colors represent the six-letter identifiers of 16 common species at the BCI forest plot."
ggplot(dat, aes(x = gx, y = gy, size = ba, col = sp)) +
geom_point() +
facet_wrap(~census) +
theme_bw(10) +
theme(legend.position = "right") +
coord_fixed(ratio = 1) +
labs(size = "Basal area", col = "Species")
```
## Define an exponential decay function
To weight neighbors by distance, we model neighborhood densities using an exponential decay function. In principle, many different decay functions could be used, each with parameters that determine the rate of decay.
Here, we present a general framework for exploring different scenarios, though the ultimate choice of decay function and parameter values will depend on the study and is beyond the scope of this Appendix. **Note that this method of calculating neighborhood density is only possible when neighbors' x-y coordinates are known.**
First, we write a function in R for exponential decay with distance, where mu controls the shape of the curve:
```{r}
exponential_decay <- function(mu, distance){
return(exp(-(1/mu * distance)))
}
```
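Other decay shapes are possible. As one illustration (not used further in this tutorial), a squared-exponential decay keeps weights close to 1 near the focal tree before declining more steeply:
```{r}
# An alternative decay shape, shown only for illustration: a
# squared-exponential decay that stays flatter near the focal tree
squared_exponential_decay <- function(mu, distance){
  return(exp(-(distance / mu)^2))
}
# Compare weights at 10 m for mu = 15
c(exponential = exponential_decay(mu = 15, distance = 10),
  squared_exponential = squared_exponential_decay(mu = 15, distance = 10))
```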
Let's see what the exponential decay function looks like across a range of mu values (@fig-decay):
```{r}
#| label: fig-decay
#| fig-cap: "Plot of decay function"
# Set range of mu values and distances
decay_values <- seq(from = 1, to = 25, by = 2)
# Use sprintf to add leading 0s, which will help with sorting later on
decay_names = paste("exp", sprintf("%02d", decay_values), sep = "")
distances <- seq(1, 100, 1)
# Generate a dataframe with each combination of mu and distance
example_decay_values <- expand_grid(decay_values, distances) %>%
# Rename columns
rename(decay_value = decay_values,
distance = distances)
# Evaluate distance decay function for each combination of
# mu and distance
example_decay_values$decay <- exponential_decay(
mu = example_decay_values$decay_value,
distance = example_decay_values$distance)
# Plot results
ggplot(example_decay_values,
aes(x = distance, y = decay,
color = decay_value, group = decay_value)) +
ylab("Density") +
xlab("Distance (m)") +
scale_color_continuous(name = "Value of \ndecay constant") +
geom_line() +
theme_bw(12)
```
### Determine which trees are at the edge of the plot
Trees near the plot boundaries have incomplete information about their neighborhoods, because neighboring trees outside of the plot boundaries are not mapped, and we do not advise using boundary corrections to include those stems. Typically, trees within a certain distance threshold of the plot edge are excluded from analysis as focal individuals, but are still included in calculations of neighborhood densities.
We are going to add a column to our data set called 'edge' that is TRUE if a tree is within a set distance of the plot edge (*e.g.*, 30 m in the example below).
For our example data set, the dimensions of the plot are 300 x 300 m, ranging from 0-300 on both the x and y axes, and representing a subset of the overall 50 ha forest dynamics plot at BCI.
```{r}
# Set threshold for distance to edge
distance_threshold_edge = 30
# Add in min and max values for corners of plot
min_x <- 0
max_x <- 300
min_y <- 0
max_y <- 300
dat$edge = dat$gx < min_x + distance_threshold_edge |
dat$gx > max_x - distance_threshold_edge |
dat$gy < min_y + distance_threshold_edge |
dat$gy > max_y - distance_threshold_edge
# How many trees fall within the edge threshold?
table(dat$edge)
```
Below is a plot (@fig-edge) of tree locations colored by whether they fall within the edge threshold, separated out for each census.
```{r}
#| label: fig-edge
#| fig-cap: "Map of tree locations colored by whether they fall within the edge threshold"
ggplot(dat, aes(x = gx, y = gy, col = edge)) +
geom_point() +
facet_wrap(~census) +
coord_fixed(ratio = 1) +
theme_bw(12)
```
## Calculate distances between focal individuals and neighbors
Next, we will calculate distances among individuals in our plot, using an upper distance threshold beyond which trees are no longer considered neighbors.
We use the ['spatstat.geom' package](https://spatstat.org/) to efficiently calculate distances among individuals.
Because this example only includes one census interval (two censuses total), we will subset down to just the first census to calculate neighborhood density.
If the data set were to contain multiple census intervals, it would be necessary to calculate neighborhood density separately for each individual in each census interval, using only the individuals that were alive at the beginning of that census interval, or using an average density value between two consecutive censuses.
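For a multi-census dataset, a minimal sketch (not run here) of the per-interval approach could look like this, with the steps from the chunk below applied inside the loop:
```{r, eval=FALSE}
# Sketch (not run): repeat the neighborhood calculation for each census
# interval, using only trees alive at the start of that interval
neighbors_by_census <- list()
for (this_census in sort(unique(dat$census))) {
  dat_census <- dat %>%
    filter(census == this_census, status == "A")
  # ... apply the ppp / closepairs / summary steps below to dat_census ...
  # neighbors_by_census[[as.character(this_census)]] <- result
}
# bind_rows(neighbors_by_census, .id = "census")
```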
```{r}
# Set distance threshold for considering neighbors
distance_threshold_neighborhood <- 30
# Subset to first census - this will be different for different
# datasets
dat_first_census <- dat %>%
filter(census == 1)
# Format into 'ppp' object
dat_ppp = spatstat.geom::ppp(dat_first_census$gx, dat_first_census$gy,
window = owin(range(dat$gx),
range(dat$gy)),
checkdup = F)
# Determine close pairs based on distance threshold
# Returns a list that we convert to tibble later
neighbors = spatstat.geom::closepairs(dat_ppp,
rmax = distance_threshold_neighborhood, # Max radius
what = "ijd", # return indicies i, j, and distance
twice = TRUE)
# Convert to dataframe
neighbors <- as_tibble(neighbors)
# Take a peek at the data
# i = index of focal individual
# j = index of neighbor
# d = distance to neighbor
head(neighbors)
```
Next, we add additional columns for individual characteristics (*e.g.,* species, size) and whether individuals are located near the edge of the plot (blue area in @fig-edge).
```{r}
# add additional columns
# Add species for individual i
neighbors$sp_i = dat_first_census$sp[neighbors$i]
# Add whether individual i is near edge
neighbors$edge_i = dat_first_census$edge[neighbors$i]
# Add species for individual j
neighbors$sp_j = dat_first_census$sp[neighbors$j]
# Add basal area of individual j
neighbors$ba_j = dat_first_census$ba[neighbors$j]
```
We then want to add a column that indicates whether the comparison between the focal individual and the neighbor is conspecific or heterospecific, because we are interested in separately estimating the densities of conspecifics and heterospecifics.
```{r}
neighbors$comparison_type <- ifelse(neighbors$sp_i == neighbors$sp_j,
yes = "con", # conspecific
no = "het") # heterospecific
```
We then remove focal trees that are too close to the edge of the plot.
```{r}
# remove focal trees i that are at the edge
neighbors = neighbors[!neighbors$edge_i, ]
```
Next, we add columns to our neighbors data set that indicate the distance decay multiplier and the distance decay multiplier weighted by basal area.
```{r}
# Loop through distance decay values
for(x in 1:length(decay_values)){
# Add column for distance decay multiplier for each decay value
# add _N suffix to column name - these will eventually be summed
# to give distance-weighted counts of individual neighbors
neighbors[, paste0(decay_names[x], "_N")] <- exponential_decay(mu = decay_values[x],
distance = neighbors$d)
# Weight basal area by distance decay multiplier
# add _BA suffix to column name
neighbors[, paste0(decay_names[x], "_BA")] <- exponential_decay(
mu = decay_values[x],
distance = neighbors$d) * neighbors$ba_j
}
```
Depending on how many distance decay values are being investigated, there may be many columns in the data frame.
```{r}
head(neighbors)
```
## Calculate neighborhood density
Then we summarize the neighborhood density for each focal tree separately for conspecifics and heterospecifics.
```{r, message=FALSE}
# Simple calculations of number of neighbors and total basal area,
# ignoring distance decay multiplier
neighbors_summary <- neighbors %>%
group_by(i, comparison_type) %>%
summarise(nodecay_N = n(), # count of neighbors
nodecay_BA = sum(ba_j)) # sum of neighbor basal area
# Add in decay columns
neighbors_summary_decay <- neighbors %>%
group_by(i, comparison_type) %>%
# Select only columns related to distance decay
select(starts_with("exp")) %>%
# Summarize them all by summing columns
summarise_all(sum)
# Join both together
neighbors_summary <- left_join(neighbors_summary,
neighbors_summary_decay,
by = c("i", "comparison_type"))
# Add treeID column
neighbors_summary$treeID<-dat_first_census$treeID[neighbors_summary$i]
# If there are any focal individuals with no neighbors, add values
# of 0 for neighborhood densities
noNeighbors = dat_first_census$treeID[!dat_first_census$treeID
%in% neighbors_summary$treeID &
!dat_first_census$edge]
# If there are individuals with no neighbors
if (length(noNeighbors) > 0) {
neighbors_summary = bind_rows(neighbors_summary,
expand_grid(i = NA,
treeID = noNeighbors,
comparison_type = c("het", "cons"))) %>%
# Add 0s where NA
mutate_all(replace_na, replace = 0)
}
# Take a peek at the data
head(neighbors_summary)
```
As described in the main text, it can be advantageous to use total density (combined conspecific and heterospecific densities) as a covariate, rather than heterospecific density.
Here, we calculate total density by summing heterospecific and conspecific densities.
```{r, message=FALSE}
# First convert to long format which will make it easy to sum across
# heterospecific and conspecific values
neighbors_summary_long_format <- neighbors_summary %>%
pivot_longer(cols = -c("i", "comparison_type", "treeID"))
# Sum across heterospecific and conspecific values and rename to 'total'
neighbors_total_long_format <- neighbors_summary_long_format %>%
group_by(i, treeID, name) %>%
summarize(value = sum(value)) %>%
mutate(comparison_type = "total")
# Bind together conspecific and 'total' densities
# remove heterospecific columns
# fill in 0s where there are no neighbors
neighbors_summary_total = bind_rows(neighbors_summary_long_format,
neighbors_total_long_format) %>%
# Heterospecific neighborhood values can be
# filtered out by uncommenting this line if
# the objects become too large
# filter(comparison_type != "het") %>%
mutate(name = paste0(comparison_type, "_", name)) %>%
select(-comparison_type) %>%
pivot_wider(names_from = name, values_from = value,
values_fill = 0)
```
## Model mortality as a function of neighborhood density
To determine the 'best' decay parameter to use, we fit species-specific mortality models using generalized additive models ([GAMs](https://noamross.github.io/gams-in-r-course/)) and compare models based on log likelihood. GAMs are flexible statistical models that combine linear and non-linear components to capture complex relationships in data.
We first create the data set that we will use in the GAMs, subsetting down to just one census interval (because our example dataset only has 2 censuses) and removing trees close to the edge as above. Other datasets may contain multiple census intervals, in which case it would be common practice to include 'year' or 'census' as a random effect in the model (as sketched below), but otherwise the overall approach is similar.
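For reference, a minimal sketch (not run) of how a census-level random effect could be added using mgcv's random-effect smooth basis, assuming the joined data set `dat_gam` created in the next chunk contained a hypothetical factor column `census_f`:
```{r, eval=FALSE}
# Sketch (not run): census as a random intercept via the random-effect
# smooth basis in mgcv; 'census_f' is a hypothetical factor column
mod_re <- gam(mort_next ~ s(dbh) + s(con_nodecay_BA) +
                s(total_nodecay_BA) + s(census_f, bs = "re"),
              family = binomial(link = cloglog),
              offset = log(interval),
              data = dat_gam,
              method = "REML")
```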
```{r, message=FALSE}
# Join census data with neighborhood data
dat_gam <- left_join(dat_first_census,
neighbors_summary_total,
by = "treeID")
# Remove edge trees (these will have NAs for density calculations)
dat_gam <- dat_gam %>%
filter(edge == FALSE)
```
For each species, we summarize data availability to help determine parameters for the GAM smooth terms.
```{r}
# Summarize data availability at species level to set the degree of
# smoothness for GAM smooth terms
sp_summary <- dat_gam %>%
group_by(sp) %>%
summarise(ndead = sum(mort_next),
range_con_BA = max(con_nodecay_BA) - min(con_nodecay_BA),
max_con_BA = max(con_nodecay_BA),
unique_con_BA = length(unique(con_nodecay_BA)),
unique_total_BA = length(unique(total_nodecay_BA)),
range_con_N = max(con_nodecay_N) - min(con_nodecay_N),
max_con_N = max(con_nodecay_N),
unique_con_N = length(unique(con_nodecay_N)),
unique_total_N = length(unique(total_nodecay_N)),
unique_dbh = length(unique(dbh))
)
# Filter out species that have 0% mortality
sp_summary <- sp_summary %>%
filter(ndead > 0)
```
In this long block of code, we loop over all combinations of decay values for the different distance-weighted neighborhood density metrics and, for each species, fit a separate GAM for each combination. For each GAM, we check whether the model could be fit and, if so, whether it converged or produced warnings. We save successful model fits into a list that we will process later.
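A note on the model structure used below: mortality is modeled with a binomial family, a complementary log-log link, and an offset of $\log(\text{interval})$. Assuming a constant mortality hazard within a census interval, this accounts for unequal interval lengths, because with linear predictor $\eta_i$ the implied mortality probability is
$$
P(\text{mort}_i = 1) = 1 - \exp\!\left(-\,\text{interval}_i \, e^{\eta_i}\right),
$$
so estimated effects are on an annualized (per-year hazard) scale.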
For large datasets where individual GAMs take a long time to run, the code could be modified to run in parallel, either locally on a personal computer or across a computing cluster.
```{r, message=FALSE, warning=FALSE}
# Initialize list that will save model outputs
res_mod <- list()
# Model run settings
run_settings <- expand_grid(species = unique(sp_summary$sp),
decay_con = c("nodecay", decay_names),
decay_total = c("nodecay", decay_names),
nhood_data_type = c("N", "BA"))
# Loop through model run settings
for(run_settings_row in 1:nrow(run_settings)){
# Extract values from run settings dataframe
species <- run_settings$species[run_settings_row]
decay_con <- run_settings$decay_con[run_settings_row]
decay_total <- run_settings$decay_total[run_settings_row]
nhood_data_type <- run_settings$nhood_data_type[run_settings_row]
# Subset down to just focal species
dat_subset <- dat_gam %>%
filter(sp == species)
# Set run name
run_name <- paste0(species, "_total", decay_total,"_con",
decay_con, "_", nhood_data_type)
# Print status if desired
# cat("Working on run: ", run_name, " ...\n")
# Create model formula
# Initial DBH included as predictor variable
form = paste0("mort_next ~ s(dbh, k = k1) + s(total_", decay_total,
"_", nhood_data_type,
", k = k2) + s(con_", decay_con,
"_", nhood_data_type,
", k = k3)")
# Convert into formula
form <- as.formula(form)
# Choose penalties for model fitting
# set to default 10 (the same as -1)
# The higher the value of k, the more flexible the smooth term
# becomes, allowing for more intricate and wiggly patterns.
# Conversely, lower values of k result in smoother and simpler fits.
k1 = k2 = k3 = 10
if (k1 > sp_summary$unique_dbh[sp_summary$sp == species]) {
k1 = sp_summary$unique_dbh[sp_summary$sp == species] - 2
}
if (k2 > sp_summary$unique_total_N[sp_summary$sp == species]) {
k2 = sp_summary$unique_total_N[sp_summary$sp == species] - 2
}
if (k3 > sp_summary$unique_con_N[sp_summary$sp == species]) {
k3 = sp_summary$unique_con_N[sp_summary$sp == species] - 2
}
# Fit model
# wrap in a try function to catch any errors
mod = try(gam(form,
family = binomial(link=cloglog),
offset = log(interval),
data = dat_subset,
method = "REML"),
silent = T)
# Check if model was able to fit
if (!any(class(mod) == "gam")) {
# print(paste("gam failed for:", run_name))
} else {
# Check if gam converged
if (!mod$converged) {
# print(paste("no convergence for:", run_name))
} else {
# check for complete separation
# https://stats.stackexchange.com/questions/336424/issue-with-complete-separation-in-logistic-regression-in-r
# Explore warning "glm.fit: fitted probabilities numerically 0
# or 1 occurred"
eps <- 10 * .Machine$double.eps
glm0.resids <- augment(x = mod) %>%
mutate(p = 1 / (1 + exp(-.fitted)),
warning = p > 1-eps,
influence = order(.hat, decreasing = T))
infl_limit = round(nrow(glm0.resids)/10, 0)
# if none of the warnings are among the 10% most
# influential observations, then it is okay
num = any(glm0.resids$warning & glm0.resids$influence < infl_limit)
# complete separation
if (num) {
# print(paste("complete separation is likely for:", run_name))
} else {
# missing Vc
if (is.null(mod$Vc)) {
# print(paste("Vc not available for:", run_name))
} else {
# Add resulting model to list if it passes all checks
res_mod[[run_name]] <- mod
} # Vc ifelse
} # complete separation ifelse
} # convergence ifelse
} # model available ifelse
} # end run settings loop
```
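As noted above, for large datasets the loop can be run in parallel. Here is a minimal sketch using the base `parallel` package (forking, so unix-alikes only), assuming the loop body above were wrapped in a hypothetical function `fit_one_run()` that takes a row index of `run_settings` and returns a fitted GAM or `NULL`:
```{r, eval=FALSE}
# Sketch (not run): fit models in parallel across run settings.
# fit_one_run() is a hypothetical wrapper around the loop body above
# that returns a fitted gam object, or NULL if any check fails.
library(parallel)
n_cores <- max(1, detectCores() - 1)
res_list <- mclapply(seq_len(nrow(run_settings)),
                     fit_one_run,
                     mc.cores = n_cores) # mc.cores must be 1 on Windows
names(res_list) <- paste0(run_settings$species,
                          "_total", run_settings$decay_total,
                          "_con", run_settings$decay_con,
                          "_", run_settings$nhood_data_type)
# Keep only successful fits, mirroring res_mod from the loop above
res_mod <- Filter(Negate(is.null), res_list)
```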
## Summarize model fits
Next, we will extract summaries for each model with `broom::glance()`, which provides key information such as degrees of freedom, log likelihood, and AIC.
```{r, results='asis'}
# Extract summaries for each model into a list
sums = lapply(res_mod, broom::glance)
# Add a column for model run to each object in the list
sums = Map(cbind, sums, run_name = names(sums))
sums = do.call(rbind, sums) # Bind elements of list together by rows
rownames(sums) <- NULL # Remove row names
# Separate run name into columns for species, decay, and density type
sums <- sums %>%
separate(run_name, into = c("sp", "decay_total",
"decay_con", "density_type"),
remove = FALSE)
# Remove 'total' and 'con' from decay columns
sums$decay_total <- gsub("total", "", sums$decay_total)
sums$decay_con <- gsub("con", "", sums$decay_con)
# Rearrange columns
sums <- sums %>%
select(run_name, sp, decay_total, decay_con, density_type,
everything())
# Take a look at the model summaries
head(sums)
```
Due to limited sample sizes, it is likely that GAMs will not fit successfully for every species. For example, in the table below of summary data for species where models did not successfully fit or converge, there are several instances where all individuals of the species survived during the census interval, leaving no mortality events to use in the model.
We need to exclude species without complete model runs from our overall calculations when looking for optimal decay parameters across the entire data set.
```{r}
# Tally up number of model runs by species and total decay values
table(sums$sp, sums$decay_total)
# Get run counts by species to identify incomplete combinations
run_counts_by_sp <- sums %>%
group_by(sp) %>%
tally() %>%
# Join with overall species list
left_join(sp_summary %>% select(sp), ., by = "sp")
# Get expected number of runs if all models work
expected_total_runs <- run_settings %>%
group_by(species) %>%
tally() %>%
pull(n) %>%
max()
# Save species names where they didn't have all expected combinations
# of model runs
incomplete = run_counts_by_sp$sp[run_counts_by_sp$n <
expected_total_runs |
is.na(run_counts_by_sp$n)]
# Species with successful runs
head(sp_summary[!sp_summary$sp %in% incomplete, ])
# Species without successful runs
head(sp_summary[sp_summary$sp %in% incomplete, ])
```
## Selecting optimum decay parameter values across all species
We then summarize different model criteria across all species runs. To look for the optimal value for decay parameters, we sum log likelihoods across all species for a given decay parameter combination and choose the resulting parameter combination with the highest summed log likelihood.
One crucial point is that each run of the grid search should be done with the same set of trees, even though smaller neighborhoods would not require the full 30 m buffer.
```{r, message=FALSE}
sums_total <- sums %>%
filter(!sp %in% incomplete) %>%
group_by(decay_total, decay_con, density_type) %>%
summarise(nvalues = n(),
sumlogLik = sum(logLik),
meanlogLik = mean(logLik)) %>%
arrange(decay_total, decay_con, density_type)
sums_total
```
We then create a heatmap plot (@fig-optim) of summed log likelihoods for all fixed decay (mu) parameter combinations across all species, with the optimal parameter combination marked with an X.
```{r, fig.height = 15, fig.width=10}
#| label: fig-optim
#| fig-cap: "Heatmap of optimal values for decay constants"
# Find optimum value separately for N and BA
optimum <- sums_total %>%
group_by(density_type) %>%
slice_max(sumlogLik)
# Plot heatmap of log likelihood values
ggplot(sums_total, aes(x = decay_total, y = decay_con,
fill = sumlogLik)) +
geom_tile(width = 0.9, height = 0.9, col = "black") +
scale_fill_viridis_c() + # viridis color palette
geom_label(data = optimum, label = "X") +
labs(x = "Decay total density", y = "Decay conspecific density",
fill = "sumlogLik") +
facet_wrap(~density_type, ncol = 1) +
theme_bw(12) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5,
hjust=1))
```
For this data set, the following are the optimal decay parameter values across all species, separately for neighborhood density calculated by abundance (N) and by basal area (BA):
```{r}
optimum
```
## Selecting optimum decay parameter values separately for each species
Alternatively, it may be useful to determine optimum decay parameters on a species-by-species basis. To do this, we find the maximum log likelihood for each species separately across decay parameter combinations and choose the resulting parameter combination with the highest log likelihood.
```{r, message=FALSE}
sums_total_by_spp <- sums %>%
filter(!sp %in% incomplete) %>%
group_by(decay_total, decay_con, density_type, sp) %>%
summarise(nvalues = n(),
sumlogLik = sum(logLik),
meanlogLik = mean(logLik)) %>%
arrange(decay_total, decay_con, density_type)
sums_total_by_spp
```
We create a heatmap plot (@fig-optim-spp) of log likelihoods for all fixed decay parameter combinations for each species separately, with the optimal parameter combination marked with an X. We only display the first three species here because, for data sets containing many species, it is hard to visualize all the species on one graph, and you may wish to subdivide the plot further for visualization.
```{r, fig.height = 15, fig.width=10}
#| label: fig-optim-spp
#| fig-cap: "Heatmap of optimal values for decay constants separately for each species"
sums <- sums %>%
filter(sp %in% c("cappfr", "cordbi", "des2pa")) %>%
group_by(sp) %>%
# Scale log likelihood by species to help with visualization
mutate(logLik_scaled = scale(logLik)) %>%
ungroup()
# Find optimum value separately for N and BA
optimum_by_sp <- sums %>%
group_by(sp, density_type) %>%
slice_max(logLik_scaled, with_ties = FALSE)
# Plot heatmap of log likelihood values
ggplot(sums, aes(x = decay_total, y = decay_con,
fill = logLik_scaled)) +
geom_tile(width = 0.9, height = 0.9, col = "black") +
geom_label(data = optimum_by_sp, label = "X") +
labs(x = "Decay total density", y = "Decay conspecific density",
fill = "logLik") +
facet_wrap(~sp + density_type, ncol = 2, scales = "free") +
scale_fill_viridis_c() + # viridis color palette
theme_bw(12) +
theme(legend.position = "right") +
labs(fill = "Log likelihood\n(scaled)") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5,
hjust=1))
```
For this data set, the following are the optimal decay parameter values for each species separately for neighborhood density calculated by abundance (N) and by basal area (BA):
```{r}
optimum_by_sp
```