DI Models

Starter Pack

Welcome to the Diversity-Interactions (DI) models starter pack. Here you will find some simple examples of how to fit DI models to data, including details of the models, code to fit the models, and how to interpret the results.

Installing and Loading the DImodels R Package

Before you get started, we recommend you install and load the DImodels R package.

As the DImodels package is available on CRAN, installing it is simple. We use the install.packages() function, specifying DImodels (with the quotes), to install the latest version of the package.

install.packages("DImodels")

Next, in order to have the functions in the package available to our current R session, we load the installed package using the library() function, specifying DImodels (this time without the quotes).

library(DImodels)

And now the package is ready for use!

You should only have to install the package once (and again if an updated version is available on CRAN) but will have to load the package each time you start a new R session. It is good coding practice to load all required libraries/packages at the start of any R file that you make.

What will I learn from looking at this example?

In this example, we describe how to implement DI models using R (version ≥ 4.1.2) for a two species system using a simulated dataset consisting of two unique species and a treatment factor with two levels. The models described could be used to answer research questions like whether species diversity has an impact on the ecosystem function and if yes, is it affected by only the species identities or also their interaction.

Note that in all models fitted to this dataset, it is assumed that \(\theta\) (the non-linear interaction parameter) is equal to 1. Find out more about this non-linear parameter in the Deep Dive section.
Summary of Data
image description image description
training-starterPack-2species-summary.knit

In this example, we simulate a dataset consisting of 11 communities comprised of different combinations of two plant species (\(Sp1\) and \(Sp2\)), with each community replicated three times under two fertiliser treatments (“low” and “high”). The response or ecosystem function is \(y\) and is simulated under the model:

\[\large y = \beta_1 p_1 + \beta_2 p_2 + \delta_{12} p_1 p_2 + \alpha_k + \epsilon\]

where \(\beta_1\) and \(\beta_2\) are the identity effects of species \(Sp1\) and \(Sp2\) respectively, with \(p_1\) and \(p_2\) being their respective proportions, \(\delta_{12}\) is the interaction effect between the two species, and \(\alpha_k\) is the effect of treatment (which has \(k = 1, 2\) levels here). \(\epsilon\) is the error term and is assumed to be normally distributed with constant variance.

The response is simulated using true values of 5 and 3 for the identity effects \(\beta_{1}\) and \(\beta_{2}\) for species \(Sp1\) and \(Sp2\), respectively, 12 for the interaction effect \(\delta_{12}\) and 6 for the high treatment effect \(\alpha_2\) (and \(\alpha_1\) = 0 for the low treatment effect). A random normal error term \(\epsilon\) is added to the response with mean 0 and variance 1.

# Seed for reproducibility
set.seed(123)

# Simulate communities
two_data <- data.frame("Sp1" = rep(seq(0, 1, .1), 6),
                       "Sp2" = rep(seq(1,0, by = -.1), 6))
two_data$Sp1Sp2 <- two_data$Sp1 * two_data$Sp2

# Add Treatments (assume low fertiliser treatment is 0 and high is 1)
two_data$Treatment <- rep(c(0,1), each = 33)

# Set model coefficients
Sp1_id <- 5
Sp2_id <- 3
Sp1Sp2_int <- 12
Treat <- 6

coeffs <- matrix(c(Sp1_id, Sp2_id, Sp1Sp2_int, Treat), ncol = 1)

# Simulate response
two_data$Response <- as.matrix(two_data) %*% coeffs

# Add random normal error term
two_data$Response <- two_data$Response + rnorm(nrow(two_data), 0, 1)

# Recode treatment as high and low
two_data$Treatment <- factor(ifelse(two_data$Treatment == 0, "low", "high"), levels = c("high", "low"))

# Add unique ID for each community
two_data$Community <- rep(seq(1, 11), times = 6)

# Reorder columns 
two_data <- two_data[, c("Community", "Sp1", "Sp2", "Sp1Sp2",
                         "Treatment", "Response")]

head(two_data, 6)
##   Community Sp1 Sp2 Sp1Sp2 Treatment Response
## 1         1 0.0 1.0   0.00       low 2.439524
## 2         2 0.1 0.9   0.09       low 4.049823
## 3         3 0.2 0.8   0.16       low 6.878708
## 4         4 0.3 0.7   0.21       low 6.190508
## 5         5 0.4 0.6   0.24       low 6.809288
## 6         6 0.5 0.5   0.25       low 8.715065

The data columns are as follows:

Community: A unique identifier for each community in the design.
Sp1: A numeric vector indicating the initial proportion of species 1.
Sp2: A numeric vector indicating the initial proportion of species 2.
Sp1Sp2: A numeric vector indicating the product of the proportions of species 1 and 2.
Treatment: A factor with levels “low” and “high” identifying the two fertilisation treatments.
Response: A numeric vector indicating the value of the measured ecosystem function.

Model Fitting using DImodels Package
image description image description
Here we fit a range of DI models to the two species dataset.

training-starterPack-2species-fitting_str.knit

In this model structure, we assume that changing species diversity does not have any effect on the ecosystem function. This implies that species identities and interactions coefficients are both 0, leaving only the experimental structures (treatments, blocking structure, etc.) as fixed effects in the model. An example of this model for this data is as follows:

\[\large y= \mu + \alpha_k + \epsilon\]

where \(y\) is a single response variable, \(\mu\) is a constant, \(\alpha_k\) is the effect of the treatment which is a factor with two levels (\(k = 1, 2\)). The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

We supply the DI() function with our response column (by name), our initial species (Sp1 and Sp2) proportions (again, either by name or column number), the treatment column (by name or column number), the type of DI model that we want to fit (“STR”), and our dataset (two_data).

STRModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "STR", data = two_data)
## Fitted model: Structural 'STR' DImodel
STRModel <- DI(y = "Response", prop = 2:3, treat = 5, DImodel = "STR", data = two_data)
## Fitted model: Structural 'STR' DImodel
summary(STRModel)
## 
## Call:
## glm(formula = fmla, family = family, data = data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.8143  -0.9987   0.1496   1.0318   2.9268  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.7883     0.2540   22.79   <2e-16 ***
## Treatmenthigh   6.0748     0.3592   16.91   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2.128805)
## 
##     Null deviance: 745.14  on 65  degrees of freedom
## Residual deviance: 136.24  on 64  degrees of freedom
## AIC: 241.14
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-2species-fitting_id.knit

In this model structure, we assume that species do not interact with one another. This implies that species interactions don’t affect the ecosystem function leaving only the species proportions as fixed effects and experimental structures. An example of this model is as follows:

\[\large y=\sum_{i=1}^{S=2} \beta_{i} p_{i} + \alpha_k + \epsilon\]

where \(y\) is a single response variable or the ecosystem function, \(S\) is the number of species, and \(p_{i}\) represents the initial proportion of the \(i^{th}\) species. \(\beta_{i}\) scaled by \(p_{i}\) is the expected contribution of the \(i^{th}\) species to the response and is referred to as the \(i^{th}\) species’ identity effect, \(\alpha_k\) is the effect of the treatment which is a factor with two levels (\(k = 1, 2\)). The error term, \(\epsilon\), is assumed normally distributed with mean 0 and constant variance.

For this dataset, the ID model can also be written as:

\[\large y=\beta_{1} p_{1} +\beta_{2} p_{2} + \alpha_k + \epsilon\]

We supply the DI() function with the same parameters as the previous (STR) model, but change the DIModel parameter to ‘ID’ now.

IDModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "ID", data = two_data)
## Fitted model: Species identity 'ID' DImodel
summary(IDModel)
## 
## Call:
## glm(formula = new_fmla, family = family, data = new_data)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.87594  -0.94669   0.02013   0.91477   2.92678  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## Sp1_ID          6.7266     0.3497   19.23   <2e-16 ***
## Sp2_ID          4.8499     0.3497   13.87   <2e-16 ***
## Treatmenthigh   6.0748     0.3297   18.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.793625)
## 
##     Null deviance: 5886  on 66  degrees of freedom
## Residual deviance:  113  on 63  degrees of freedom
## AIC: 230.79
## 
## Number of Fisher Scoring iterations: 2
training-starterPack-2species-fitting_av.knit

The average pairwise interaction structure assumes that all species interact in the same way. However, in the case of a two-species example, as in this dataset, there are not multiple pairwise interactions to average over and this model cannot be fit for this dataset.

See the three-species and nine-species examples in the Starter Pack for examples of fitting the AV model structure.

If we try to fit the pairwise interaction model to this dataset, we call the DI() function with the DImodel parameter set to ‘AV’, but it won’t fit.

AVModel <- DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", DImodel = "AV", data = two_data)
## Error in DI(y = "Response", prop = c("Sp1", "Sp2"), treat = "Treatment", : you must have > 2 species to fit model AV (Average interactions 'AV' DImodel)
training-starterPack-2species-fitting_fg.knit