In this example, we simulate a dataset consisting of 11 communities
comprised of different combinations of two plant species (\(Sp1\) and \(Sp2\)), with each community replicated
three times under two fertiliser treatments (“low” and “high”). The
response or ecosystem function is \(y\)
and is simulated under the model:
\[\large y = \beta_1 p_1 + \beta_2 p_2 +
\delta_{12} p_1 p_2 + \alpha_k + \epsilon\]
where \(\beta_1\) and \(\beta_2\) are the identity effects of
species \(Sp1\) and \(Sp2\) respectively, with \(p_1\) and \(p_2\) being their respective proportions,
\(\delta_{12}\) is the interaction
effect between the two species, and \(\alpha_k\) is the effect of treatment
(which has \(k = 1, 2\) levels here).
\(\epsilon\) is the error term and is
assumed to be normally distributed with constant variance.
The response is simulated using true values of 5 and 3 for the
identity effects \(\beta_{1}\) and
\(\beta_{2}\) for species \(Sp1\) and \(Sp2\), respectively, 12 for the interaction
effect \(\delta_{12}\) and 6 for the
high treatment effect \(\alpha_2\) (and
\(\alpha_1\) = 0 for the low treatment
effect). A random normal error term \(\epsilon\) is added to the response with
mean 0 and variance 1.
# Seed for reproducibility
set.seed(123)
# Simulate communities
two_data <- data.frame("Sp1" = rep(seq(0, 1, .1), 6),
"Sp2" = rep(seq(1,0, by = -.1), 6))
two_data$Sp1Sp2 <- two_data$Sp1 * two_data$Sp2
# Add Treatments (assume low fertiliser treatment is 0 and high is 1)
two_data$Treatment <- rep(c(0,1), each = 33)
# Set model coefficients
Sp1_id <- 5
Sp2_id <- 3
Sp1Sp2_int <- 12
Treat <- 6
coeffs <- matrix(c(Sp1_id, Sp2_id, Sp1Sp2_int, Treat), ncol = 1)
# Simulate response
two_data$Response <- as.matrix(two_data) %*% coeffs
# Add random normal error term
two_data$Response <- two_data$Response + rnorm(nrow(two_data), 0, 1)
# Recode treatment as high and low
two_data$Treatment <- factor(ifelse(two_data$Treatment == 0, "low", "high"), levels = c("high", "low"))
# Add unique ID for each community
two_data$Community <- rep(seq(1, 11), times = 6)
# Reorder columns
two_data <- two_data[, c("Community", "Sp1", "Sp2", "Sp1Sp2",
"Treatment", "Response")]
head(two_data, 6)
## Community Sp1 Sp2 Sp1Sp2 Treatment Response
## 1 1 0.0 1.0 0.00 low 2.439524
## 2 2 0.1 0.9 0.09 low 4.049823
## 3 3 0.2 0.8 0.16 low 6.878708
## 4 4 0.3 0.7 0.21 low 6.190508
## 5 5 0.4 0.6 0.24 low 6.809288
## 6 6 0.5 0.5 0.25 low 8.715065
The data columns are as follows:
Community: A unique identifier for each community in
the design.
Sp1: A numeric vector indicating the
initial proportion of species 1.
Sp2: A numeric
vector indicating the initial proportion of species 2.
Sp1Sp2: A numeric vector indicating the product of the
proportions of species 1 and 2.
Treatment: A factor
with levels “low” and “high” identifying the two fertilisation
treatments.
Response: A numeric vector indicating
the value of the measured ecosystem function.