Version history

2023-08-15: revised version based on initial peer review
2023-03-15: version for archiving on Ag Data Commons
2023-02-28: edit figures to look like the other figures in the manuscript
2023-02-07: first complete draft

Summary

This notebook contains R code necessary to reproduce the trait covariate analysis in the manuscript:

Penn, H. J. and Q. D. Read. 2023. Stem borer damage dependent on interactions of plant variety, age, and prior conspecific herbivory. Functional Ecology, in review. update this when MS is accepted.

This is an analysis using variety-level traits as covariates in our statistical model to predict proportion of damaged internodes. In this revised version, instead of doing a formal variable selection process, we examine the pairwise correlation matrix of the variety traits and select six traits that are relatively uncorrelated with one another and that encompass both quantitative traits that breeders select for, and qualitative traits that are hypothesized to be related to defense against SCB.

We fit a model similar to our original model with previous damage, stalk position, year, and crop cycle as fixed effects, but the difference being instead of variety as a fixed effect, the six traits are all included as fixed effects. All continuous traits were z-transformed before inclusion in the model so that their effects are comparable. The parameter estimates for each of the traits are presented here.

This analysis follows the primary analysis in the notebook 01_boring_analysis.Rmd.

Setup

library(tidyverse)

library(brms)

library(emmeans)

library(readxl)

library(gt)

library(tidybayes)

library(polycor)

library(reshape2)

library(brmsmargins)

library(glue)



options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'never')

Read the variety lookup table and trait dataset. Convert categorical traits to factors with the appropriate level ordering. “Unknown” will come last in the ordering in all cases.

lookup <- read_csv('project/data/variety_lookup.csv') %>%

  mutate(check = grepl('\\*', `SCB resistance`),

         resistance = gsub('\\*', '', `SCB resistance`))



lookup <- mutate(lookup, Variety_nospace = gsub(' ', '', gsub('-', '_', Variety)))



traits <- read_csv('project/data/Select_variety_traits_12Dec2022.csv') %>%

  rename_with(\(x) gsub(' ', '_', x))



traits <- mutate(traits, Variety_nospace = gsub(' ', '', gsub('-', '_', Variety)),

                 SCB_ranking = factor(SCB_ranking),

                 SCB_resistance_rating = factor(SCB_resistance_rating, levels = c('Susceptible', 'Moderately Susceptible', 'Moderately Resistant', 'Resistant')),

                 Maturity = factor(Maturity, levels = c('Early', 'Early-Mid', 'Middle')),

                 Stalk_Wax = factor(Stalk_Wax, levels = c('Low', 'Moderate', 'High', 'Unknown')),

                 Leaf_Sheath_Wax = factor(Leaf_Sheath_Wax, levels = c('Low', 'Moderate', 'High', 'Unknown')),

                 Leaf_Sheath_Hair = factor(Leaf_Sheath_Hair, levels = c('None', 'Low', 'High', 'Unknown')),

                 Collar_Hair = factor(Collar_Hair, levels = c('Low', 'Moderate', 'High', 'Unknown')),

                 Leaf_Sheath_Tightness = factor(Leaf_Sheath_Tightness, levels = c('Loose', 'Average', 'Tight', 'Unknown')),

                 Leaf_Sheath_Edge = factor(Leaf_Sheath_Edge, levels = c('Non-necrotic', 'Necrotic with age', 'Unknown'))

                 )



traits <- rename(traits, Variety_formatted = Variety, Variety = Variety_nospace)

Read the main dataset. Do the same cleaning steps as were done in the primary analysis document.

dat <- read_csv('project/data/BoredInternodes_26April2022_no format.csv', col_types = paste0('icccccffi', strrep('i', 30), 'cc'))

lookup <- read_csv('project/data/variety_lookup.csv') %>%

  mutate(check = grepl('\\*', `SCB resistance`),

         resistance = gsub('\\*', '', `SCB resistance`))



dat_long <- dat %>%

  select(-c(`# Internodes`, Notes)) %>%

  pivot_longer(cols = `1`:`30`, names_to = 'internode', values_to = 'damage') %>%

  filter(!is.na(damage)) %>%

  mutate(internode = as.integer(internode))



dat_long <- dat_long %>% mutate(

  damage = if_else(damage == 22, 2L, damage),

  Stubble = if_else(Stubble == 'Plant cane', 'Plant Cane', Stubble),

  Block = factor(substr(as.character(Plot), 1, 1)),

  Variety = gsub(' ', '', gsub('-', '_', Variety))

)



dat_long_withcov <- dat_long %>%

  arrange(Year, Date, Location, Experiment, Stubble, Variety, Block, Plot, Stalk, Treatment, internode) %>%

  group_by(Year, Date, Location, Experiment, Stubble, Variety, Block, Plot, Stalk, Treatment) %>%

  mutate(previous_damage = lag(cumsum(damage > 0), default = 0),

         previous_damage_proportion = c(0, (previous_damage / 0:(length(damage)-1))[-1])) %>%

  ungroup



damage_over_time_by_stalk <- dat_long %>%

  filter(grepl('YRS', Experiment)) %>%

  group_by(Year, Variety, Plot, Stalk) %>%

  summarize(prop_damaged = sum(damage > 0)/length(damage))



damage_over_time_by_plot <- damage_over_time_by_stalk %>%

  group_by(Year, Variety, Plot) %>%

  summarize(prop_damaged = mean(prop_damaged))



varieties_5yrs <- damage_over_time_by_plot %>%

  group_by(Variety) %>%

  filter(length(unique(Year)) >= 5) %>%

  pull(Variety) %>%

  unique



dat_fit <- dat_long_withcov %>% 

  filter(grepl('YRS', Experiment), Variety %in% varieties_5yrs) %>% 

  dplyr::select(-c(Location, Experiment, Treatment)) %>%

  mutate(damage_binary = as.integer(damage > 0),

         Year = Year - min(Year))

Join the damage data with the trait data.

damage_traits <- left_join(dat_fit, traits)

Create vector of trait names.

trait_names <- c(

"Year_1_Sugar", "Year_1_Biomass", "Year_1_TRS", "Year_1_Stalk_Weight", 

"Year_1_Stalk_Population", "Year_1_Fiber", "Year_2_Sugar", "Year_2_Biomass", 

"Year_2_TRS", "Year_2_Stalk_Weight", "Year_2_Stalk_Population", 

"Year_2_Fiber", "Maturity", "Stalk_Wax", "Leaf_Sheath_Wax", "Leaf_Sheath_Hair", 

"Leaf_Sheath_Tightness", "Leaf_Sheath_Edge", "Collar_Hair")

Feature selection

Examine correlation matrix of the different traits. It is a heterogeneous correlation matrix because we have different types of variables, categorical and continuous. If a pair of variables are both numeric, we can use Pearson correlation. If one is ordinal, we use polyserial correlation. If two are ordinal, we use polychoric correlation.

trait_mat <- traits %>% select(all_of(trait_names)) %>% as.data.frame



cormat <- hetcor(trait_mat, use = 'pairwise.complete.obs')

Find the highest pairwise correlations in the correlation matrix so that we can remove some of the trait variables from analysis that are not adding any additional information.

cormat_tri <- cormat$correlations

cormat_tri[lower.tri(cormat_tri, diag = TRUE)] <- NA

corr_long <- melt(cormat_tri) %>% filter(!is.na(value)) %>% arrange(-abs(value))

Here are the ten highest pairwise correlations among the traits. Some of the continuous traits are highly correlated (r > 0.8) with each other. We can remove year 2 stalk weight, year 2 TRS, year 2 fiber, and year 2 sugar from the model because they are very correlated with the year 1 versions of those values. We can also remove year 1 biomass and year 2 biomass from the model because they are very correlated with year 1 sugar and year 2 sugar, respectively. In addition, it looks like stalk wax and leaf sheath wax are highly correlated because varieties waxy in one part of the plant are going to be waxy overall. The same applies to leaf sheath hair and collar hair. Out of those traits, stalk wax and collar hair seem to be more evenly distributed than leaf sheath wax and leaf sheath hair. Thus, we remove leaf sheath wax and hair.

Going further, collar hair is also reasonably highly correlated with both stalk wax and maturity so we can remove it as well. Year 1 biomass and year 1 sugar are both highly correlated with year 1 TRS so we remove them. The remaining stalk weight and stalk population traits are correlated with fiber and TRS so we remove them as well as they are expressing similar information.

gt(head(corr_long, 10)) %>% fmt_number(value, decimals = 3)

Var1	Var2	value
Year_1_TRS	Year_2_TRS	0.971
Year_1_Stalk_Weight	Year_2_Stalk_Weight	0.969
Year_1_Fiber	Year_2_Fiber	0.928
Year_1_Sugar	Year_1_Biomass	0.893
Year_2_Sugar	Year_2_Biomass	0.851
Year_1_Biomass	Year_2_Sugar	0.814
Stalk_Wax	Leaf_Sheath_Wax	0.768
Leaf_Sheath_Hair	Collar_Hair	0.762
Year_1_Sugar	Year_2_Sugar	0.753
Stalk_Wax	Leaf_Sheath_Hair	0.715

trait_names_final <- c('Year_1_TRS', 'Year_1_Fiber', 'Maturity', 'Stalk_Wax', 'Leaf_Sheath_Edge', 'Leaf_Sheath_Tightness')

The result is six traits: year 1 TRS, year 1 fiber, maturity, stalk wax, leaf sheath edge, and leaf sheath tightness.

Scaling trait values

The continuous trait values are all in different units. It is better for model fitting to scale (z-transform) them.

continuous_traits <- c('Year_1_TRS', 'Year_1_Fiber')



damage_traits_scaled <- damage_traits %>%

  mutate(across(all_of(continuous_traits), ~ as.vector(scale(.))))

Model fitting

Fit a model with the six selected traits, previous damage, stalk position, year, and crop cycle as fixed effects, without variety identity as fixed effect. The random effects are the same as in the original variety model. This chunk of code was run on SciNet.

cov_model_formula <- formula(paste('damage_binary ~ internode + previous_damage + Year + Stubble', paste(trait_names_final, collapse = ' + '), '(1|Year) + (1|Plot:Year) + (1|Stalk:Plot:Year)', sep = ' + '))



binom_fit_bestcovs <- brm(

  cov_model_formula,

  data = damage_traits_scaled, family = bernoulli(link = 'logit'),

  prior = c(

    prior(normal(0, 1), class = 'b')

  ),

  chains = 4, iter = 4000, warmup = 3000, seed = 815,

  file = 'project/binom_fit_reduced'

)

Diagnostics

Look at the convergence diagnostics and the posterior predictive check plots. Everything looks good.

max(rhat(binom_fit_bestcovs))

## [1] 1.012727

pp_check(binom_fit_bestcovs, type = 'bars')

Posterior predictions by trait

Estimated marginal means and marginal trends

Now we will use the model to generate predictions for each trait. Because all seven traits are in the model, the predictions we see are marginal means, setting the values of all other traits in the model to an average value (as well as averaging across all the random effects). However the model does not include any interactions between traits or interactions between any traits and previous damage/year/crop cycle.

For traits where one of the levels was “unknown,” we removed that value from the plot. The two continuous traits presented (Year 1 TRS and Year 1 Fiber) are shown back-transformed to the original trait value although a z-transformed value was used in model fitting.

emm_wax <- emmeans(binom_fit_bestcovs, ~ Stalk_Wax, type = 'response', at = list(Stalk_Wax = setdiff(levels(traits$Stalk_Wax),'Unknown')))

emm_sheathedge <- emmeans(binom_fit_bestcovs, ~ Leaf_Sheath_Edge, type = 'response', at = list(Leaf_Sheath_Edge = setdiff(levels(traits$Leaf_Sheath_Edge),'Unknown')))

emm_maturity <- emmeans(binom_fit_bestcovs, ~ Maturity, type = 'response')

emm_tightness <- emmeans(binom_fit_bestcovs, ~ Leaf_Sheath_Tightness, type = 'response', at = list(Leaf_Sheath_Tightness = setdiff(levels(traits$Leaf_Sheath_Tightness),'Unknown')))



n_x <- 10



trs_range <- range(damage_traits_scaled$Year_1_TRS)

trs_backtr_range <- range(damage_traits$Year_1_TRS)

trs_seq <- seq(trs_range[1], trs_range[2], length.out = n_x)

trs_backtr_seq <- seq(trs_backtr_range[1], trs_backtr_range[2], length.out = n_x)



fib_range <- range(damage_traits_scaled$Year_1_Fiber)

fib_backtr_range <- range(damage_traits$Year_1_Fiber)

fib_seq <- seq(fib_range[1], fib_range[2], length.out = n_x)

fib_backtr_seq <- seq(fib_backtr_range[1], fib_backtr_range[2], length.out = n_x)

marginal_draws_trs <- brmsmargins(binom_fit_bestcovs, at = data.frame(Year_1_TRS = trs_seq))

marginal_summ_trs <- Rutilitybelt::pred_quantile(x_pred = data.frame(Year_1_TRS = trs_backtr_seq), y_pred = marginal_draws_trs$Posterior)

saveRDS(marginal_summ_trs, 'marginal_trend_trs.rds')



marginal_draws_fib <- brmsmargins(binom_fit_bestcovs, at = data.frame(Year_1_Fiber = fib_seq))

marginal_summ_fib <- Rutilitybelt::pred_quantile(x_pred = data.frame(Year_1_Fiber = fib_backtr_seq), y_pred = marginal_draws_fib$Posterior)

saveRDS(marginal_summ_fib, 'marginal_trend_fib.rds')

Contrasts between levels of categorical traits

con_wax <- contrast(emm_wax, 'pairwise')

con_sheathedge <- contrast(emm_sheathedge, 'pairwise')

con_maturity <- contrast(emm_maturity, 'pairwise')

con_tightness <- contrast(emm_tightness, 'pairwise')

Plots of estimated marginal means and trends

emm_plot <- function(emm, varlabel, legend_pos = c(0.8, 0.2)) {

  emm_draws <- gather_emmeans_draws(emm) %>% mutate(.value = plogis(.value)) # Manual back-transformation from logit scale

  varname <- names(emm_draws)[[1]]

  

  ggplot(emm_draws, aes_string(x = '.value', y = varname)) +

    stat_pointinterval(.width = c(0.66, 0.95), point_size = 1, interval_size_range = c(0.8, 2.5), color = 'gray50') +

    stat_summary(fun = 'median', geom = 'point', size = 1.5, color = 'black') +

    scale_y_discrete(name = varlabel) +

    scale_x_continuous(name = 'Estimated proportion of bored internodes') +

    scale_color_brewer(palette = 'Greens', name = 'credible interval level') +

    theme(legend.position = legend_pos)

}



(pemm_wax <- emm_plot(emm_wax, 'Stalk wax'))

(pemm_sheath <- emm_plot(emm_sheathedge, 'Leaf sheath edge type', legend_pos = c(0.8, 0.8)))

(pemm_maturity <- emm_plot(emm_maturity, 'Maturity'))

(pemm_tight <- emm_plot(emm_tightness, 'Leaf sheath tightness', legend_pos = c(0.8, 0.5)))

(ptrend_trs <- ggplot(marginal_summ_trs, aes(x = Year_1_TRS, y = q0.5)) + 

  geom_ribbon(aes(ymin = q0.025, ymax = q0.975), alpha = 0.3, color = NA) + 

  geom_ribbon(aes(ymin = q0.17, ymax = q0.83), alpha = 0.3, color = NA) + 

  geom_line(linewidth = 0.8) + 

  scale_x_continuous(name = 'Year 1 TRS', expand = expansion(mult = c(0, 0))) +

  labs(y = 'Estimated proportion of bored internodes'))

(ptrend_fib <- ggplot(marginal_summ_fib, aes(x = Year_1_Fiber, y = q0.5)) + 

  geom_ribbon(aes(ymin = q0.025, ymax = q0.975), alpha = 0.3, color = NA) + 

  geom_ribbon(aes(ymin = q0.17, ymax = q0.83), alpha = 0.3, color = NA) + 

  geom_line(linewidth = 0.8) + 

  scale_x_continuous(name = 'Year 1 Fiber', expand = expansion(mult = c(0, 0))) +

  labs(y = 'Estimated proportion of bored internodes'))

# ggsave('5aedited.png', ptrend_fib, height = 4, width = 4, dpi = 400)

# ggsave('5bedited.png', ptrend_trs, height = 4, width = 4, dpi = 400)

# ggsave('5cedited.png', pemm_maturity, height = 4, width = 4, dpi = 400)

# ggsave('6aedited.png', pemm_wax, height = 4, width = 4, dpi = 400)

# ggsave('6bedited.png', pemm_sheath, height = 4, width = 4, dpi = 400)

# ggsave('6cedited.png', pemm_hair, height = 4, width = 4, dpi = 400)

# ggsave('6dedited.png', pemm_tight, height = 4, width = 4, dpi = 400)

Tables of estimated marginal means and trends

Note: the continuous trait trends are in standardized units.

emm_table <- function(emm, table_caption) {

  emm_quant <- gather_emmeans_draws(emm) %>%

    mutate(.value = plogis(.value)) %>%

    median_qi(.width = c(.66, .95))

  

  names(emm_quant)[1] <- 'level'

  

  emm_quant %>%

    mutate(QCI = glue('({round(.lower, 3)}, {round(.upper, 3)})'),

           .value = round(.value, 3)) %>%

    pivot_wider(id_cols = c(level, .value), names_from = .width, values_from = QCI, names_prefix = 'QCI') %>%

    gt(caption = table_caption) %>%

    cols_label(.value = 'median', QCI0.66 = '66% QCI', QCI0.95 = '95% QCI') %>%

    tab_options(column_labels.font.weight = 'bold')

    

}



emm_table(emm_wax, 'stalk wax posterior marginal mean estimates')

stalk wax posterior marginal mean estimates
level	median	66% QCI	95% QCI
Low	0.077	(0.064, 0.094)	(0.053, 0.113)
Moderate	0.123	(0.106, 0.142)	(0.091, 0.163)
High	0.088	(0.076, 0.102)	(0.064, 0.119)

emm_table(emm_sheathedge, 'leaf sheath edge posterior marginal mean estimates')

leaf sheath edge posterior marginal mean estimates
level	median	66% QCI	95% QCI
Non-necrotic	0.082	(0.07, 0.097)	(0.059, 0.115)
Necrotic with age	0.055	(0.047, 0.064)	(0.039, 0.076)

emm_table(emm_maturity, 'maturity posterior marginal mean estimates')

maturity posterior marginal mean estimates
level	median	66% QCI	95% QCI
Early	0.075	(0.063, 0.088)	(0.053, 0.105)
Early-Mid	0.130	(0.112, 0.151)	(0.094, 0.175)
Middle	0.064	(0.055, 0.074)	(0.046, 0.087)

emm_table(emm_tightness, 'leaf sheath tightness posterior marginal mean estimates')

leaf sheath tightness posterior marginal mean estimates
level	median	66% QCI	95% QCI
Loose	0.097	(0.084, 0.113)	(0.071, 0.131)
Average	0.060	(0.051, 0.072)	(0.042, 0.085)
Tight	0.088	(0.076, 0.102)	(0.063, 0.12)

trend_trs <- emtrends(binom_fit_bestcovs, ~ 1, var = 'Year_1_TRS')

trend_fib <- emtrends(binom_fit_bestcovs, ~ 1, var = 'Year_1_Fiber')



trend_trs_quant <- gather_emmeans_draws(trend_trs) %>%

  median_qi(.width = c(.66, .95)) %>%

  mutate(variable = 'Year 1 TRS')



trend_fib_quant <- gather_emmeans_draws(trend_fib) %>%

  median_qi(.width = c(.66, .95)) %>%

  mutate(variable = 'Year 1 Fiber')



bind_rows(trend_trs_quant, trend_fib_quant) %>%

  mutate(QCI = glue('({round(.lower, 3)}, {round(.upper, 3)})'),

           .value = round(.value, 3)) %>%

  pivot_wider(id_cols = c(variable, .value), names_from = .width, values_from = QCI, names_prefix = 'QCI') %>%

    gt(caption = 'Posterior marginal trend estimates for continuous traits') %>%

    cols_label(.value = 'median', QCI0.66 = '66% QCI', QCI0.95 = '95% QCI') %>%

    tab_options(column_labels.font.weight = 'bold')

Posterior marginal trend estimates for continuous traits
variable	median	66% QCI	95% QCI
Year 1 TRS	0.210	(0.175, 0.248)	(0.135, 0.284)
Year 1 Fiber	-0.272	(-0.316, -0.229)	(-0.363, -0.186)

Tables of contrasts between levels of categorical traits

The contrasts shown here are pairwise comparisons between the different trait levels, all in the form of odds ratios (OR). As above, the results are averaged over the levels of all the other traits. OR = 1 indicates no difference between the two trait levels. The posterior medians are shown as well as the 66% and 95% quantile credible intervals around the medians.

contrast_table <- function(con, table_caption) {

  con_quant <- gather_emmeans_draws(con) %>%

    mutate(.value = exp(.value),

           contrast = gsub('-', '/', contrast, fixed = TRUE)) %>%

    median_qi(.width = c(.66, .95))

  

  con_quant %>%

    mutate(QCI = glue('({round(.lower, 3)}, {round(.upper, 3)})'),

           .value = round(.value, 3)) %>%

    pivot_wider(id_cols = c(contrast, .value), names_from = .width, values_from = QCI, names_prefix = 'QCI') %>%

    gt(caption = table_caption) %>%

    cols_label(.value = 'median', QCI0.66 = '66% QCI', QCI0.95 = '95% QCI') %>%

    tab_options(column_labels.font.weight = 'bold')

    

}



contrast_table(con_wax, 'stalk wax contrasts')

stalk wax contrasts
contrast	median	66% QCI	95% QCI
Low / High	0.866	(0.753, 0.994)	(0.659, 1.157)
Low / Moderate	0.601	(0.518, 0.697)	(0.447, 0.826)
Moderate / High	1.439	(1.329, 1.57)	(1.216, 1.712)

contrast_table(con_sheathedge, 'leaf sheath edge contrasts')

leaf sheath edge contrasts
contrast	median	66% QCI	95% QCI
(Non/necrotic) / Necrotic with age	1.546	(1.422, 1.686)	(1.302, 1.856)

contrast_table(con_maturity, 'maturity contrasts')

maturity contrasts
contrast	median	66% QCI	95% QCI
(Early/Mid) / Middle	2.182	(2.005, 2.382)	(1.813, 2.613)
Early / (Early/Mid)	0.543	(0.495, 0.591)	(0.452, 0.65)
Early / Middle	1.182	(1.097, 1.275)	(1.014, 1.383)

contrast_table(con_tightness, 'leaf sheath tightness contrasts')

leaf sheath tightness contrasts
contrast	median	66% QCI	95% QCI
Average / Tight	0.667	(0.6, 0.739)	(0.532, 0.827)
Loose / Average	1.682	(1.503, 1.88)	(1.338, 2.133)
Loose / Tight	1.120	(1.032, 1.221)	(0.943, 1.334)

Interpretation

We see that varieties with low and high stalk wax tend to have lower proportion damaged internodes than moderate, averaged across all other traits. Varieties with necrotic leaf sheath edges tend to have lower proportion damage. Varieties with lower TRS have lower proportion damage, indicating a possible tradeoff between sugar production and defense against borers (note the TRS trait correlates highly with most of the other sugar yield traits, so you could use any other yield trait as a stand-in for TRS in that statement). The maturity trait shows some differences where early-mid maturing varieties tend to have higher damage than early and late. Varieties with high fiber content in year 1 have lower proportion damage, which seems to match with the TRS result. Finally, varieties with average leaf sheath tightness tend to have a lower proportion damage than those with loose or tight sheaths.

SCB trait analysis

Quentin D. Read

August 15, 2023