Version history

  • 2023-08-15: revised version based on initial peer review
  • 2023-03-15: version for archiving on Ag Data Commons
  • 2023-02-28: edit figures to look like the other figures in the manuscript
  • 2023-02-07: first complete draft

Summary

This notebook contains R code necessary to reproduce the trait covariate analysis in the manuscript:

Penn, H. J. and Q. D. Read. 2023. Stem borer damage dependent on interactions of plant variety, age, and prior conspecific herbivory. Functional Ecology, in review. update this when MS is accepted.

This is an analysis using variety-level traits as covariates in our statistical model to predict proportion of damaged internodes. In this revised version, instead of doing a formal variable selection process, we examine the pairwise correlation matrix of the variety traits and select six traits that are relatively uncorrelated with one another and that encompass both quantitative traits that breeders select for, and qualitative traits that are hypothesized to be related to defense against SCB.

We fit a model similar to our original model with previous damage, stalk position, year, and crop cycle as fixed effects, but the difference being instead of variety as a fixed effect, the six traits are all included as fixed effects. All continuous traits were z-transformed before inclusion in the model so that their effects are comparable. The parameter estimates for each of the traits are presented here.

This analysis follows the primary analysis in the notebook 01_boring_analysis.Rmd.

Setup

library(tidyverse)

library(brms)

library(emmeans)

library(readxl)

library(gt)

library(tidybayes)

library(polycor)

library(reshape2)

library(brmsmargins)

library(glue)



options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'never')

Read the variety lookup table and trait dataset. Convert categorical traits to factors with the appropriate level ordering. “Unknown” will come last in the ordering in all cases.

lookup <- read_csv('project/data/variety_lookup.csv') %>%

  mutate(check = grepl('\\*', `SCB resistance`),

         resistance = gsub('\\*', '', `SCB resistance`))



lookup <- mutate(lookup, Variety_nospace = gsub(' ', '', gsub('-', '_', Variety)))



traits <- read_csv('project/data/Select_variety_traits_12Dec2022.csv') %>%

  rename_with(\(x) gsub(' ', '_', x))



traits <- mutate(traits, Variety_nospace = gsub(' ', '', gsub('-', '_', Variety)),

                 SCB_ranking = factor(SCB_ranking),

                 SCB_resistance_rating = factor(SCB_resistance_rating, levels = c('Susceptible', 'Moderately Susceptible', 'Moderately Resistant', 'Resistant')),

                 Maturity = factor(Maturity, levels = c('Early', 'Early-Mid', 'Middle')),

                 Stalk_Wax = factor(Stalk_Wax, levels = c('Low', 'Moderate', 'High', 'Unknown')),

                 Leaf_Sheath_Wax = factor(Leaf_Sheath_Wax, levels = c('Low', 'Moderate', 'High', 'Unknown')),

                 Leaf_Sheath_Hair = factor(Leaf_Sheath_Hair, levels = c('None', 'Low', 'High', 'Unknown')),

                 Collar_Hair = factor(Collar_Hair, levels = c('Low', 'Moderate', 'High', 'Unknown')),

                 Leaf_Sheath_Tightness = factor(Leaf_Sheath_Tightness, levels = c('Loose', 'Average', 'Tight', 'Unknown')),

                 Leaf_Sheath_Edge = factor(Leaf_Sheath_Edge, levels = c('Non-necrotic', 'Necrotic with age', 'Unknown'))

                 )



traits <- rename(traits, Variety_formatted = Variety, Variety = Variety_nospace)

Read the main dataset. Do the same cleaning steps as were done in the primary analysis document.

dat <- read_csv('project/data/BoredInternodes_26April2022_no format.csv', col_types = paste0('icccccffi', strrep('i', 30), 'cc'))

lookup <- read_csv('project/data/variety_lookup.csv') %>%

  mutate(check = grepl('\\*', `SCB resistance`),

         resistance = gsub('\\*', '', `SCB resistance`))



dat_long <- dat %>%

  select(-c(`# Internodes`, Notes)) %>%

  pivot_longer(cols = `1`:`30`, names_to = 'internode', values_to = 'damage') %>%

  filter(!is.na(damage)) %>%

  mutate(internode = as.integer(internode))



dat_long <- dat_long %>% mutate(

  damage = if_else(damage == 22, 2L, damage),

  Stubble = if_else(Stubble == 'Plant cane', 'Plant Cane', Stubble),

  Block = factor(substr(as.character(Plot), 1, 1)),

  Variety = gsub(' ', '', gsub('-', '_', Variety))

)



dat_long_withcov <- dat_long %>%

  arrange(Year, Date, Location, Experiment, Stubble, Variety, Block, Plot, Stalk, Treatment, internode) %>%

  group_by(Year, Date, Location, Experiment, Stubble, Variety, Block, Plot, Stalk, Treatment) %>%

  mutate(previous_damage = lag(cumsum(damage > 0), default = 0),

         previous_damage_proportion = c(0, (previous_damage / 0:(length(damage)-1))[-1])) %>%

  ungroup



damage_over_time_by_stalk <- dat_long %>%

  filter(grepl('YRS', Experiment)) %>%

  group_by(Year, Variety, Plot, Stalk) %>%

  summarize(prop_damaged = sum(damage > 0)/length(damage))



damage_over_time_by_plot <- damage_over_time_by_stalk %>%

  group_by(Year, Variety, Plot) %>%

  summarize(prop_damaged = mean(prop_damaged))



varieties_5yrs <- damage_over_time_by_plot %>%

  group_by(Variety) %>%

  filter(length(unique(Year)) >= 5) %>%

  pull(Variety) %>%

  unique



dat_fit <- dat_long_withcov %>% 

  filter(grepl('YRS', Experiment), Variety %in% varieties_5yrs) %>% 

  dplyr::select(-c(Location, Experiment, Treatment)) %>%

  mutate(damage_binary = as.integer(damage > 0),

         Year = Year - min(Year))

Join the damage data with the trait data.

damage_traits <- left_join(dat_fit, traits)

Create vector of trait names.

trait_names <- c(

"Year_1_Sugar", "Year_1_Biomass", "Year_1_TRS", "Year_1_Stalk_Weight", 

"Year_1_Stalk_Population", "Year_1_Fiber", "Year_2_Sugar", "Year_2_Biomass", 

"Year_2_TRS", "Year_2_Stalk_Weight", "Year_2_Stalk_Population", 

"Year_2_Fiber", "Maturity", "Stalk_Wax", "Leaf_Sheath_Wax", "Leaf_Sheath_Hair", 

"Leaf_Sheath_Tightness", "Leaf_Sheath_Edge", "Collar_Hair")

Feature selection

Examine correlation matrix of the different traits. It is a heterogeneous correlation matrix because we have different types of variables, categorical and continuous. If a pair of variables are both numeric, we can use Pearson correlation. If one is ordinal, we use polyserial correlation. If two are ordinal, we use polychoric correlation.

trait_mat <- traits %>% select(all_of(trait_names)) %>% as.data.frame



cormat <- hetcor(trait_mat, use = 'pairwise.complete.obs')

Find the highest pairwise correlations in the correlation matrix so that we can remove some of the trait variables from analysis that are not adding any additional information.

cormat_tri <- cormat$correlations

cormat_tri[lower.tri(cormat_tri, diag = TRUE)] <- NA

corr_long <- melt(cormat_tri) %>% filter(!is.na(value)) %>% arrange(-abs(value))

Here are the ten highest pairwise correlations among the traits. Some of the continuous traits are highly correlated (r > 0.8) with each other. We can remove year 2 stalk weight, year 2 TRS, year 2 fiber, and year 2 sugar from the model because they are very correlated with the year 1 versions of those values. We can also remove year 1 biomass and year 2 biomass from the model because they are very correlated with year 1 sugar and year 2 sugar, respectively. In addition, it looks like stalk wax and leaf sheath wax are highly correlated because varieties waxy in one part of the plant are going to be waxy overall. The same applies to leaf sheath hair and collar hair. Out of those traits, stalk wax and collar hair seem to be more evenly distributed than leaf sheath wax and leaf sheath hair. Thus, we remove leaf sheath wax and hair.

Going further, collar hair is also reasonably highly correlated with both stalk wax and maturity so we can remove it as well. Year 1 biomass and year 1 sugar are both highly correlated with year 1 TRS so we remove them. The remaining stalk weight and stalk population traits are correlated with fiber and TRS so we remove them as well as they are expressing similar information.

gt(head(corr_long, 10)) %>% fmt_number(value, decimals = 3)
Var1 Var2 value
Year_1_TRS Year_2_TRS 0.971
Year_1_Stalk_Weight Year_2_Stalk_Weight 0.969
Year_1_Fiber Year_2_Fiber 0.928
Year_1_Sugar Year_1_Biomass 0.893
Year_2_Sugar Year_2_Biomass 0.851
Year_1_Biomass Year_2_Sugar 0.814
Stalk_Wax Leaf_Sheath_Wax 0.768
Leaf_Sheath_Hair Collar_Hair 0.762
Year_1_Sugar Year_2_Sugar 0.753
Stalk_Wax Leaf_Sheath_Hair 0.715
trait_names_final <- c('Year_1_TRS', 'Year_1_Fiber', 'Maturity', 'Stalk_Wax', 'Leaf_Sheath_Edge', 'Leaf_Sheath_Tightness')

The result is six traits: year 1 TRS, year 1 fiber, maturity, stalk wax, leaf sheath edge, and leaf sheath tightness.

Scaling trait values

The continuous trait values are all in different units. It is better for model fitting to scale (z-transform) them.

continuous_traits <- c('Year_1_TRS', 'Year_1_Fiber')



damage_traits_scaled <- damage_traits %>%

  mutate(across(all_of(continuous_traits), ~ as.vector(scale(.))))

Model fitting

Fit a model with the six selected traits, previous damage, stalk position, year, and crop cycle as fixed effects, without variety identity as fixed effect. The random effects are the same as in the original variety model. This chunk of code was run on SciNet.

cov_model_formula <- formula(paste('damage_binary ~ internode + previous_damage + Year + Stubble', paste(trait_names_final, collapse = ' + '), '(1|Year) + (1|Plot:Year) + (1|Stalk:Plot:Year)', sep = ' + '))



binom_fit_bestcovs <- brm(

  cov_model_formula,

  data = damage_traits_scaled, family = bernoulli(link = 'logit'),

  prior = c(

    prior(normal(0, 1), class = 'b')

  ),

  chains = 4, iter = 4000, warmup = 3000, seed = 815,

  file = 'project/binom_fit_reduced'

)

Diagnostics

Look at the convergence diagnostics and the posterior predictive check plots. Everything looks good.

max(rhat(binom_fit_bestcovs))
## [1] 1.012727
pp_check(binom_fit_bestcovs, type = 'bars')

Posterior predictions by trait

Contrasts between levels of categorical traits

con_wax <- contrast(emm_wax, 'pairwise')

con_sheathedge <- contrast(emm_sheathedge, 'pairwise')

con_maturity <- contrast(emm_maturity, 'pairwise')

con_tightness <- contrast(emm_tightness, 'pairwise')

Tables of contrasts between levels of categorical traits

The contrasts shown here are pairwise comparisons between the different trait levels, all in the form of odds ratios (OR). As above, the results are averaged over the levels of all the other traits. OR = 1 indicates no difference between the two trait levels. The posterior medians are shown as well as the 66% and 95% quantile credible intervals around the medians.

contrast_table <- function(con, table_caption) {

  con_quant <- gather_emmeans_draws(con) %>%

    mutate(.value = exp(.value),

           contrast = gsub('-', '/', contrast, fixed = TRUE)) %>%

    median_qi(.width = c(.66, .95))

  

  con_quant %>%

    mutate(QCI = glue('({round(.lower, 3)}, {round(.upper, 3)})'),

           .value = round(.value, 3)) %>%

    pivot_wider(id_cols = c(contrast, .value), names_from = .width, values_from = QCI, names_prefix = 'QCI') %>%

    gt(caption = table_caption) %>%

    cols_label(.value = 'median', QCI0.66 = '66% QCI', QCI0.95 = '95% QCI') %>%

    tab_options(column_labels.font.weight = 'bold')

    

}



contrast_table(con_wax, 'stalk wax contrasts')
stalk wax contrasts
contrast median 66% QCI 95% QCI
Low / High 0.866 (0.753, 0.994) (0.659, 1.157)
Low / Moderate 0.601 (0.518, 0.697) (0.447, 0.826)
Moderate / High 1.439 (1.329, 1.57) (1.216, 1.712)
contrast_table(con_sheathedge, 'leaf sheath edge contrasts')
leaf sheath edge contrasts
contrast median 66% QCI 95% QCI
(Non/necrotic) / Necrotic with age 1.546 (1.422, 1.686) (1.302, 1.856)
contrast_table(con_maturity, 'maturity contrasts')
maturity contrasts
contrast median 66% QCI 95% QCI
(Early/Mid) / Middle 2.182 (2.005, 2.382) (1.813, 2.613)
Early / (Early/Mid) 0.543 (0.495, 0.591) (0.452, 0.65)
Early / Middle 1.182 (1.097, 1.275) (1.014, 1.383)
contrast_table(con_tightness, 'leaf sheath tightness contrasts')
leaf sheath tightness contrasts
contrast median 66% QCI 95% QCI
Average / Tight 0.667 (0.6, 0.739) (0.532, 0.827)
Loose / Average 1.682 (1.503, 1.88) (1.338, 2.133)
Loose / Tight 1.120 (1.032, 1.221) (0.943, 1.334)

Interpretation

We see that varieties with low and high stalk wax tend to have lower proportion damaged internodes than moderate, averaged across all other traits. Varieties with necrotic leaf sheath edges tend to have lower proportion damage. Varieties with lower TRS have lower proportion damage, indicating a possible tradeoff between sugar production and defense against borers (note the TRS trait correlates highly with most of the other sugar yield traits, so you could use any other yield trait as a stand-in for TRS in that statement). The maturity trait shows some differences where early-mid maturing varieties tend to have higher damage than early and late. Varieties with high fiber content in year 1 have lower proportion damage, which seems to match with the TRS result. Finally, varieties with average leaf sheath tightness tend to have a lower proportion damage than those with loose or tight sheaths.