This document is supplementary information for the manuscript:
Taliercio, E., D. Eickholt, Q. D. Read, T. Carter, N. Waldeck, and B. Fallen. 2023. Parental Choice and Intrapopulation Selection for Seed Size Impacts Uprightness of Progeny Derived from Interspecific Hybridization between Glycine max and Glycine soja.
This includes all code required to reproduce the analysis described as “Experiment 3” in the manuscript.
Load the necessary packages.
library(data.table)
library(ggplot2)
library(tidyr)
library(brms)
library(emmeans)
library(tidybayes)
library(ggtext)
library(bayestestR)
library(effectsize)
library(Rutilitybelt) # This package may be installed by calling remotes::install_github('qdread/Rutilitybelt')
options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'never')
Load data and set factor levels.
dat <- fread('data/counts_seedwt.csv')
dat[, size_ordered := ordered(size, levels = c('< 8 3/4', '> 8 3/4', '> 10', '> 10 3/4', '> 11 1/2', '> 12'))]
dat[, size_combined := factor(size_combined, levels = c('< 8 3/4', '> 8 3/4', '> 10 3/4', '> 11 1/2', '> 12'))]
dat[, soja := factor(soja)]
Make plots showing the values of number selected per number planted for the different combinations of predictors. Logit transform on y axis.
pct_scale <- scale_y_continuous(trans = 'logit', name = 'percent selected', breaks = c(0.001, 0.003, 0.01, 0.03, 0.10), labels = scales::percent_format(accuracy = 0.1))
soja_facet <- facet_wrap(~ soja, labeller = as_labeller(function(x) paste0('*G. soja* parent: PI', x)))
ggplot(dat |> transform(F3_location = ifelse(dat$F3_location == 'CLA', 'CCRS', 'UCPRS'),
F2_location = ifelse(dat$F2_location == 'Caswell', 'CRS', 'LCPRS'),
max = gsub('NC ', 'NC-', max)),
aes(x = max, y = n_selected/n_planted, fill = interaction(F2_location, F3_location, sep = ' × '))) +
geom_boxplot(position = 'dodge') +
soja_facet +
labs(x = '*G. max* parent', fill = 'F<sub>2</sub> location × F<sub>3</sub> location') +
pct_scale +
theme(legend.position = 'bottom',
axis.title.x = element_markdown(),
legend.title = element_markdown(),
legend.text = element_markdown(),
strip.text = element_markdown())
#ggsave('C:/Users/qdread/onedrive_usda/ars_projects/taliercio/figs/FIG09_selectabilitybynursery.png', dpi = 400, height = 4, width = 7.5)
Also look at seed size.
ggplot(dat |> transform(max = gsub('NC ', 'NC-', max)), aes(x = max, y = n_selected/n_planted, fill = size_ordered)) +
geom_boxplot(position = 'dodge') +
soja_facet + pct_scale +
scale_fill_viridis_d(name = 'seed size class',
labels = c('< 8 ¾', '> 8 ¾', '> 10', '> 10 ¾', '> 11 ½', '> 12')) +
labs(x = '*G. max* parent') +
theme(legend.position = 'bottom',
axis.title.x = element_markdown(),
strip.text = element_markdown(),
legend.text = element_markdown())
#ggsave('C:/Users/qdread/onedrive_usda/ars_projects/taliercio/figs/FIG10_selectabilitybysizeclass.png', dpi = 400, height = 4, width = 7.5)
The statstical model should answer the following research questions:
Based on this we will have a 3-way interaction between G. max parent, G. soja parent, and seed size as the fixed effects, and F2 nursery location and F3 nursery location as crossed random effects (random intercepts).
The other consideration is that the response variable is number selected per number planted. This is a binomial trials situation, where we are modeling how the probability of being selected varies with our different fixed and random effects. Each seed planted is a binary trial that has a certain probability of being selected (outcome = 1) or not selected (outcome = 0).
All of this can be combined into a single Bayesian binomial mixed-effects regression model that will enable us to answer all the questions that the design is set up to answer, while properly controlling for any random effects that might arise from the different nurseries.
In the following model, I am using the continuous variable for 100-seed weight to represent seed size. That makes the model a lot easier to deal with than if we used the categories. The formula includes the full three-way interaction and the random intercept terms. It is a binomial model with a logit link function, meaning we are comparing how the log odds changes among the different groups. Some priors are added to limit the coefficients to a reasonable range centered around zero which makes the model converge better.
fit_binom_seedwt_full <- brm(
n_selected | trials(n_planted_trials) ~ max * soja * ave_100sw + (1|F2_location) + (1|F3_location),
data = dat, family = binomial,
prior = c(
prior('student_t(3, 0, 3)', class = 'b')
),
iter = 6000, warmup = 4000, chains = 4, seed = 910,
file = 'data/fit_binomial_seedwt_full'
)
First, let’s look at the model diagnostics to make sure it converged.
All Rhat
statistics (potential scale reduction factor which
should be less than 1.05 for all parameters) are close to 1 which means
it converged.
summary(fit_binom_seedwt_full)
## Family: binomial
## Links: mu = logit
## Formula: n_selected | trials(n_planted_trials) ~ max * soja * ave_100sw + (1 | F2_location) + (1 | F3_location)
## Data: dat (Number of observations: 160)
## Draws: 4 chains, each with iter = 6000; warmup = 4000; thin = 1;
## total post-warmup draws = 8000
##
## Group-Level Effects:
## ~F2_location (Number of levels: 2)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.49 0.63 0.02 2.25 1.01 1205 1351
##
## ~F3_location (Number of levels: 2)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.45 0.55 0.02 2.03 1.01 1152 2002
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept -3.54 0.51 -4.61 -2.40 1.01
## maxNCDunphy 0.81 0.19 0.42 1.19 1.00
## maxNCRaleigh -0.30 0.22 -0.73 0.12 1.00
## maxTCHM06M240 -1.11 0.22 -1.55 -0.66 1.00
## soja425045 -2.98 0.25 -3.46 -2.50 1.00
## ave_100sw -0.04 0.03 -0.10 0.02 1.00
## maxNCDunphy:soja425045 -1.72 0.30 -2.31 -1.13 1.00
## maxNCRaleigh:soja425045 0.99 0.34 0.32 1.65 1.00
## maxTCHM06M240:soja425045 -0.47 0.33 -1.11 0.17 1.00
## maxNCDunphy:ave_100sw -0.08 0.03 -0.15 -0.01 1.00
## maxNCRaleigh:ave_100sw 0.03 0.04 -0.04 0.11 1.00
## maxTCHM06M240:ave_100sw 0.18 0.04 0.11 0.26 1.00
## soja425045:ave_100sw 0.19 0.05 0.10 0.28 1.00
## maxNCDunphy:soja425045:ave_100sw 0.25 0.05 0.14 0.35 1.00
## maxNCRaleigh:soja425045:ave_100sw -0.28 0.06 -0.39 -0.16 1.00
## maxTCHM06M240:soja425045:ave_100sw 0.07 0.06 -0.04 0.18 1.00
## Bulk_ESS Tail_ESS
## Intercept 1759 1565
## maxNCDunphy 1764 2740
## maxNCRaleigh 1922 3004
## maxTCHM06M240 1841 2869
## soja425045 1403 2591
## ave_100sw 1525 2716
## maxNCDunphy:soja425045 1456 2671
## maxNCRaleigh:soja425045 1718 3017
## maxTCHM06M240:soja425045 1741 3114
## maxNCDunphy:ave_100sw 1629 2668
## maxNCRaleigh:ave_100sw 1769 2819
## maxTCHM06M240:ave_100sw 1693 2798
## soja425045:ave_100sw 1338 2638
## maxNCDunphy:soja425045:ave_100sw 1296 2477
## maxNCRaleigh:soja425045:ave_100sw 1616 2847
## maxTCHM06M240:soja425045:ave_100sw 1586 2706
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Now let’s look at the random effect estimates. On this plot and the following, the point is the median, the thick bar is the 66% quantile credible interval (about 1 SD), and the thin bar is the 95% quantile credible interval (about 2 SD).
f2_effect <- fit_binom_seedwt_full %>%
spread_draws(r_F2_location[F2_location,term])
f3_effect <- fit_binom_seedwt_full %>%
spread_draws(r_F3_location[F3_location,term])
ranef_both <- rbindlist(list(f2_effect,f3_effect), use.names = FALSE)
ranef_both[, location := factor(F2_location, levels = rev(c('Caswell','Hugo','FF','CLA')), labels = rev(c('CRS (F<sub>2</sub>)','LCPRS (F<sub>2</sub>)','UCPRS (F<sub>3</sub>)','CCRS (F<sub>3</sub>)')))]
ggplot(ranef_both,
aes(y = location, x = r_F2_location)) +
stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +
geom_vline(xintercept = 0, color = "red", lty = 'dashed') +
scale_x_continuous(limits = c(-0.5, 0.5)) +
labs(y = 'nursery', x = 'effect') +
theme(axis.text.y = element_markdown())
#ggsave('C:/Users/qdread/onedrive_usda/ars_projects/taliercio/figs/randomeffects.png', dpi = 400, height = 4, width = 5)
Next, plot the estimated marginal means for the different crosses. Here we see that the probability of selecting the progeny is uniformly higher for soja parent PI407191. The NC-Raleigh max parent may be a little worse and the NC-Dunphy max parent a little better, but this is a relatively weak effect and does not depend on the soja parent.
emm_max_soja <- emmeans(fit_binom_seedwt_full, ~ max + soja, type = "response")
max_soja_toplot <- gather_emmeans_draws(emm_max_soja) |> setDT()
max_soja_toplot[, p := plogis(.value)]
ggplot(max_soja_toplot, aes(y = max, x = p)) +
stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +
facet_wrap(~ soja, nrow = 2, labeller = labeller(soja = function(x) paste('*G. soja* parent:', x))) +
scale_x_continuous(name = 'probability of selection', trans = 'logit', breaks = c(.001, .002, .005, .01, .02, .05, .10), expand = expansion(mult = c(0, 0)), limits = c(0.0005, 0.11), labels = scales::percent_format(accuracy = 0.1)) +
labs(y = '*G. max* parent') +
theme(axis.title.y = element_markdown(), strip.text = element_markdown())
#ggsave('marginalmeans_crosses.png', dpi = 400, height = 7, width = 5)
These are the estimated marginal trends (slopes of the selectability vs. seed weight line) for the different crosses, as well as for the individual max and soja parents averaged across the other parent. Most of them are positive as expected (increasing seed weight leads to higher percent progeny selected) but not universally. Some are neutral or even negative. The trend noticed in the exploratory plots, that there is a positive trend for selectability with seed weight in PI425045 but not PI407191, is supported by the model.
swt_trends_max_soja <- emtrends(fit_binom_seedwt_full, ~ max + soja, var = 'ave_100sw', type = 'response')
swt_trends_toplot <- gather_emmeans_draws(swt_trends_max_soja) |> setDT()
ggplot(swt_trends_toplot, aes(y = max, x = .value)) +
stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +
geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +
facet_wrap(~ soja, nrow = 2, labeller = labeller(soja = function(x) paste('*G. soja* parent:', x))) +
scale_x_continuous(name = 'slope of selectability vs. seed weight trend') +
labs(y = '*G. max* parent') +
theme(axis.title.y = element_markdown(), strip.text = element_markdown())
#ggsave('seedweight_slopes_crosses.png', dpi = 400, height = 7, width = 5)
swt_trends_max <- emtrends(fit_binom_seedwt_full, ~ max, var = 'ave_100sw', type = 'response')
swt_trends_max_toplot <- gather_emmeans_draws(swt_trends_max) |> setDT()
ggplot(swt_trends_max_toplot, aes(y = max, x = .value)) +
stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +
geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +
scale_x_continuous(name = 'slope of selectability vs. seed weight trend') +
labs(y = '*G. max* parent') +
theme(axis.title.y = element_markdown(), strip.text = element_markdown())
#ggsave('seedweight_slopes_max.png', dpi = 400, height = 4, width = 5)
swt_trends_soja <- emtrends(fit_binom_seedwt_full, ~ soja, var = 'ave_100sw', type = 'response')
swt_trends_soja_toplot <- gather_emmeans_draws(swt_trends_soja) |> setDT()
ggplot(swt_trends_soja_toplot, aes(y = soja, x = .value)) +
stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +
geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +
scale_x_continuous(name = 'slope of selectability vs. seed weight trend') +
labs(y = '*G. soja* parent') +
theme(axis.title.y = element_markdown(), strip.text = element_markdown())
#ggsave('seedweight_slopes_soja.png', dpi = 400, height = 4, width = 5)
Contrasts between pairs of marginal means allow us to make quantifiable statements about differences between the means in the manuscript. These contrasts are odds ratios (ratio of the selection odds between the two groups being compared). The Bayesian model does not require any correction for multiple comparisons. An odds ratio of 1, or log odds ratio of 0, means no difference between the two groups, i.e. an identical selection probability in both.
emm_max_soja_raw <- emmeans(fit_binom_seedwt_full, ~ max | soja)
contr_max_soja <- contrast(emm_max_soja_raw, 'pairwise')
emm_max_raw <- emmeans(fit_binom_seedwt_full, ~ max)
contr_max <- contrast(emm_max_raw, 'pairwise')
emm_soja_raw <- emmeans(fit_binom_seedwt_full, ~ soja)
contr_soja <- contrast(emm_soja_raw, 'pairwise')
Post hoc contrasts for the slope trends.
swt_trends_max_bysoja <- emtrends(fit_binom_seedwt_full, ~ max | soja, var = 'ave_100sw', type = 'response')
swt_contr_max_soja <- contrast(swt_trends_max_bysoja, 'pairwise')
swt_contr_max <- contrast(swt_trends_max, 'pairwise')
swt_contr_soja <- contrast(swt_trends_soja, 'pairwise')
To assessing the strength of evidence for the different effects and comparisons in our model we calculate Bayes factors (BF). Our prior estimate for the difference between the treatments was centered around zero, so the prior essentially represents a null hypothesis. If BF = 1, we have not changed our belief about the parameter at all. The prior estimate (of zero) is just as likely to be true as before. If BF > 1, we have evidence for our new estimate being true. The higher the BF, the more evidence. But if BF < 1, we have more evidence of the prior (zero) estimate being true than we did before we fit the model.
Bayes factors for the pairwise contrasts between the selectability estimates for the different crosses. This requires finding versions of the estimated marginal means and contrasts using the prior model.
fit_prior <- unupdate(fit_binom_seedwt_full)
contr_max_prior <- emmeans(fit_prior, ~ max) |> contrast('pairwise')
contr_soja_prior <- emmeans(fit_prior, ~ soja) |> contrast('pairwise')
contr_max_soja_prior <- emmeans(fit_prior, ~ max | soja) |> contrast('pairwise')
bf_contr_max <- bayesfactor(contr_max, prior = contr_max_prior)
bf_contr_soja <- bayesfactor(contr_soja, prior = contr_soja_prior)
bf_contr_max_soja <- bayesfactor(contr_max_soja, prior = contr_max_soja_prior)
interpbf_contr_max <- interpret_bf(bf_contr_max$log_BF, log = TRUE)
interpbf_contr_soja <- interpret_bf(bf_contr_soja$log_BF, log = TRUE)
interpbf_contr_max_soja <- interpret_bf(bf_contr_max_soja$log_BF, log = TRUE)
Bayes factors for the seed weight slope trends.
swt_trends_max_soja_prior <- emtrends(fit_prior, ~ max | soja, var = 'ave_100sw', type = 'response')
swt_trends_max_prior <- emtrends(fit_prior, ~ max, var = 'ave_100sw', type = 'response')
swt_trends_soja_prior <- emtrends(fit_prior, ~ soja, var = 'ave_100sw', type = 'response')
bf_swt_max_soja <- bayesfactor(swt_trends_max_soja, prior = swt_trends_max_soja_prior)
bf_swt_max <- bayesfactor(swt_trends_max, prior = swt_trends_max_prior)
bf_swt_soja <- bayesfactor(swt_trends_soja, prior = swt_trends_soja_prior)
interpbf_swt_max_soja <- interpret_bf(bf_swt_max_soja$log_BF, log = TRUE)
interpbf_swt_max <- interpret_bf(bf_swt_max$log_BF, log = TRUE)
interpbf_swt_soja <- interpret_bf(bf_swt_soja$log_BF, log = TRUE)
Bayes factors for the pairwise contrasts between the seed weight slopes.
swt_contr_max_soja_prior <- contrast(swt_trends_max_soja_prior, 'pairwise')
swt_contr_max_prior <- contrast(swt_trends_max_prior, 'pairwise')
swt_contr_soja_prior <- contrast(swt_trends_soja_prior, 'pairwise')
bf_swt_contr_max_soja <- bayesfactor(swt_contr_max_soja, prior = swt_contr_max_soja_prior)
bf_swt_contr_max <- bayesfactor(swt_contr_max, prior = swt_contr_max_prior)
bf_swt_contr_soja <- bayesfactor(swt_contr_soja, prior = swt_contr_soja_prior)
interpbf_swt_contr_max_soja <- interpret_bf(bf_swt_contr_max_soja$log_BF, log = TRUE)
interpbf_swt_contr_max <- interpret_bf(bf_swt_contr_max$log_BF, log = TRUE)
interpbf_swt_contr_soja <- interpret_bf(bf_swt_contr_soja$log_BF, log = TRUE)
The BFs are displayed along with the contrasts and their quantiles in tables below.
Here are some tables that reproduce some of the information from the graphs above. In all cases, I give parameter estimate (median) and lower and upper bounds of the 66% and 95% credible intervals. Everything is given to 3 significant figures.
location | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper |
---|---|---|---|---|---|
CRS (F2) | 0.01860 | -0.837 | 0.712 | -0.224 | 0.169 |
LCPRS (F2) | -0.03850 | -0.909 | 0.636 | -0.290 | 0.103 |
CCRS (F3) | 0.00943 | -0.855 | 0.676 | -0.210 | 0.155 |
UCPRS (F3) | -0.03290 | -0.909 | 0.624 | -0.265 | 0.100 |
max parent | soja parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper |
---|---|---|---|---|---|---|
N7103 | 407191 | 0.02140 | 0.008230 | 0.06780 | 0.01600 | 0.03330 |
NC Dunphy | 407191 | 0.02980 | 0.011500 | 0.09550 | 0.02230 | 0.04660 |
NC Raleigh | 407191 | 0.01930 | 0.007480 | 0.06200 | 0.01440 | 0.03020 |
TCHM06-240 | 407191 | 0.02140 | 0.008290 | 0.06830 | 0.01600 | 0.03350 |
N7103 | 425045 | 0.00351 | 0.001350 | 0.01130 | 0.00262 | 0.00550 |
NC Dunphy | 425045 | 0.00390 | 0.001510 | 0.01240 | 0.00292 | 0.00610 |
NC Raleigh | 425045 | 0.00160 | 0.000619 | 0.00506 | 0.00119 | 0.00249 |
TCHM06-240 | 425045 | 0.00328 | 0.001270 | 0.01040 | 0.00245 | 0.00514 |
max parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper |
---|---|---|---|---|---|
N7103 | 0.00865 | 0.00335 | 0.0279 | 0.00648 | 0.01350 |
NC Dunphy | 0.01080 | 0.00417 | 0.0344 | 0.00808 | 0.01690 |
NC Raleigh | 0.00555 | 0.00216 | 0.0177 | 0.00415 | 0.00869 |
TCHM06-240 | 0.00837 | 0.00325 | 0.0267 | 0.00628 | 0.01310 |
soja parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper |
---|---|---|---|---|---|
407191 | 0.0226 | 0.00878 | 0.07250 | 0.01700 | 0.03540 |
425045 | 0.0029 | 0.00112 | 0.00925 | 0.00218 | 0.00455 |
Quantiles and Bayes factors are provided. Bayes factors provide a measure of the strength of evidence that the slope is different than our prior estimate centered around zero. Evidence “in favor of” means we think it is not zero, and evidence “against” means it is consistent with an estimate centered around zero.
max parent | soja parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
N7103 | 407191 | -0.043 | -0.098 | 0.015 | -0.069 | -0.015 | 0.0273 | very strong evidence against |
NC Dunphy | 407191 | -0.121 | -0.156 | -0.086 | -0.138 | -0.104 | 5760 | extreme evidence in favour of |
NC Raleigh | 407191 | -0.010 | -0.054 | 0.037 | -0.033 | 0.013 | 0.00502 | extreme evidence against |
TCHM06-240 | 407191 | 0.140 | 0.093 | 0.188 | 0.117 | 0.164 | 1220 | extreme evidence in favour of |
N7103 | 425045 | 0.150 | 0.076 | 0.223 | 0.114 | 0.187 | 24.5 | strong evidence in favour of |
NC Dunphy | 425045 | 0.317 | 0.285 | 0.348 | 0.302 | 0.332 | 9.84 × 1018 | extreme evidence in favour of |
NC Raleigh | 425045 | -0.093 | -0.155 | -0.031 | -0.122 | -0.062 | 0.394 | anecdotal evidence against |
TCHM06-240 | 425045 | 0.399 | 0.358 | 0.441 | 0.379 | 0.420 | 1.61 × 1019 | extreme evidence in favour of |
max parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|
N7103 | 0.054 | 0.007 | 0.101 | 0.031 | 0.076 | 0.0794 | strong evidence against |
NC Dunphy | 0.098 | 0.075 | 0.122 | 0.087 | 0.110 | 7.29 × 106 | extreme evidence in favour of |
NC Raleigh | -0.051 | -0.091 | -0.012 | -0.070 | -0.032 | 0.0897 | strong evidence against |
TCHM06-240 | 0.270 | 0.238 | 0.302 | 0.254 | 0.286 | 4.12 × 1028 | extreme evidence in favour of |
soja parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|
407191 | -0.008 | -0.032 | 0.015 | -0.02 | 0.003 | 0.00377 | extreme evidence against |
425045 | 0.194 | 0.166 | 0.220 | 0.18 | 0.207 | 8.2 × 1011 | extreme evidence in favour of |
First the contrasts of pairs of max parents within soja parent are given. Then the contrasts of the averaged max and soja parents are given.
Quantiles and Bayes factors are included. Verbal interpretation of Bayes factor is provided according to the cutoffs in Jeffreys 1961. Evidence “in favor of” means we think the probability of selection is different for the pair being compared, and evidence “against” means we think it’s probably the same.
max parents contrasted | soja parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
N7103 - NC Dunphy | 407191 | 0.716 | 0.647 | 0.790 | 0.683 | 0.751 | 3080 | extreme evidence in favour of |
N7103 - NC Raleigh | 407191 | 1.109 | 0.998 | 1.235 | 1.053 | 1.167 | 0.0154 | very strong evidence against |
N7103 - (TCHM06-240) | 407191 | 0.995 | 0.894 | 1.104 | 0.945 | 1.046 | 0.00288 | extreme evidence against |
NC Dunphy - NC Raleigh | 407191 | 1.549 | 1.410 | 1.697 | 1.480 | 1.620 | 1.68 × 106 | extreme evidence in favour of |
NC Dunphy - (TCHM06-240) | 407191 | 1.388 | 1.271 | 1.517 | 1.328 | 1.451 | 6.26 × 104 | extreme evidence in favour of |
NC Raleigh - (TCHM06-240) | 407191 | 0.896 | 0.814 | 0.989 | 0.854 | 0.941 | 0.0171 | very strong evidence against |
N7103 - NC Dunphy | 425045 | 0.899 | 0.796 | 1.015 | 0.848 | 0.953 | 0.00862 | extreme evidence against |
N7103 - NC Raleigh | 425045 | 2.192 | 1.904 | 2.527 | 2.046 | 2.352 | 5.75 × 108 | extreme evidence in favour of |
N7103 - (TCHM06-240) | 425045 | 1.070 | 0.946 | 1.207 | 1.007 | 1.136 | 0.00356 | extreme evidence against |
NC Dunphy - NC Raleigh | 425045 | 2.442 | 2.173 | 2.739 | 2.304 | 2.578 | 6.08 × 1013 | extreme evidence in favour of |
NC Dunphy - (TCHM06-240) | 425045 | 1.190 | 1.088 | 1.306 | 1.138 | 1.244 | 0.85 | anecdotal evidence against |
NC Raleigh - (TCHM06-240) | 425045 | 0.487 | 0.435 | 0.548 | 0.461 | 0.516 | 1.15 × 108 | extreme evidence in favour of |
max parent 1 | max parent 2 | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
N7103 | NC Dunphy | 0.802 | 0.744 | 0.868 | 0.773 | 0.833 | 86.1 | very strong evidence in favour of |
N7103 | NC Raleigh | 1.560 | 1.429 | 1.704 | 1.493 | 1.629 | 3.08 × 107 | extreme evidence in favour of |
N7103 | TCHM06-240 | 1.032 | 0.952 | 1.117 | 0.992 | 1.072 | 0.00228 | extreme evidence against |
NC Dunphy | NC Raleigh | 1.943 | 1.805 | 2.100 | 1.874 | 2.014 | 1.06 × 1020 | extreme evidence in favour of |
NC Dunphy | TCHM06-240 | 1.285 | 1.206 | 1.372 | 1.245 | 1.325 | 9.83 × 104 | extreme evidence in favour of |
NC Raleigh | TCHM06-240 | 0.661 | 0.613 | 0.714 | 0.637 | 0.687 | 2.77 × 108 | extreme evidence in favour of |
soja parent 1 | soja parent 2 | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
407191 | 425045 | 7.793 | 7.382 | 8.238 | 7.589 | 8.009 | 2.77 × 1073 | extreme evidence in favour of |
First the contrasts of pairs of max parents within soja parent are given. Then the contrasts of the averaged max and soja parents are given.
Quantiles and Bayes factors are included. Evidence “in favor of” means we think the difference in slopes is nonzero for the pair being compared, and evidence “against” means it is consistent with a difference of zero between the slopes of the pair being compared.
max parents contrasted | soja parent | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
N7103 - NC Dunphy | 407191 | 0.079 | 0.012 | 0.146 | 0.047 | 0.110 | 0.174 | moderate evidence against |
N7103 - NC Raleigh | 407191 | -0.032 | -0.105 | 0.039 | -0.068 | 0.002 | 0.0165 | very strong evidence against |
N7103 - (TCHM06-240) | 407191 | -0.183 | -0.257 | -0.110 | -0.218 | -0.148 | 128 | extreme evidence in favour of |
NC Dunphy - NC Raleigh | 407191 | -0.111 | -0.169 | -0.053 | -0.139 | -0.083 | 3.52 | moderate evidence in favour of |
NC Dunphy - (TCHM06-240) | 407191 | -0.261 | -0.321 | -0.201 | -0.291 | -0.233 | 1.41 × 107 | extreme evidence in favour of |
NC Raleigh - (TCHM06-240) | 407191 | -0.151 | -0.216 | -0.085 | -0.183 | -0.118 | 15.8 | strong evidence in favour of |
N7103 - NC Dunphy | 425045 | -0.167 | -0.250 | -0.087 | -0.206 | -0.127 | 13.4 | strong evidence in favour of |
N7103 - NC Raleigh | 425045 | 0.243 | 0.146 | 0.341 | 0.196 | 0.289 | 48.1 | very strong evidence in favour of |
N7103 - (TCHM06-240) | 425045 | -0.250 | -0.333 | -0.165 | -0.290 | -0.207 | 921 | extreme evidence in favour of |
NC Dunphy - NC Raleigh | 425045 | 0.410 | 0.340 | 0.480 | 0.376 | 0.443 | 1.49 × 109 | extreme evidence in favour of |
NC Dunphy - (TCHM06-240) | 425045 | -0.082 | -0.135 | -0.030 | -0.108 | -0.056 | 0.381 | anecdotal evidence against |
NC Raleigh - (TCHM06-240) | 425045 | -0.492 | -0.568 | -0.416 | -0.528 | -0.455 | 5.19 × 1010 | extreme evidence in favour of |
max parent 1 | max parent 2 | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
N7103 | NC Dunphy | -0.221 | -0.296 | -0.142 | -0.257 | -0.183 | 0.027 | very strong evidence against |
N7103 | NC Raleigh | 0.444 | 0.357 | 0.533 | 0.401 | 0.488 | 7.02 | moderate evidence in favour of |
N7103 | TCHM06-240 | 0.031 | -0.049 | 0.110 | -0.008 | 0.070 | 8780 | extreme evidence in favour of |
NC Dunphy | NC Raleigh | 0.664 | 0.590 | 0.742 | 0.628 | 0.700 | 507 | extreme evidence in favour of |
NC Dunphy | TCHM06-240 | 0.251 | 0.187 | 0.316 | 0.220 | 0.282 | 4.51 × 106 | extreme evidence in favour of |
NC Raleigh | TCHM06-240 | -0.414 | -0.489 | -0.337 | -0.452 | -0.375 | 9.98 × 109 | extreme evidence in favour of |
soja parent 1 | soja parent 2 | estimate | 95% CI lower | 95% CI upper | 66% CI lower | 66% CI upper | Bayes factor | interpretation |
---|---|---|---|---|---|---|---|---|
407191 | 425045 | 2.053 | 1.999 | 2.109 | 2.027 | 2.081 | 3.18 × 107 | extreme evidence in favour of |
Here is a plot that has both the fitted values and data values (average 100-seed weight) for each cross showing the selectability vs. seed weight trends. This is an alternative to Fig 5 in the MS version from early March.
pred_grid <- CJ(max = unique(dat$max), soja = unique(dat$soja), ave_100sw = seq(3.2, 9.2, 0.1), n_planted_trials = 1000)
fitted_vals <- fitted(fit_binom_seedwt_full, newdata = pred_grid, re_formula = ~ 0, summary = FALSE)
fitted_quant <- cbind(pred_grid, t(fitted_vals))
fitted_quant <- melt(fitted_quant, id.vars = 1:4)
fitted_quant[, value := value/n_planted_trials]
fitted_quant <- fitted_quant[, as.data.frame(t(quantile(value, probs = q_probs))), by = .(max, soja, ave_100sw)]
setnames(fitted_quant, c('max','soja','ave_100sw', q_cols))
ggplot(dat, aes(x = ave_100sw)) +
geom_point(aes(y = n_selected/n_planted)) +
geom_ribbon(data = fitted_quant, aes(ymin = q0.17, ymax = q0.83), alpha = 0.3) +
geom_line(data = fitted_quant, aes(y = q0.5)) +
geom_line(data = fitted_quant, aes(y = q0.025), linetype = 'dotted') +
geom_line(data = fitted_quant, aes(y = q0.975), linetype = 'dotted') +
facet_grid(max ~ soja) +
pct_scale +
labs(x = '100-seed weight') +
theme(legend.position = 'bottom',
axis.title.x = element_markdown(),
strip.text = element_markdown(),
legend.text = element_markdown())
#ggsave('selectabilitybysize_modelsanddata.png', dpi = 400, height = 8, width = 5)
Load additional packages and data.
library(tidyverse)
library(gt)
dat <- read_csv('data/seedcolor.csv')
Plot the proportions of each seed color for the F2 derived and selected populations of each cross.
The differences are not dramatic but there may be some. The selected populations have somewhat more light colored seeds than the F2 derived populations, but the NC-Raleigh x PI424045 selected population also has more black colored seeds than its F2 derived population.
This is a multinomial regression with main effect of max parent in the cross, population (F2 derived vs. selected), and their interaction. There are no random effects because all nursery locations are summed together. The response variables are the color counts.
fit_seedcolor_f2vsselected <- brm(
cbind(light, mid, brown, dark) | trials(n_total) ~ max * population,
data = dat, family = multinomial(link = 'logit', refcat = 'light'),
prior = c(
prior('normal(0, 2)', class = 'b')
),
iter = 3000, warmup = 2000, chains = 4, seed = 2024,
file = 'data/fit_seedcolor_f2vsselected'
)
Get the posterior fitted value of the proportion (population-level prediction) for each of the populations for each cross from the model, and their uncertainty intervals. Take the contrasts as the ratio of posterior probabilities within each color between the selected and the F2-derived populations.
pred_data <- dat %>%
select(population, max) %>%
mutate(n_total = 1) %>%
add_epred_draws(fit_seedcolor_f2vsselected) %>%
mutate(.category = fct_recode(.category, `mid-green` = 'mid', black = 'dark'))
pred_data_wide <- pred_data %>%
ungroup %>% select(-.row) %>%
pivot_wider(names_from = population, values_from = .epred) %>%
mutate(ratio = selected/`F2 derived`)
Get quantiles of the predicted values and contrasts.
pred_quants <- pred_data_wide %>%
group_by(max, .category) %>%
median_qi(`F2 derived`, selected, ratio, .width = c(.66, .95))
Here are the median posterior probabilities of each color, with thick error bars for the 66% quantile credible interval, and thin error bars for the 95% interval. These are plotted to compare the selected population with the F2 derived within each cross.
Judging from the plot, we can see there are some differences. The selected population has a greater proportion of light-colored seed (posterior proportion is greater than 30%) than the F2 derived population (around 20% to 25%). This is compensated for by the selected population having less mid-colored seed. As we saw in the bar plot above, the NC-Raleigh selected population also has more black-colored seed. Note: The F2 derived population estimates have much higher uncertainty because they are taken from a much smaller sample size. The selected population counts were obtained by simply adding up the counts from all the replicates and ignoring the F2 and F3 nursery location variation, so there is a much larger sample size.
To better quantify the evidence for difference in color between F2 derived and selected, these are the posterior values for the ratio between F2 derived and selected, and their 66% and 95% quantile credible intervals. A value > 1 indicates higher proportion of a color in the selected population.
We have solid evidence of differences in the relative probabilities. In both crosses, the F2 derived has a lower probability of light-colored seed and a higher probability of mid-colored seed. The F2 derived population has a lower probability of black-colored seed only in the NC-Raleigh x PI424045 cross.
This table contains all the data presented in the above two figures (posterior medians of fitted probabilities, their ratios F2 derived/selected, and 66% and 95% quantile credible intervals for every value).
color | population | estimate | 66% QI | 95% QI |
---|---|---|---|---|
NC-Raleigh x PI424045 | ||||
light | F2 derived | 0.251 | (0.214, 0.29) | (0.178, 0.337) |
light | selected | 0.340 | (0.319, 0.362) | (0.298, 0.384) |
light | ratio | 1.350 | (1.16, 1.61) | (0.985, 1.96) |
mid-green | F2 derived | 0.301 | (0.26, 0.345) | (0.223, 0.395) |
mid-green | selected | 0.165 | (0.148, 0.181) | (0.133, 0.2) |
mid-green | ratio | 0.547 | (0.46, 0.649) | (0.386, 0.787) |
brown | F2 derived | 0.182 | (0.15, 0.219) | (0.118, 0.263) |
brown | selected | 0.132 | (0.118, 0.148) | (0.104, 0.165) |
brown | ratio | 0.729 | (0.585, 0.911) | (0.464, 1.17) |
black | F2 derived | 0.259 | (0.219, 0.301) | (0.183, 0.342) |
black | selected | 0.361 | (0.34, 0.383) | (0.32, 0.407) |
black | ratio | 1.390 | (1.19, 1.65) | (1.03, 2.02) |
TCHM06-240 x PI424045 | ||||
light | F2 derived | 0.212 | (0.169, 0.259) | (0.128, 0.316) |
light | selected | 0.347 | (0.334, 0.36) | (0.321, 0.374) |
light | ratio | 1.640 | (1.33, 2.06) | (1.1, 2.71) |
mid-green | F2 derived | 0.331 | (0.28, 0.387) | (0.227, 0.451) |
mid-green | selected | 0.226 | (0.215, 0.238) | (0.204, 0.25) |
mid-green | ratio | 0.683 | (0.58, 0.814) | (0.497, 1) |
brown | F2 derived | 0.223 | (0.178, 0.274) | (0.137, 0.33) |
brown | selected | 0.219 | (0.208, 0.231) | (0.195, 0.243) |
brown | ratio | 0.978 | (0.795, 1.24) | (0.649, 1.61) |
black | F2 derived | 0.224 | (0.177, 0.275) | (0.136, 0.329) |
black | selected | 0.207 | (0.196, 0.219) | (0.184, 0.231) |
black | ratio | 0.922 | (0.748, 1.18) | (0.615, 1.54) |