Summary

This document is supplementary information for the manuscript:

Taliercio, E., D. Eickholt, Q. D. Read, T. Carter, N. Waldeck, and B. Fallen. 2023. Parental Choice and Intrapopulation Selection for Seed Size Impacts Uprightness of Progeny Derived from Interspecific Hybridization between Glycine max and Glycine soja.

This includes all code required to reproduce the analysis described as “Experiment 3” in the manuscript.

Change log

Setup

Load the necessary packages.

library(data.table)

library(ggplot2)

library(tidyr)

library(brms)

library(emmeans)

library(tidybayes)

library(ggtext)

library(bayestestR)

library(effectsize)

library(Rutilitybelt) # This package may be installed by calling remotes::install_github('qdread/Rutilitybelt')



options(mc.cores = 4, brms.backend = 'cmdstanr', brms.file_refit = 'never')

Load data and set factor levels.

dat <- fread('data/counts_seedwt.csv')



dat[, size_ordered := ordered(size, levels = c('< 8 3/4', '> 8 3/4', '> 10', '> 10 3/4', '> 11 1/2', '> 12'))]

dat[, size_combined := factor(size_combined, levels = c('< 8 3/4', '> 8 3/4', '> 10 3/4', '> 11 1/2', '> 12'))]

dat[, soja := factor(soja)]

Exploratory data viz

Make plots showing the values of number selected per number planted for the different combinations of predictors. Logit transform on y axis.

pct_scale <- scale_y_continuous(trans = 'logit', name = 'percent selected', breaks = c(0.001, 0.003, 0.01, 0.03, 0.10), labels = scales::percent_format(accuracy = 0.1))

soja_facet <- facet_wrap(~ soja, labeller = as_labeller(function(x) paste0('*G. soja* parent: PI', x)))



ggplot(dat |> transform(F3_location = ifelse(dat$F3_location == 'CLA', 'CCRS', 'UCPRS'),

                        F2_location = ifelse(dat$F2_location == 'Caswell', 'CRS', 'LCPRS'),

                        max = gsub('NC ', 'NC-', max)), 

       aes(x = max, y = n_selected/n_planted, fill = interaction(F2_location, F3_location, sep = ' &times; '))) +

  geom_boxplot(position = 'dodge') +

  soja_facet +

  labs(x = '*G. max* parent', fill = 'F<sub>2</sub> location &times; F<sub>3</sub> location') +

  pct_scale +

  theme(legend.position = 'bottom',

        axis.title.x = element_markdown(),

        legend.title = element_markdown(),

        legend.text = element_markdown(),

        strip.text = element_markdown()) 

#ggsave('C:/Users/qdread/onedrive_usda/ars_projects/taliercio/figs/FIG09_selectabilitybynursery.png', dpi = 400, height = 4, width = 7.5)

Also look at seed size.

ggplot(dat |> transform(max = gsub('NC ', 'NC-', max)), aes(x = max, y = n_selected/n_planted, fill = size_ordered)) +

  geom_boxplot(position = 'dodge') +

  soja_facet + pct_scale +

  scale_fill_viridis_d(name = 'seed size class', 

                       labels = c('&#60; 8 &frac34;', '&#62; 8 &frac34;', '&#62; 10', '&#62; 10 &frac34;', '&#62; 11 &frac12;', '&#62; 12')) +

  labs(x = '*G. max* parent') +

  theme(legend.position = 'bottom',

        axis.title.x = element_markdown(),

        strip.text = element_markdown(),

        legend.text = element_markdown()) 

#ggsave('C:/Users/qdread/onedrive_usda/ars_projects/taliercio/figs/FIG10_selectabilitybysizeclass.png', dpi = 400, height = 4, width = 7.5)

Fitting model

The statstical model should answer the following research questions:

Based on this we will have a 3-way interaction between G. max parent, G. soja parent, and seed size as the fixed effects, and F2 nursery location and F3 nursery location as crossed random effects (random intercepts).

The other consideration is that the response variable is number selected per number planted. This is a binomial trials situation, where we are modeling how the probability of being selected varies with our different fixed and random effects. Each seed planted is a binary trial that has a certain probability of being selected (outcome = 1) or not selected (outcome = 0).

All of this can be combined into a single Bayesian binomial mixed-effects regression model that will enable us to answer all the questions that the design is set up to answer, while properly controlling for any random effects that might arise from the different nurseries.

In the following model, I am using the continuous variable for 100-seed weight to represent seed size. That makes the model a lot easier to deal with than if we used the categories. The formula includes the full three-way interaction and the random intercept terms. It is a binomial model with a logit link function, meaning we are comparing how the log odds changes among the different groups. Some priors are added to limit the coefficients to a reasonable range centered around zero which makes the model converge better.

fit_binom_seedwt_full <- brm(

  n_selected | trials(n_planted_trials) ~ max * soja * ave_100sw + (1|F2_location) + (1|F3_location),

  data = dat, family = binomial,

  prior = c(

    prior('student_t(3, 0, 3)', class = 'b')

  ),

  iter = 6000, warmup = 4000, chains = 4, seed = 910, 

  file = 'data/fit_binomial_seedwt_full'

)

First, let’s look at the model diagnostics to make sure it converged. All Rhat statistics (potential scale reduction factor which should be less than 1.05 for all parameters) are close to 1 which means it converged.

summary(fit_binom_seedwt_full)
##  Family: binomial 

##   Links: mu = logit 

## Formula: n_selected | trials(n_planted_trials) ~ max * soja * ave_100sw + (1 | F2_location) + (1 | F3_location) 

##    Data: dat (Number of observations: 160) 

##   Draws: 4 chains, each with iter = 6000; warmup = 4000; thin = 1;

##          total post-warmup draws = 8000

## 

## Group-Level Effects: 

## ~F2_location (Number of levels: 2) 

##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS

## sd(Intercept)     0.49      0.63     0.02     2.25 1.01     1205     1351

## 

## ~F3_location (Number of levels: 2) 

##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS

## sd(Intercept)     0.45      0.55     0.02     2.03 1.01     1152     2002

## 

## Population-Level Effects: 

##                                    Estimate Est.Error l-95% CI u-95% CI Rhat

## Intercept                             -3.54      0.51    -4.61    -2.40 1.01

## maxNCDunphy                            0.81      0.19     0.42     1.19 1.00

## maxNCRaleigh                          -0.30      0.22    -0.73     0.12 1.00

## maxTCHM06M240                         -1.11      0.22    -1.55    -0.66 1.00

## soja425045                            -2.98      0.25    -3.46    -2.50 1.00

## ave_100sw                             -0.04      0.03    -0.10     0.02 1.00

## maxNCDunphy:soja425045                -1.72      0.30    -2.31    -1.13 1.00

## maxNCRaleigh:soja425045                0.99      0.34     0.32     1.65 1.00

## maxTCHM06M240:soja425045              -0.47      0.33    -1.11     0.17 1.00

## maxNCDunphy:ave_100sw                 -0.08      0.03    -0.15    -0.01 1.00

## maxNCRaleigh:ave_100sw                 0.03      0.04    -0.04     0.11 1.00

## maxTCHM06M240:ave_100sw                0.18      0.04     0.11     0.26 1.00

## soja425045:ave_100sw                   0.19      0.05     0.10     0.28 1.00

## maxNCDunphy:soja425045:ave_100sw       0.25      0.05     0.14     0.35 1.00

## maxNCRaleigh:soja425045:ave_100sw     -0.28      0.06    -0.39    -0.16 1.00

## maxTCHM06M240:soja425045:ave_100sw     0.07      0.06    -0.04     0.18 1.00

##                                    Bulk_ESS Tail_ESS

## Intercept                              1759     1565

## maxNCDunphy                            1764     2740

## maxNCRaleigh                           1922     3004

## maxTCHM06M240                          1841     2869

## soja425045                             1403     2591

## ave_100sw                              1525     2716

## maxNCDunphy:soja425045                 1456     2671

## maxNCRaleigh:soja425045                1718     3017

## maxTCHM06M240:soja425045               1741     3114

## maxNCDunphy:ave_100sw                  1629     2668

## maxNCRaleigh:ave_100sw                 1769     2819

## maxTCHM06M240:ave_100sw                1693     2798

## soja425045:ave_100sw                   1338     2638

## maxNCDunphy:soja425045:ave_100sw       1296     2477

## maxNCRaleigh:soja425045:ave_100sw      1616     2847

## maxTCHM06M240:soja425045:ave_100sw     1586     2706

## 

## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS

## and Tail_ESS are effective sample size measures, and Rhat is the potential

## scale reduction factor on split chains (at convergence, Rhat = 1).

Graphs of model results

Now let’s look at the random effect estimates. On this plot and the following, the point is the median, the thick bar is the 66% quantile credible interval (about 1 SD), and the thin bar is the 95% quantile credible interval (about 2 SD).

f2_effect <- fit_binom_seedwt_full %>%

  spread_draws(r_F2_location[F2_location,term])

f3_effect <- fit_binom_seedwt_full %>%

  spread_draws(r_F3_location[F3_location,term])



ranef_both <- rbindlist(list(f2_effect,f3_effect), use.names = FALSE)

ranef_both[, location := factor(F2_location, levels = rev(c('Caswell','Hugo','FF','CLA')), labels = rev(c('CRS (F<sub>2</sub>)','LCPRS (F<sub>2</sub>)','UCPRS (F<sub>3</sub>)','CCRS (F<sub>3</sub>)')))]



ggplot(ranef_both,

       aes(y = location, x = r_F2_location)) +

  stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +

  geom_vline(xintercept = 0, color = "red", lty = 'dashed') +

  scale_x_continuous(limits = c(-0.5, 0.5)) +

  labs(y = 'nursery', x = 'effect') +

  theme(axis.text.y = element_markdown())

#ggsave('C:/Users/qdread/onedrive_usda/ars_projects/taliercio/figs/randomeffects.png', dpi = 400, height = 4, width = 5)

Next, plot the estimated marginal means for the different crosses. Here we see that the probability of selecting the progeny is uniformly higher for soja parent PI407191. The NC-Raleigh max parent may be a little worse and the NC-Dunphy max parent a little better, but this is a relatively weak effect and does not depend on the soja parent.

emm_max_soja <- emmeans(fit_binom_seedwt_full, ~ max + soja, type = "response")



max_soja_toplot <- gather_emmeans_draws(emm_max_soja) |> setDT()

max_soja_toplot[, p := plogis(.value)]



ggplot(max_soja_toplot, aes(y = max, x = p)) +

  stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +

  facet_wrap(~ soja, nrow = 2, labeller = labeller(soja = function(x) paste('*G. soja* parent:', x))) +

  scale_x_continuous(name = 'probability of selection', trans = 'logit', breaks = c(.001, .002, .005, .01, .02, .05, .10), expand = expansion(mult = c(0, 0)), limits = c(0.0005, 0.11), labels = scales::percent_format(accuracy = 0.1)) +

  labs(y = '*G. max* parent') +

  theme(axis.title.y = element_markdown(), strip.text = element_markdown())

#ggsave('marginalmeans_crosses.png', dpi = 400, height = 7, width = 5)

These are the estimated marginal trends (slopes of the selectability vs. seed weight line) for the different crosses, as well as for the individual max and soja parents averaged across the other parent. Most of them are positive as expected (increasing seed weight leads to higher percent progeny selected) but not universally. Some are neutral or even negative. The trend noticed in the exploratory plots, that there is a positive trend for selectability with seed weight in PI425045 but not PI407191, is supported by the model.

swt_trends_max_soja <- emtrends(fit_binom_seedwt_full, ~ max + soja, var = 'ave_100sw', type = 'response')



swt_trends_toplot <- gather_emmeans_draws(swt_trends_max_soja) |> setDT()



ggplot(swt_trends_toplot, aes(y = max, x = .value)) +

  stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +

  geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +

  facet_wrap(~ soja, nrow = 2, labeller = labeller(soja = function(x) paste('*G. soja* parent:', x))) +

  scale_x_continuous(name = 'slope of selectability vs. seed weight trend') +

  labs(y = '*G. max* parent') +

  theme(axis.title.y = element_markdown(), strip.text = element_markdown())

#ggsave('seedweight_slopes_crosses.png', dpi = 400, height = 7, width = 5)



swt_trends_max <- emtrends(fit_binom_seedwt_full, ~ max, var = 'ave_100sw', type = 'response')



swt_trends_max_toplot <- gather_emmeans_draws(swt_trends_max) |> setDT()



ggplot(swt_trends_max_toplot, aes(y = max, x = .value)) +

  stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +

  geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +

  scale_x_continuous(name = 'slope of selectability vs. seed weight trend') +

  labs(y = '*G. max* parent') +

  theme(axis.title.y = element_markdown(), strip.text = element_markdown())

#ggsave('seedweight_slopes_max.png', dpi = 400, height = 4, width = 5)



swt_trends_soja <- emtrends(fit_binom_seedwt_full, ~ soja, var = 'ave_100sw', type = 'response')



swt_trends_soja_toplot <- gather_emmeans_draws(swt_trends_soja) |> setDT()



ggplot(swt_trends_soja_toplot, aes(y = soja, x = .value)) +

  stat_sample_slabinterval(geom = 'pointinterval', point_interval = median_qi) +

  geom_vline(xintercept = 0, color = 'red', linetype = 'dashed') +

  scale_x_continuous(name = 'slope of selectability vs. seed weight trend') +

  labs(y = '*G. soja* parent') +

  theme(axis.title.y = element_markdown(), strip.text = element_markdown())

#ggsave('seedweight_slopes_soja.png', dpi = 400, height = 4, width = 5)

Post hoc contrasts

Contrasts between pairs of marginal means allow us to make quantifiable statements about differences between the means in the manuscript. These contrasts are odds ratios (ratio of the selection odds between the two groups being compared). The Bayesian model does not require any correction for multiple comparisons. An odds ratio of 1, or log odds ratio of 0, means no difference between the two groups, i.e. an identical selection probability in both.

emm_max_soja_raw <- emmeans(fit_binom_seedwt_full, ~ max | soja)

contr_max_soja <- contrast(emm_max_soja_raw, 'pairwise')



emm_max_raw <- emmeans(fit_binom_seedwt_full, ~ max)

contr_max <- contrast(emm_max_raw, 'pairwise')



emm_soja_raw <- emmeans(fit_binom_seedwt_full, ~ soja)

contr_soja <- contrast(emm_soja_raw, 'pairwise')

Post hoc contrasts for the slope trends.

swt_trends_max_bysoja <- emtrends(fit_binom_seedwt_full, ~ max | soja, var = 'ave_100sw', type = 'response')



swt_contr_max_soja <- contrast(swt_trends_max_bysoja, 'pairwise')

swt_contr_max <- contrast(swt_trends_max, 'pairwise')

swt_contr_soja <- contrast(swt_trends_soja, 'pairwise')

Bayes factors

To assessing the strength of evidence for the different effects and comparisons in our model we calculate Bayes factors (BF). Our prior estimate for the difference between the treatments was centered around zero, so the prior essentially represents a null hypothesis. If BF = 1, we have not changed our belief about the parameter at all. The prior estimate (of zero) is just as likely to be true as before. If BF > 1, we have evidence for our new estimate being true. The higher the BF, the more evidence. But if BF < 1, we have more evidence of the prior (zero) estimate being true than we did before we fit the model.

Bayes factors for the pairwise contrasts between the selectability estimates for the different crosses. This requires finding versions of the estimated marginal means and contrasts using the prior model.

fit_prior <- unupdate(fit_binom_seedwt_full)



contr_max_prior <- emmeans(fit_prior, ~ max) |> contrast('pairwise')

contr_soja_prior <- emmeans(fit_prior, ~ soja) |> contrast('pairwise')

contr_max_soja_prior <- emmeans(fit_prior, ~ max | soja) |> contrast('pairwise')



bf_contr_max <- bayesfactor(contr_max, prior = contr_max_prior)

bf_contr_soja <- bayesfactor(contr_soja, prior = contr_soja_prior)

bf_contr_max_soja <- bayesfactor(contr_max_soja, prior = contr_max_soja_prior)



interpbf_contr_max <- interpret_bf(bf_contr_max$log_BF, log = TRUE)

interpbf_contr_soja <- interpret_bf(bf_contr_soja$log_BF, log = TRUE)

interpbf_contr_max_soja <- interpret_bf(bf_contr_max_soja$log_BF, log = TRUE)

Bayes factors for the seed weight slope trends.

swt_trends_max_soja_prior <- emtrends(fit_prior, ~ max | soja, var = 'ave_100sw', type = 'response')

swt_trends_max_prior <- emtrends(fit_prior, ~ max, var = 'ave_100sw', type = 'response')

swt_trends_soja_prior <- emtrends(fit_prior, ~ soja, var = 'ave_100sw', type = 'response')



bf_swt_max_soja <- bayesfactor(swt_trends_max_soja, prior = swt_trends_max_soja_prior)

bf_swt_max <- bayesfactor(swt_trends_max, prior = swt_trends_max_prior)

bf_swt_soja <- bayesfactor(swt_trends_soja, prior = swt_trends_soja_prior)



interpbf_swt_max_soja <- interpret_bf(bf_swt_max_soja$log_BF, log = TRUE)

interpbf_swt_max <- interpret_bf(bf_swt_max$log_BF, log = TRUE)

interpbf_swt_soja <- interpret_bf(bf_swt_soja$log_BF, log = TRUE)

Bayes factors for the pairwise contrasts between the seed weight slopes.

swt_contr_max_soja_prior <- contrast(swt_trends_max_soja_prior, 'pairwise')

swt_contr_max_prior <- contrast(swt_trends_max_prior, 'pairwise')

swt_contr_soja_prior <- contrast(swt_trends_soja_prior, 'pairwise')



bf_swt_contr_max_soja <- bayesfactor(swt_contr_max_soja, prior = swt_contr_max_soja_prior)

bf_swt_contr_max <- bayesfactor(swt_contr_max, prior = swt_contr_max_prior)

bf_swt_contr_soja <- bayesfactor(swt_contr_soja, prior = swt_contr_soja_prior)



interpbf_swt_contr_max_soja <- interpret_bf(bf_swt_contr_max_soja$log_BF, log = TRUE)

interpbf_swt_contr_max <- interpret_bf(bf_swt_contr_max$log_BF, log = TRUE)

interpbf_swt_contr_soja <- interpret_bf(bf_swt_contr_soja$log_BF, log = TRUE)

The BFs are displayed along with the contrasts and their quantiles in tables below.

Results in tabular form

Here are some tables that reproduce some of the information from the graphs above. In all cases, I give parameter estimate (median) and lower and upper bounds of the 66% and 95% credible intervals. Everything is given to 3 significant figures.

Random effects

Random effects
location estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper
CRS (F2) 0.01860 -0.837 0.712 -0.224 0.169
LCPRS (F2) -0.03850 -0.909 0.636 -0.290 0.103
CCRS (F3) 0.00943 -0.855 0.676 -0.210 0.155
UCPRS (F3) -0.03290 -0.909 0.624 -0.265 0.100

Marginal means of crosses

Marginal means of selection probability for each cross
max parent soja parent estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper
N7103 407191 0.02140 0.008230 0.06780 0.01600 0.03330
NC Dunphy 407191 0.02980 0.011500 0.09550 0.02230 0.04660
NC Raleigh 407191 0.01930 0.007480 0.06200 0.01440 0.03020
TCHM06-240 407191 0.02140 0.008290 0.06830 0.01600 0.03350
N7103 425045 0.00351 0.001350 0.01130 0.00262 0.00550
NC Dunphy 425045 0.00390 0.001510 0.01240 0.00292 0.00610
NC Raleigh 425045 0.00160 0.000619 0.00506 0.00119 0.00249
TCHM06-240 425045 0.00328 0.001270 0.01040 0.00245 0.00514

Marginal means averaged by parent type

Marginal means of selection probability for each G. max parent
max parent estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper
N7103 0.00865 0.00335 0.0279 0.00648 0.01350
NC Dunphy 0.01080 0.00417 0.0344 0.00808 0.01690
NC Raleigh 0.00555 0.00216 0.0177 0.00415 0.00869
TCHM06-240 0.00837 0.00325 0.0267 0.00628 0.01310
Marginal means of selection probability for each G. soja parent
soja parent estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper
407191 0.0226 0.00878 0.07250 0.01700 0.03540
425045 0.0029 0.00112 0.00925 0.00218 0.00455

Post-hoc contrasts: odds ratios of selection probability between crosses

First the contrasts of pairs of max parents within soja parent are given. Then the contrasts of the averaged max and soja parents are given.

Quantiles and Bayes factors are included. Verbal interpretation of Bayes factor is provided according to the cutoffs in Jeffreys 1961. Evidence “in favor of” means we think the probability of selection is different for the pair being compared, and evidence “against” means we think it’s probably the same.

Odds ratios between crosses
max parents contrasted soja parent estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper Bayes factor interpretation
N7103 - NC Dunphy 407191 0.716 0.647 0.790 0.683 0.751 3080 extreme evidence in favour of
N7103 - NC Raleigh 407191 1.109 0.998 1.235 1.053 1.167 0.0154 very strong evidence against
N7103 - (TCHM06-240) 407191 0.995 0.894 1.104 0.945 1.046 0.00288 extreme evidence against
NC Dunphy - NC Raleigh 407191 1.549 1.410 1.697 1.480 1.620 1.68 × 106 extreme evidence in favour of
NC Dunphy - (TCHM06-240) 407191 1.388 1.271 1.517 1.328 1.451 6.26 × 104 extreme evidence in favour of
NC Raleigh - (TCHM06-240) 407191 0.896 0.814 0.989 0.854 0.941 0.0171 very strong evidence against
N7103 - NC Dunphy 425045 0.899 0.796 1.015 0.848 0.953 0.00862 extreme evidence against
N7103 - NC Raleigh 425045 2.192 1.904 2.527 2.046 2.352 5.75 × 108 extreme evidence in favour of
N7103 - (TCHM06-240) 425045 1.070 0.946 1.207 1.007 1.136 0.00356 extreme evidence against
NC Dunphy - NC Raleigh 425045 2.442 2.173 2.739 2.304 2.578 6.08 × 1013 extreme evidence in favour of
NC Dunphy - (TCHM06-240) 425045 1.190 1.088 1.306 1.138 1.244 0.85 anecdotal evidence against
NC Raleigh - (TCHM06-240) 425045 0.487 0.435 0.548 0.461 0.516 1.15 × 108 extreme evidence in favour of
Odds ratios between pairs of G. max parents
max parent 1 max parent 2 estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper Bayes factor interpretation
N7103 NC Dunphy 0.802 0.744 0.868 0.773 0.833 86.1 very strong evidence in favour of
N7103 NC Raleigh 1.560 1.429 1.704 1.493 1.629 3.08 × 107 extreme evidence in favour of
N7103 TCHM06-240 1.032 0.952 1.117 0.992 1.072 0.00228 extreme evidence against
NC Dunphy NC Raleigh 1.943 1.805 2.100 1.874 2.014 1.06 × 1020 extreme evidence in favour of
NC Dunphy TCHM06-240 1.285 1.206 1.372 1.245 1.325 9.83 × 104 extreme evidence in favour of
NC Raleigh TCHM06-240 0.661 0.613 0.714 0.637 0.687 2.77 × 108 extreme evidence in favour of
Odds ratio between pair of G. soja parents
soja parent 1 soja parent 2 estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper Bayes factor interpretation
407191 425045 7.793 7.382 8.238 7.589 8.009 2.77 × 1073 extreme evidence in favour of

Post-hoc contrasts: difference in slopes of seed weight trends between crosses

First the contrasts of pairs of max parents within soja parent are given. Then the contrasts of the averaged max and soja parents are given.

Quantiles and Bayes factors are included. Evidence “in favor of” means we think the difference in slopes is nonzero for the pair being compared, and evidence “against” means it is consistent with a difference of zero between the slopes of the pair being compared.

Comparison of seed weight trend slopes between crosses
max parents contrasted soja parent estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper Bayes factor interpretation
N7103 - NC Dunphy 407191 0.079 0.012 0.146 0.047 0.110 0.174 moderate evidence against
N7103 - NC Raleigh 407191 -0.032 -0.105 0.039 -0.068 0.002 0.0165 very strong evidence against
N7103 - (TCHM06-240) 407191 -0.183 -0.257 -0.110 -0.218 -0.148 128 extreme evidence in favour of
NC Dunphy - NC Raleigh 407191 -0.111 -0.169 -0.053 -0.139 -0.083 3.52 moderate evidence in favour of
NC Dunphy - (TCHM06-240) 407191 -0.261 -0.321 -0.201 -0.291 -0.233 1.41 × 107 extreme evidence in favour of
NC Raleigh - (TCHM06-240) 407191 -0.151 -0.216 -0.085 -0.183 -0.118 15.8 strong evidence in favour of
N7103 - NC Dunphy 425045 -0.167 -0.250 -0.087 -0.206 -0.127 13.4 strong evidence in favour of
N7103 - NC Raleigh 425045 0.243 0.146 0.341 0.196 0.289 48.1 very strong evidence in favour of
N7103 - (TCHM06-240) 425045 -0.250 -0.333 -0.165 -0.290 -0.207 921 extreme evidence in favour of
NC Dunphy - NC Raleigh 425045 0.410 0.340 0.480 0.376 0.443 1.49 × 109 extreme evidence in favour of
NC Dunphy - (TCHM06-240) 425045 -0.082 -0.135 -0.030 -0.108 -0.056 0.381 anecdotal evidence against
NC Raleigh - (TCHM06-240) 425045 -0.492 -0.568 -0.416 -0.528 -0.455 5.19 × 1010 extreme evidence in favour of
Comparison of seed weight trend slopes between pairs of G. max parents
max parent 1 max parent 2 estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper Bayes factor interpretation
N7103 NC Dunphy -0.221 -0.296 -0.142 -0.257 -0.183 0.027 very strong evidence against
N7103 NC Raleigh 0.444 0.357 0.533 0.401 0.488 7.02 moderate evidence in favour of
N7103 TCHM06-240 0.031 -0.049 0.110 -0.008 0.070 8780 extreme evidence in favour of
NC Dunphy NC Raleigh 0.664 0.590 0.742 0.628 0.700 507 extreme evidence in favour of
NC Dunphy TCHM06-240 0.251 0.187 0.316 0.220 0.282 4.51 × 106 extreme evidence in favour of
NC Raleigh TCHM06-240 -0.414 -0.489 -0.337 -0.452 -0.375 9.98 × 109 extreme evidence in favour of
Comparison of seed weight trend slopes between pair of G. soja parents
soja parent 1 soja parent 2 estimate 95% CI lower 95% CI upper 66% CI lower 66% CI upper Bayes factor interpretation
407191 425045 2.053 1.999 2.109 2.027 2.081 3.18 × 107 extreme evidence in favour of

Seed color analysis

Setup

Load additional packages and data.

library(tidyverse)

library(gt)



dat <- read_csv('data/seedcolor.csv')

Exploratory plot

Plot the proportions of each seed color for the F2 derived and selected populations of each cross.

The differences are not dramatic but there may be some. The selected populations have somewhat more light colored seeds than the F2 derived populations, but the NC-Raleigh x PI424045 selected population also has more black colored seeds than its F2 derived population.

Statistical model

This is a multinomial regression with main effect of max parent in the cross, population (F2 derived vs. selected), and their interaction. There are no random effects because all nursery locations are summed together. The response variables are the color counts.

fit_seedcolor_f2vsselected <- brm(

  cbind(light, mid, brown, dark) | trials(n_total) ~ max * population,

  data = dat, family = multinomial(link = 'logit', refcat = 'light'),

  prior = c(

    prior('normal(0, 2)', class = 'b')

  ),

  iter = 3000, warmup = 2000, chains = 4, seed = 2024, 

  file = 'data/fit_seedcolor_f2vsselected'

)

Generate predictions and contrasts from fitted model

Get the posterior fitted value of the proportion (population-level prediction) for each of the populations for each cross from the model, and their uncertainty intervals. Take the contrasts as the ratio of posterior probabilities within each color between the selected and the F2-derived populations.

pred_data <- dat %>%

  select(population, max) %>%

  mutate(n_total = 1) %>%

  add_epred_draws(fit_seedcolor_f2vsselected) %>%

  mutate(.category = fct_recode(.category, `mid-green` = 'mid', black = 'dark'))



pred_data_wide <- pred_data %>%

  ungroup %>% select(-.row) %>%

  pivot_wider(names_from = population, values_from = .epred) %>%

  mutate(ratio = selected/`F2 derived`)

Get quantiles of the predicted values and contrasts.

pred_quants <- pred_data_wide %>% 

  group_by(max, .category) %>%

  median_qi(`F2 derived`, selected, ratio, .width = c(.66, .95))

Results

Graph of the estimated mean probabilities of each color

Here are the median posterior probabilities of each color, with thick error bars for the 66% quantile credible interval, and thin error bars for the 95% interval. These are plotted to compare the selected population with the F2 derived within each cross.

Judging from the plot, we can see there are some differences. The selected population has a greater proportion of light-colored seed (posterior proportion is greater than 30%) than the F2 derived population (around 20% to 25%). This is compensated for by the selected population having less mid-colored seed. As we saw in the bar plot above, the NC-Raleigh selected population also has more black-colored seed. Note: The F2 derived population estimates have much higher uncertainty because they are taken from a much smaller sample size. The selected population counts were obtained by simply adding up the counts from all the replicates and ignoring the F2 and F3 nursery location variation, so there is a much larger sample size.

Graph of the ratio of probabilities of each color: F2 derived vs. selected

To better quantify the evidence for difference in color between F2 derived and selected, these are the posterior values for the ratio between F2 derived and selected, and their 66% and 95% quantile credible intervals. A value > 1 indicates higher proportion of a color in the selected population.

We have solid evidence of differences in the relative probabilities. In both crosses, the F2 derived has a lower probability of light-colored seed and a higher probability of mid-colored seed. The F2 derived population has a lower probability of black-colored seed only in the NC-Raleigh x PI424045 cross.

Table of all seed color results

This table contains all the data presented in the above two figures (posterior medians of fitted probabilities, their ratios F2 derived/selected, and 66% and 95% quantile credible intervals for every value).

color population estimate 66% QI 95% QI
NC-Raleigh x PI424045
light F2 derived 0.251 (0.214, 0.29) (0.178, 0.337)
light selected 0.340 (0.319, 0.362) (0.298, 0.384)
light ratio 1.350 (1.16, 1.61) (0.985, 1.96)
mid-green F2 derived 0.301 (0.26, 0.345) (0.223, 0.395)
mid-green selected 0.165 (0.148, 0.181) (0.133, 0.2)
mid-green ratio 0.547 (0.46, 0.649) (0.386, 0.787)
brown F2 derived 0.182 (0.15, 0.219) (0.118, 0.263)
brown selected 0.132 (0.118, 0.148) (0.104, 0.165)
brown ratio 0.729 (0.585, 0.911) (0.464, 1.17)
black F2 derived 0.259 (0.219, 0.301) (0.183, 0.342)
black selected 0.361 (0.34, 0.383) (0.32, 0.407)
black ratio 1.390 (1.19, 1.65) (1.03, 2.02)
TCHM06-240 x PI424045
light F2 derived 0.212 (0.169, 0.259) (0.128, 0.316)
light selected 0.347 (0.334, 0.36) (0.321, 0.374)
light ratio 1.640 (1.33, 2.06) (1.1, 2.71)
mid-green F2 derived 0.331 (0.28, 0.387) (0.227, 0.451)
mid-green selected 0.226 (0.215, 0.238) (0.204, 0.25)
mid-green ratio 0.683 (0.58, 0.814) (0.497, 1)
brown F2 derived 0.223 (0.178, 0.274) (0.137, 0.33)
brown selected 0.219 (0.208, 0.231) (0.195, 0.243)
brown ratio 0.978 (0.795, 1.24) (0.649, 1.61)
black F2 derived 0.224 (0.177, 0.275) (0.136, 0.329)
black selected 0.207 (0.196, 0.219) (0.184, 0.231)
black ratio 0.922 (0.748, 1.18) (0.615, 1.54)