May 28, 2018

How many is at least 50?

A: 5

B: 19

C: 40

D: 49

E: 50

How many chironomids do you need to count?

Effect of low count sums on quantitative environmental reconstructions: an example using subfossil chironomids. Heiri & Lotter (2001)

Setting minimum head capsule abundance and taxa deletion criteria in chironomid-based inference models. Quinlan & Smol (2001)

How many chironomid head capsules are enough? A statistical approach to determine sample size for palaeoclimatic reconstructions. Larocque (2001)

How many chironomids do people say they count?

"adjacent samples were pooled to produce a minimum head-capsule count of 50 specimens per sample" Heiri and Lotter (2003)

"Chironomids were enumerated from subsamples of sediment until >50 headcapsules were counted" Lang et al (2017)

"A minimum of 50 head capsules were extracted from each sample" Langdon et al (2008)

"At least 50 head capsules were mounted." Larocque-Tobler et al (2015)

"Samples produc[ing] less than 50 head capsules were not included in the subsequent analyses" Zhang et al (2017)

How many are actually counted?

Nullius in verba

The reported count sum is testable.

  • Trivial with archived count data

  • More challenging with archived percent data

Rank-Abundance Curves

Last Chance Lake (Axford et al 2017)

## # A tibble: 1 x 3
##   `median singletons` mean_singleton prob1
##                 <dbl>          <dbl> <dbl>
## 1                   3       3.227273  0.96
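With archived count data, singletons (taxa represented by exactly one individual) can be tallied directly. The sketch below is a minimal illustration, assuming counts are stored as a dict of sample id to {taxon: count}; `singleton_summary` is a hypothetical helper, and reading `prob1` as the fraction of samples with at least one singleton is an assumption, not something stated in the output above.

```python
from statistics import mean, median


def singleton_summary(samples):
    """Summarise singletons (taxa with a count of exactly 1) per sample.

    samples: dict mapping sample id -> dict of taxon -> count
    (hypothetical layout chosen for illustration).
    """
    n_singletons = [
        sum(1 for count in counts.values() if count == 1)
        for counts in samples.values()
    ]
    return {
        "median_singletons": median(n_singletons),
        "mean_singletons": mean(n_singletons),
        # assumed reading of prob1: share of samples with at least one singleton
        "frac_with_singleton": sum(n > 0 for n in n_singletons) / len(n_singletons),
    }


# Hypothetical counts, not data from Last Chance Lake
example = {
    "s1": {"A": 10, "B": 1, "C": 1, "D": 3},
    "s2": {"A": 7, "B": 2, "C": 1},
    "s3": {"A": 12, "B": 4},
}
print(singleton_summary(example))
```

A sample with no singletons is not necessarily wrong, but if most samples in a record lack them, the reported count sums deserve a closer look.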

Estimating the count sum

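\(\frac{\text{count}}{\text{countSum}} \times 100 = \text{percent}\)

\(\frac{\text{count}}{\text{percent}} \times 100 = \text{countSum}\)

Assuming the rarest taxon is a singleton (count = 1), its percentage gives the count sum:

\(\frac{1}{\text{percent}_{\min}} \times 100 = \text{countSum}\)

The uncertainty in this estimate due to rounding of the reported percentages can also be estimated.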
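The calculation above can be sketched in a few lines, assuming the rarest taxon is a singleton and that percentages were rounded to a known number of decimal places (`decimals`, an assumed parameter). A value reported as 20.0 could have been anywhere from 19.95 to 20.05, so the bounds come from dividing 100 by those limits. `estimate_count_sum` is a hypothetical helper; the example percentages (20, 20, 20, 40) are those listed for id 6 in the Lake Żabińskie output.

```python
def estimate_count_sum(percents, decimals=1):
    """Estimate the count sum as 100 / min(percent), assuming the rarest
    taxon is a singleton, with bounds for rounding of the percentages."""
    lowest = min(p for p in percents if p > 0)  # ignore absent taxa
    half_unit = 0.5 * 10 ** -decimals           # max rounding error in a reported percent
    estimate = 100 / lowest
    lower = 100 / (lowest + half_unit)          # true percent slightly higher
    upper = 100 / (lowest - half_unit)          # true percent slightly lower
    return estimate, lower, upper


est, lo, hi = estimate_count_sum([20.0, 20.0, 20.0, 40.0], decimals=1)
print(round(est, 6), round(lo, 6), round(hi, 6))  # 5.0 4.987531 5.012531
```

The rounding bounds are narrow, so rounding alone cannot explain a large gap between estimated and reported count sums.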

Estimated vs reported count sums at Last Chance Lake

Estimated count sums at another lake

Are estimated count sums correct?

  • Are true count sums ~ 80?
  • Would expect many apparent half counts (and some quarter counts)
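One way to test a hypothesised count sum is to back-calculate each taxon's count from its percentage and tally how many look like whole, half or quarter counts. This is a minimal sketch; `implied_counts`, `fractional_parts` and the tolerance (in count units) are illustrative assumptions, and the percentages in the example are hypothetical.

```python
def implied_counts(percents, count_sum):
    """Back-calculate counts from percentages for a hypothesised count sum."""
    return [p * count_sum / 100 for p in percents if p > 0]


def fractional_parts(counts, tol=0.05):
    """Classify each implied count as a whole, half, quarter or other count."""
    classes = {"whole": 0, "half": 0, "quarter": 0, "other": 0}
    for c in counts:
        frac = c % 1
        if min(frac, 1 - frac) < tol:
            classes["whole"] += 1
        elif abs(frac - 0.5) < tol:
            classes["half"] += 1
        elif abs(frac - 0.25) < tol or abs(frac - 0.75) < tol:
            classes["quarter"] += 1
        else:
            classes["other"] += 1
    return classes


pcts = [2.5, 5.0, 6.25, 10.0, 76.25]  # hypothetical assemblage
for n in (40, 80):
    print(n, fractional_parts(implied_counts(pcts, n)))
```

If the hypothesised count sum were right, the implied counts would be mostly whole numbers, with only occasional half counts from split head capsules.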

Lake Żabińskie

"At least 50 head capsules were mounted." Larocque-Tobler et al (2015)

"In Lake Żabińskie, the number of head capsules varied from 19 to 68.5 with 29 out of the 89 (33%) samples having abundances lower than 30" Larocque-Tobler et al (2016) [Corrigendum]

## # A tibble: 6 x 5
##      mn    est_n  est_min  est_max     n
##   <dbl>    <dbl>    <dbl>    <dbl> <dbl>
## 1  20.0 5.000000 4.987531 5.012531  20.0
## 2  14.3 6.993007 6.968641 7.017544  21.0
## 3  14.3 6.993007 6.968641 7.017544  21.0
## 4  12.5 8.000000 7.968127 8.032129  32.0
## 5  12.2 8.196721 8.163265 8.230453  41.0
## 6  10.3 9.708738 9.661836 9.756098  43.5
## # A tibble: 61 x 7
## # Groups:   id [9]
##       mn est_n     n    id                   species percent multiple
##    <dbl> <dbl> <dbl> <int>                     <chr>   <dbl>    <dbl>
##  1  20.0     5    20     6      Cladopelma lateralis    20.0    1.000
##  2  20.0     5    20     6    Endochironomus tendens    20.0    1.000
##  3  20.0     5    20     6    Microtendipes pedellus    20.0    1.000
##  4  20.0     5    20     6            Paratanytarsus    40.0    2.000
##  5  12.5     8    32     7               Chironomini    12.5    1.000
##  6  12.5     8    32     7    Chironomus anthracinus    12.5    1.000
##  7  12.5     8    32     7   Polypedilum nubeculosum    12.5    1.000
##  8  12.5     8    32     7                Cricotopus    12.5    1.000
##  9  12.5     8    32     7                Procladius    12.5    1.000
## 10  12.5     8    32     7 Endochironomus albipennis    14.1    1.128
## # ... with 51 more rows
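Where a count sum is reported, the converse calculation gives the count implied for the rarest taxon: reported countSum × min(percent) / 100. A value well above 1 means either that the sample has no singletons or that the reported count sum is too large. A minimal sketch, with `implied_min_count` as a hypothetical helper and the (`mn`, `n`) pairs copied from the first Lake Żabińskie table above:

```python
def implied_min_count(reported_count_sum, min_percent):
    """Count implied for the rarest taxon, given the reported count sum."""
    return reported_count_sum * min_percent / 100


# (mn, n) pairs from the first Lake Żabińskie table
rows = [(20.0, 20.0), (14.3, 21.0), (14.3, 21.0), (12.5, 32.0), (12.2, 41.0), (10.3, 43.5)]
for mn, n in rows:
    print(mn, n, round(implied_min_count(n, mn), 2))
```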

Diatoms too

The diatom counts included 400–500 valves per sample and at least 100 valves in 4 diatom-poor samples.

And pollen

Two hundred grain counts … were made for each level

99.8% of counts are divisible by two

## # A tibble: 54 x 2
##    depth        d4
##    <chr>     <dbl>
##  1 59052 1.0000000
##  2 59094 1.0000000
##  3 59098 0.8666667
##  4 59097 0.8333333
##  5 59068 0.8000000
##  6 59073 0.8000000
##  7 59066 0.7777778
##  8 59089 0.7777778
##  9 59059 0.7500000
## 10 59054 0.7142857
## # ... with 44 more rows
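If counts were doubled (or quadrupled) to reach the reported sum, nearly all back-calculated counts will be divisible by 2 (or 4). The sketch below assumes counts are recovered from percentages using the reported count sum of 200; reading the `d4` column as the fraction of counts divisible by four is an assumption. `fraction_divisible` is a hypothetical helper and the example percentages are hypothetical.

```python
def fraction_divisible(percents, count_sum, k, tol=0.1):
    """Fraction of non-zero back-calculated counts that are multiples of k.

    Counts are recovered as percent * count_sum / 100 and rounded to the
    nearest integer; values further than tol from an integer are skipped.
    """
    counts = [p * count_sum / 100 for p in percents if p > 0]
    whole = [round(c) for c in counts if abs(c - round(c)) < tol]
    if not whole:
        return float("nan")
    return sum(c % k == 0 for c in whole) / len(whole)


pcts = [1.0, 2.0, 3.0, 4.0, 90.0]  # hypothetical pollen percentages
print(fraction_divisible(pcts, 200, 2), fraction_divisible(pcts, 200, 4))
```

With genuine counts of 200 grains, roughly half of the counts should be odd.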

Explanations

  • Occasionally samples will lack singletons
    • Very low diversity
    • Very low count sums
  • Enthusiastic exclusion of rare taxa

  • Low taxonomic resolution

  • Misreporting

How large a count sum should be assumed if it is not reported?

Sanity Checks

Impossible percent rule: sum to 100

Percentages should sum to 100%. Deviations can arise from:

  • Exclusion of unknown taxa
  • Rounding
  • Miscalculation
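The effect of rounding alone can be simulated: draw random 30-taxon assemblages, round the percentages, and record the range of their sums. The original simulation design is not described, so this is one plausible version under stated assumptions: compositions are drawn from a flat Dirichlet (normalised `random.gammavariate(1.0, 1.0)` draws), and `rounded_sum_range` is a hypothetical helper.

```python
import random


def rounded_sum_range(n_taxa=30, n_sims=1000, decimals=0, seed=1):
    """Range of the sum of rounded percentages across simulated assemblages."""
    rng = random.Random(seed)
    sums = []
    for _ in range(n_sims):
        # random composition: normalised gamma draws (a flat Dirichlet)
        weights = [rng.gammavariate(1.0, 1.0) for _ in range(n_taxa)]
        total = sum(weights)
        percents = [round(100 * w / total, decimals) for w in weights]
        sums.append(sum(percents))
    return min(sums), max(sums)


for d in (0, 1, 2):
    print(d, rounded_sum_range(decimals=d))
```

Each rounded value is at most half a unit out, so 30 taxa can never stray more than 15 × 10^-decimals from 100. A sum outside that bound cannot be explained by rounding.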

1000 simulations of 30 taxa rounded to different precisions

Impossible percent integer rule

  • Assume the rarest taxon is represented by one individual
  • All percentages should then be integer multiples of the lowest percentage

  • Discrepancies need to be checked carefully

Integer rule example

Taxon Percent Estimated count
Mesocricotopus 0.826446 1.00
Corynocera oliveri 1.652890 2.00
Eukiefferiella fittkaui 1.652890 2.00
Tanytarsus sp 1.652890 2.00
Micropsectra radialis 3.305790 4.00
Paratanytarsus 3.305790 4.00
Protanypus 3.900000 4.72
Paracladius 5.785120 7.00
Abiskomyia 6.611570 8.00
Cricotopus 8.264460 10.00
H. maeri 52.892600 64.00
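The integer rule can be automated by dividing each percentage by the lowest one and flagging ratios that are not close to a whole number. This is a minimal sketch; `integer_rule` is a hypothetical helper, and the tolerance assumes percentages rounded to `decimals` places. Because the table above mixes precisions, a conservative `decimals=4` is used.

```python
def integer_rule(percents, decimals=2):
    """Flag taxa whose percentage is not an integer multiple of the lowest.

    If the rarest taxon is a single individual, percent / lowest is an
    implied count and should be (close to) a whole number.
    """
    present = {t: p for t, p in percents.items() if p > 0}
    lowest = min(present.values())
    half_unit = 0.5 * 10 ** -decimals
    failures = []
    for taxon, p in present.items():
        ratio = p / lowest
        # largest shift in the ratio that rounding of both values could cause
        tol = (p + half_unit) / (lowest - half_unit) - ratio
        if abs(ratio - round(ratio)) > tol:
            failures.append((taxon, round(ratio, 2)))
    return failures


table = {
    "Mesocricotopus": 0.826446,
    "Corynocera oliveri": 1.652890,
    "Eukiefferiella fittkaui": 1.652890,
    "Tanytarsus sp": 1.652890,
    "Micropsectra radialis": 3.305790,
    "Paratanytarsus": 3.305790,
    "Protanypus": 3.900000,
    "Paracladius": 5.785120,
    "Abiskomyia": 6.611570,
    "Cricotopus": 8.264460,
    "H. maeri": 52.892600,
}
print(integer_rule(table, decimals=4))  # [('Protanypus', 4.72)]
```

Only Protanypus fails. This might be a typo (for example 3.31 mistyped) or a genuine half count, which is why flagged values need checking by hand.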

No (near) duplicate assemblages

Near duplicate assemblages should be rare

Counting errors

Counting error with 50 microfossils, true abundance = 20%
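If specimens are picked independently at random, the count of one taxon follows a binomial distribution. This is an assumption; real sediment samples may be overdispersed. The sketch computes the standard error of the observed percentage for 50 microfossils at a true abundance of 20%, a central ~95% range, and the probability that two independent counts give exactly the same value for this taxon.

```python
from math import comb, sqrt


def binomial_pmf(n, p):
    """Probability of observing k = 0..n specimens of a taxon in a count of n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


n, p = 50, 0.20
pmf = binomial_pmf(n, p)
sd_percent = 100 * sqrt(p * (1 - p) / n)  # standard error of the observed percentage
p_same = sum(q * q for q in pmf)          # two independent counts agree exactly

# central ~95% range of the observed count, from the cumulative distribution
cum, lo_k, hi_k = 0.0, None, None
for k, q in enumerate(pmf):
    cum += q
    if lo_k is None and cum >= 0.025:
        lo_k = k
    if hi_k is None and cum >= 0.975:
        hi_k = k

print(round(sd_percent, 2), round(p_same, 3), 100 * lo_k / n, 100 * hi_k / n)
```

An exact match is already unlikely for one taxon. For a whole assemblage of several taxa it is much less likely, which is why near duplicates should be rare.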

Duplicates from Lake Żabińskie v1

Taxon 1902 % 1940 % 1981 % 1987 %
Cladotanytarsus mancus1 11.1111 11.1111 11.1111 11.1111
Corynoneura 11.1111 11.1111 11.1111 11.1111
Dicrotendipes nervosus 11.1111 11.1111 11.1111 11.1111
Microtendipes pedellus 11.1111 11.1111 11.1111 11.1111
Paratanytarsus 11.1111 11.1111 11.1111 11.1111
Parochlus 11.1111 11.1111 11.1111 11.1111
Procladius 11.1111 11.1111 11.1111 11.1111
Tanytarsus lugens 11.1111 11.1111 11.1111 11.1111
Tanytarsus pallidicornis 11.1111 11.1111 11.1111 11.1111
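Exact duplicates like these can be found by grouping samples on their rounded composition. This is a minimal sketch; `find_duplicates` is a hypothetical helper. The four samples are rebuilt from the table above, and the fifth sample is hypothetical.

```python
from collections import defaultdict


def find_duplicates(samples, decimals=2):
    """Group samples whose percentage assemblages are identical after rounding."""
    groups = defaultdict(list)
    for sample_id, percents in samples.items():
        # canonical key: sorted (taxon, rounded percent) pairs, absent taxa dropped
        key = tuple(sorted((t, round(p, decimals)) for t, p in percents.items() if p > 0))
        groups[key].append(sample_id)
    return [ids for ids in groups.values() if len(ids) > 1]


taxa = ["Cladotanytarsus mancus1", "Corynoneura", "Dicrotendipes nervosus",
        "Microtendipes pedellus", "Paratanytarsus", "Parochlus", "Procladius",
        "Tanytarsus lugens", "Tanytarsus pallidicornis"]
samples = {year: {t: 11.1111 for t in taxa} for year in ("1902", "1940", "1981", "1987")}
samples["other"] = {"Procladius": 50.0, "Corynoneura": 50.0}  # hypothetical
print(find_duplicates(samples))  # [['1902', '1940', '1981', '1987']]
```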

Identifying possible duplicates

Find most similar pair of samples

Taxon 1964 % 1967 % 1964 count 1967 count
Chironomini 13.2 17.1 5.0 7
Cladotanytarsus mancus1 14.5 14.6 5.5 6
Glyptotendipes pallens 3.9 0.0 1.5 0
Microtendipes pedellus 14.5 14.6 5.5 6
Tanytarsus lactescens 13.2 12.2 5.0 5
Tanytarsus mendax 14.5 14.6 5.5 6
Tanytarsus sp 26.3 26.8 10.0 11
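Finding the most similar pair of samples means computing a dissimilarity for every pair and taking the minimum. The sketch uses Bray-Curtis dissimilarity on the percentages for 1964 and 1967 from the table above, plus a hypothetical third sample. `bray_curtis` and `most_similar_pair` are hypothetical helpers.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two assemblages (dicts taxon -> value)."""
    taxa = set(a) | set(b)
    num = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    den = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return num / den


def most_similar_pair(samples):
    """Return (distance, id1, id2) for the least dissimilar pair of samples."""
    return min(
        (bray_curtis(samples[i], samples[j]), i, j)
        for i, j in combinations(samples, 2)
    )


samples = {
    "1964": {"Chironomini": 13.2, "Cladotanytarsus mancus1": 14.5,
             "Glyptotendipes pallens": 3.9, "Microtendipes pedellus": 14.5,
             "Tanytarsus lactescens": 13.2, "Tanytarsus mendax": 14.5,
             "Tanytarsus sp": 26.3},
    "1967": {"Chironomini": 17.1, "Cladotanytarsus mancus1": 14.6,
             "Microtendipes pedellus": 14.6, "Tanytarsus lactescens": 12.2,
             "Tanytarsus mendax": 14.6, "Tanytarsus sp": 26.8},
    "hypothetical": {"Chironomini": 40.0, "Procladius": 60.0},
}
print(most_similar_pair(samples))
```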

  • Assume samples drawn from same population
  • Draw many replicate samples from population
  • Find Bray-Curtis distance between replicates
  • Compare with observed distance
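The steps above can be sketched as a Monte Carlo test. This is one plausible implementation, not necessarily the procedure used here. It assumes the shared population has the pooled proportions of the two samples, draws replicate pairs with the observed count sums (half counts rounded away), and reports how often replicates are at least as similar as the observed pair. `duplicate_test` is a hypothetical helper, and the counts are the 1964 and 1967 columns from the table above, in the same taxon order.

```python
import random


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def duplicate_test(counts1, counts2, n_reps=999, seed=1):
    """Return the observed distance and a Monte Carlo p-value for it being small."""
    rng = random.Random(seed)
    pooled = [x + y for x, y in zip(counts1, counts2)]  # assumed shared population
    n1, n2 = round(sum(counts1)), round(sum(counts2))
    taxa = range(len(pooled))

    def draw(n):
        # multinomial draw of n individuals, pooled counts as weights
        sample = [0] * len(pooled)
        for k in rng.choices(taxa, weights=pooled, k=n):
            sample[k] += 1
        return sample

    observed = bray_curtis(counts1, counts2)
    null = [bray_curtis(draw(n1), draw(n2)) for _ in range(n_reps)]
    p_value = (1 + sum(d <= observed for d in null)) / (n_reps + 1)
    return observed, p_value


c1964 = [5.0, 5.5, 1.5, 5.5, 5.0, 5.5, 10.0]
c1967 = [7, 6, 0, 6, 5, 6, 11]
obs, p = duplicate_test(c1964, c1967)
print(round(obs, 3), p)
```

A small p-value means that independent counts from one population would rarely be this similar, so the pair is worth checking for duplication.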

Conclusions and ways forward

  • Methods to flag dubious data are being developed
  • Mistakes, errors and other problems will be discovered

  • Many mistakes may have little effect

  • Honestly describe count sizes
  • Take care with calculations

  • Archive full count data
  • Archive code