Ecological Archives E088-015-A2

Glenn De'ath. 2007. Boosted trees for ecological modeling and prediction. Ecology 88:243–251.

Appendix B. A classification example of boosting using fish scale data.

Scales from barramundi were collected by researchers and by recreational and commercial fishers from freshwater and estuarine habitats (Cappo et al. 2005). Of the 270 cases, 129 were from estuarine locations and 141 from freshwater. A total of nine elements comprising strontium (Sr), barium (Ba), calcium (Ca), iron (Fe), potassium (K), magnesium (Mg), manganese (Mn), phosphorus (P), and sulfur (S) were quantified for each scale. Previous studies had expressed the concentrations of the elements as ratios to Ca, and that practice is also followed here. For simplicity, the Ca denominator is omitted from subsequent references to the ratios. The predictors were all log transformed (base 2) to assist interpretation of results. The principal objective of the study was to predict the origins of the fish (freshwater or estuarine) from the ratios of the constituent elements.
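The transformation described above can be sketched as follows. This is a minimal illustration only: the function name and the concentration values are hypothetical and are not taken from the study data.

```python
import math

def log2_ratios_to_ca(concentrations):
    """Return log2(element / Ca) for every element other than Ca."""
    ca = concentrations["Ca"]
    return {
        element: math.log2(value / ca)
        for element, value in concentrations.items()
        if element != "Ca"
    }

# Hypothetical concentrations for one scale (arbitrary units, not study data).
scale = {"Ca": 512.0, "Sr": 1.0, "Ba": 0.25}
ratios = log2_ratios_to_ca(scale)
```

With base 2 logarithms, a difference of one unit in a predictor corresponds to a doubling of the element-to-Ca ratio, which is what makes the transformed scale easy to interpret.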

A single classification tree with nine nodes was selected based on cross-validation and the 1-SE rule (Fig. B1a). The training error was just 1.1% (3/270), but the prediction error (PE) based on cross-validation was 11.2%. Sr was the dominant predictor with tapering influence of the other predictors from Fe to Zn (Fig. B1b).
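The 1-SE rule mentioned above picks the smallest tree whose cross-validated error lies within one standard error of the minimum. The sketch below shows that selection step only; the function name and the error values are invented for illustration and are not the cross-validation results of this study.

```python
def one_se_rule(sizes, cv_errors, cv_ses):
    """Return the smallest tree size within one SE of the minimum CV error."""
    best = min(range(len(sizes)), key=lambda i: cv_errors[i])
    threshold = cv_errors[best] + cv_ses[best]
    candidates = [s for s, e in zip(sizes, cv_errors) if e <= threshold]
    return min(candidates)

# Hypothetical cross-validation summary (not study data).
sizes = [2, 5, 9, 14, 20]
cv_errors = [0.200, 0.150, 0.115, 0.110, 0.112]
cv_ses = [0.020, 0.015, 0.010, 0.010, 0.010]
chosen = one_se_rule(sizes, cv_errors, cv_ses)
```

In this made-up example the minimum error occurs at 14 nodes, but the 9-node tree falls within one standard error of it and is therefore preferred as the simpler model.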

A series of aggregated boosted trees (ABTs) was then fitted to the data. The first ABT used all eight predictors and had a PE of 3.97%, compared to 11.2% for the single classification tree. The variable importance plot shows strong effects for Sr, moderate effects for Fe, Ba, S, and Mn, and negligible effects for Mg, K, and Zn (Fig. B2). The single predictor dependency plots show predominantly monotonic but non-linear effects for Sr, Fe, Ba, S, and Mn. The PE improved to 3.04% when Mg, K, and Zn were dropped from the model. Sequentially reducing the level of interactions from fourth-order to main effects gave small but consistent increases in the PE, with the main effects model having a PE of 4.07%. These changes are consistent with tapering interactions that diminish at higher orders. Applying monotonic constraints to all the predictors, and including first- to fourth-order interactions, gave the lowest PE of 2.95%. The measures of importance, partial influence, and individual and pairwise interactions suggested a large number of interactions. The two strongest pairwise interactions, between Mn and Fe, and Mn and Ba, show contrasting patterns (Fig. B3) in that (1) Mn has a strong effect at low levels of Fe, but little effect at high levels of Fe, and (2) Mn has little effect at low levels of Ba, but a strong effect at high levels of Ba.
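The idea behind a partial dependency plot can be sketched generically: fix one predictor at a grid value for every case, average the model's predictions, and repeat across the grid. The code below is not the ABT software used in the paper. The `toy_log_odds` model, its coefficients, and the two data rows are hypothetical, chosen only to mimic an interaction in which Mn matters at low Fe.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is unchanged
            modified[feature] = value  # overwrite the predictor of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def toy_log_odds(row):
    # Hypothetical model, not fitted to the study data:
    # Mn contributes only when Fe is low (below 0 on the log2 scale).
    return 2.0 * row["Sr"] + (row["Mn"] if row["Fe"] < 0 else 0.0)

# Two made-up cases, one with low Fe and one with high Fe.
rows = [
    {"Sr": -1.0, "Fe": -1.0, "Mn": 0.0},
    {"Sr": 1.0, "Fe": 1.0, "Mn": 0.0},
]
mn_curve = partial_dependence(toy_log_odds, rows, "Mn", [0.0, 1.0, 2.0])
```

Because only the low-Fe case responds to Mn in this toy model, the averaged curve rises at half the slope that Mn has in that case. A pairwise interaction plot would instead vary two predictors jointly rather than averaging one of them away.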

In comparison with other methods, the ABT gave a PE of 3.97%, whereas the nine-node single classification tree (11.2%), bagged trees (6.44%), random forests (4.67%), linear discriminant analysis (4.89%), and quadratic discriminant analysis (4.11%) all resulted in higher PE. Typical standard deviations of PE were ~0.1% with the exception of SRT (0.5%).
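Prediction errors of this kind can be estimated by repeated k-fold cross-validation (the caption of Fig. B1 refers to 5-fold cross-validation repeated ten times). The sketch below illustrates the procedure with a deliberately simple stand-in classifier, a single-predictor threshold rule. It is an assumption for illustration, not any of the methods compared above, and the data are synthetic.

```python
import random
from statistics import mean

def fit_threshold(train):
    """Fit a one-predictor rule: cut at the midpoint between the class means."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    cut = (m0 + m1) / 2.0
    high_class = 1 if m1 > m0 else 0
    return lambda x: high_class if x > cut else 1 - high_class

def repeated_kfold_error(data, k=5, repeats=10, seed=0):
    """Mean misclassification rate over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        wrong = 0
        for i, test in enumerate(folds):
            # Train on every fold except the one held out for testing.
            train = [case for j, fold in enumerate(folds) if j != i for case in fold]
            rule = fit_threshold(train)
            wrong += sum(rule(x) != y for x, y in test)
        errors.append(wrong / len(data))
    return mean(errors)

# Synthetic, perfectly separated data: (predictor value, class label).
data = [(float(i), 0) for i in range(20)] + [(float(i) + 100.0, 1) for i in range(20)]
pe = repeated_kfold_error(data)
```

The spread of the per-repeat errors (not returned here) is what a standard deviation of PE would summarize.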

 
   FIG. B1. A single classification tree analysis (a) of the barramundi scale data shows that five different variables are used for the splits. Strontium (Sr) accounted for most variation of the freshwater-estuarine classification, with the remaining four variables, manganese (Mn), iron (Fe), sulfur (S), and barium (Ba), involved in interactions at high levels of Sr. The error rate of the tree was just three cases of 270 (1.1%), but, averaged over ten 5-fold cross-validations, the estimated prediction error was 11.22%. The variable importance plot (b) showed that Sr accounted for most variation, with negligible variation due to Mg, K, and Zn.

 

 
   FIG. B2. Relative variable importance plot and partial dependency plots for boosted tree analyses of the barramundi scale data. Two analyses were undertaken. First, all eight predictors were used. The importance plot shows their relative contributions to predicting the origin of the fish scales, and the five partial plots (black) show the dependencies of the log-odds (base 2) of a scale being from freshwater for the five leading predictors. A second analysis, dropping Mg, K, and Zn and restricting the dependency of all remaining predictors to be monotonic, is shown in gray.

 

 
   FIG. B3. Partial dependency plots show the interaction effects of Mn with Fe and Ba.


