README for Figshare repository containing figures, scripts, alignments, input, and output files from:

Automatic selection of partitioning schemes for phylogenetic analyses using iterative k-means clustering of site rates.
Paul B. Frandsen, Brett Calcott, Christoph Mayer and Robert Lanfear,
which has been submitted to BMC Evolutionary Biology

Instructions
------------

After downloading and unzipping the data.zip file, you will find six folders. More information about each can be found below:

* /Datasets
* /Figures_and_code
* /partitionfinder
* /partitionfinder_TIGER_rates
* /Simulated_study
* /starting_tree_bias_study

/Datasets folder
----------------

This folder contains the datasets used for the empirical analyses. Each subfolder is named after the author and publication year of the paper that used the data set. Each subfolder also contains the partitionfinder.cfg file and several "best_scheme_**.txt" files. Each of these files is named according to the analysis that produced it. All analyses were run using the version of PartitionFinder included in the /partitionfinder folder.

A note about the partitionfinder.cfg files:

Each data set includes one copy of the partitionfinder.cfg file. This file must be modified between analyses so that the program completes the correct search.

* In our paper we include results using two different information criteria, the AICc and the BIC. The criterion can be changed in partitionfinder.cfg under the "## MODELS OF EVOLUTION ##" heading, e.g. set "model_selection = BIC".
* We use three different search options, which are set with "search = ":
    - "user": this requires that the scheme be specified as explained in the PartitionFinder manual. In our paper, we used this to find the most partitioned scheme, defining the scheme with each data block separate. We also used this search for the unpartitioned scheme by defining the entire alignment as one data block.
    - "greedy": this search runs PartitionFinder's greedy algorithm.
    - "kmeans": this search is new for our paper and selects a partitioning scheme using the new iterative k-means algorithm. When using this search, users should define a single data block in which the entire alignment is specified, e.g. for an alignment with 500 sites the data block might read, "whole_alignment = 1-500;". To cluster by site rates, you should also add "--kmeans-opt 2" to the PartitionFinder command. If you wish to run kmeans using TIGER rates (the only method we recommend for tree estimation), simply run PartitionFinder from the "partitionfinder_TIGER_rates" folder and use the "kmeans" search.

Modifying these options will allow you to perform all of the analyses that we used in our testing.

/Figures_and_code folder
------------------------

This folder contains both the figures used in the paper and the R code to generate them.

* "plots_AICc.r" and "plots_BIC.r" generate Figures 2 and 3 as well as Figure 6.
* "stacked_bar_TIGER.r" generates Figure 4.
* "stacked_bar_subset_assignments_TIGER.r" generates Figure 5.
* "concatenated_sim_bar_chart.r" generates Figure 7.

/partitionfinder folder
-----------------------

WARNING: This version of the code uses likelihood estimates of rates for the kmeans clustering, which we have found can bias final tree searches toward the starting tree. We recommend that for tree estimation, you use the program in the "partitionfinder_TIGER_rates" folder instead.

The folder contains the code for the PartitionFinder program with the additional code needed to run the kmeans search. This program can be run by following the PartitionFinder manual along with the additional instructions given above in the description of the /Datasets folder.

/partitionfinder_TIGER_rates
-----------------------

This folder contains the PartitionFinder code to run the TIGER-rates-based version of kmeans clustering. This is the recommended program for tree estimation.
The "kmeans" search can be used as explained in the description of the /Datasets folder.

/Simulated_study
-----------------------

This folder contains the files used in the simulated study. Please read the README included in the folder for instructions and a description of its contents.

/starting_tree_bias_study
-----------------------

This folder contains the files and code that we used for the starting tree bias study. The folder also contains a README file that includes a description and instructions. Our results from the starting tree bias study are all included in the /starting_tree_bias_study/datasets directory.

If you have any questions or trouble, feel free to email me at paulbfrandsen@gmail.com.
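
Example: editing partitionfinder.cfg between analyses

The partitionfinder.cfg changes described under the /Datasets folder (switching "model_selection" and "search" between runs) can also be scripted. The following is only a minimal sketch in Python and is not part of this repository. The function names set_cfg_option and kmeans_data_block and the two-line example text are hypothetical, and the sketch assumes that each option sits on its own "name = value" line, optionally ending in ";". A real partitionfinder.cfg contains further settings that this example leaves untouched.

```python
def set_cfg_option(cfg_text, name, value):
    """Return cfg_text with the line 'name = ...' set to the given value.

    Hypothetical helper: assumes one option per line, written as
    'name = value' with an optional trailing ';' that is preserved.
    """
    out = []
    found = False
    for line in cfg_text.splitlines():
        key, sep, rest = line.partition("=")
        if sep and key.strip() == name:
            # Keep the trailing semicolon only if the original line had one.
            suffix = ";" if rest.rstrip().endswith(";") else ""
            line = f"{name} = {value}{suffix}"
            found = True
        out.append(line)
    if not found:
        raise KeyError(f"option not found in cfg text: {name}")
    # Preserve a final newline if the input had one.
    return "\n".join(out) + ("\n" if cfg_text.endswith("\n") else "")


def kmeans_data_block(n_sites):
    """Build the single data block covering the whole alignment."""
    return f"whole_alignment = 1-{n_sites};"


# Hypothetical two-line excerpt, not a complete partitionfinder.cfg.
example_cfg = "model_selection = AICc;\nsearch = greedy;\n"

cfg = set_cfg_option(example_cfg, "model_selection", "BIC")
cfg = set_cfg_option(cfg, "search", "kmeans")
print(cfg)
print(kmeans_data_block(500))
```

To apply this to a real file, read its contents into a string, pass the string through set_cfg_option, and write the result back before starting the next PartitionFinder run.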