General instructions:
We will use the amino acid sequence data set from the msa package
to practice some of R's fundamental data-handling tools:
library(msa)
library(seqinr)
library(ape)
seq_aa <- system.file("examples", "exampleAA.fasta", package = "msa")
seq_aa <- readAAStringSet(seq_aa)
seq_aa
## AAStringSet object of length 9:
## width seq names
## [1] 452 MSTAVLENPGLGRKLSDFGQETS...LKILADSINSEIGILCSALQKIK PH4H_Homo_sapiens
## [2] 453 MAAVVLENGVLSRKLSDFGQETS...KILADSINSEVGILCNALQKIKS PH4H_Rattus_norve...
## [3] 453 MAAVVLENGVLSRKLSDFGQETS...KILADSINSEVGILCHALQKIKS PH4H_Mus_musculus
## [4] 297 MNDRADFVVPDITTRKNVGLSHD...DVAPDDLVLNAGDRQGWADTEDV PH4H_Chromobacter...
## [5] 262 MKTTQYVARQPDDNGFIHYPETE...MALVHEAMRLGLHAPLFPPKQAA PH4H_Pseudomonas_...
## [6] 451 MSALVLESRALGRKLSDFGQETS...LKILADSISSEVEILCSALQKLK PH4H_Bos_taurus
## [7] 313 MAIATPTSAAPTPAPAGFTGTLT...DVVDGDAVLNAGTREGWADTADI PH4H_Ralstonia_so...
## [8] 294 MSGDGLSNGPPPGARPDWTIDQG...AVLTRGTQAYATAGGRLAGAAAG PH4H_Caulobacter_...
## [9] 275 MSVAEYARDCAAQGLRGDYSVCR...QTADFEAIVARRKDQKALDPATV PH4H_Rhizobium_loti
[…] seq_aa using the “Muscle” method
[…] ([…] Biostrings)
[…] bionj() from the ape package with the alignment generated in exercise 1
[…] plot
[…] ggtree using different colors to highlight the internal nodes and the terminal nodes (“species”)
Consensus sequences tell us which sites are most conserved. We can use this information to create subsets of an alignment with different degrees of conservation. In the following example we will build phylogenetic trees from alignments that vary in their degree of conservation.
First, it is important to understand how to subset an alignment. We can do it as follows:
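# alg_aa is the msa alignment object (assumed here to be the result of exercise 1)
alg_aa_seq <- msaConvert(alg_aa, type = "seqinr::alignment")
# Turn the seqinr alignment into a character matrix
alg_aa_mat <- as.matrix.alignment(alg_aa_seq)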
The code above converts the alignment into a matrix in which each position is a column and each sequence is a row:
alg_aa_mat[, 1:30]
## 1 2 3 4 5 6 7 8 9 10 11 12
## PH4H_Rattus_norvegicus "M" "A" "A" "V" "V" "L" "E" "N" "G" "V" "L" "S"
## PH4H_Mus_musculus "M" "A" "A" "V" "V" "L" "E" "N" "G" "V" "L" "S"
## PH4H_Homo_sapiens "M" "S" "T" "A" "V" "L" "E" "N" "P" "G" "L" "G"
## PH4H_Bos_taurus "M" "S" "A" "L" "V" "L" "E" "S" "R" "A" "L" "G"
## PH4H_Pseudomonas_aeruginosa "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-"
## PH4H_Rhizobium_loti "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-"
## PH4H_Caulobacter_crescentus "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-"
## PH4H_Chromobacterium_violaceum "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "-" "M"
## PH4H_Ralstonia_solanacearum "M" "A" "I" "A" "T" "P" "T" "S" "A" "A" "P" "T"
## 13 14 15 16 17 18 19 20 21 22 23 24
## PH4H_Rattus_norvegicus "R" "K" "L" "S" "D" "F" "-" "G" "Q" "E" "T" "S"
## PH4H_Mus_musculus "R" "K" "L" "S" "D" "F" "-" "G" "Q" "E" "T" "S"
## PH4H_Homo_sapiens "R" "K" "L" "S" "D" "F" "-" "G" "Q" "E" "T" "S"
## PH4H_Bos_taurus "R" "K" "L" "S" "D" "F" "-" "G" "Q" "E" "T" "S"
## PH4H_Pseudomonas_aeruginosa "-" "-" "-" "-" "-" "-" "-" "M" "K" "T" "T" "Q"
## PH4H_Rhizobium_loti "-" "-" "-" "-" "-" "-" "-" "M" "S" "V" "A" "E"
## PH4H_Caulobacter_crescentus "-" "-" "-" "-" "-" "-" "-" "-" "M" "S" "G" "D"
## PH4H_Chromobacterium_violaceum "N" "D" "R" "A" "D" "F" "-" "-" "-" "V" "V" "P"
## PH4H_Ralstonia_solanacearum "P" "A" "P" "A" "G" "F" "T" "G" "T" "L" "T" "D"
## 25 26 27 28 29 30
## PH4H_Rattus_norvegicus "Y" "I" "E" "D" "N" "S"
## PH4H_Mus_musculus "Y" "I" "E" "D" "N" "S"
## PH4H_Homo_sapiens "Y" "I" "E" "D" "N" "C"
## PH4H_Bos_taurus "Y" "I" "E" "G" "N" "S"
## PH4H_Pseudomonas_aeruginosa "Y" "V" "A" "R" "Q" "P"
## PH4H_Rhizobium_loti "Y" "A" "R" "D" "C" "A"
## PH4H_Caulobacter_crescentus "G" "L" "S" "N" "G" "P"
## PH4H_Chromobacterium_violaceum "D" "I" "T" "T" "R" "K"
## PH4H_Ralstonia_solanacearum "K" "L" "R" "E" "Q" "F"
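Once the alignment is a matrix, subsetting by conservation comes down to column indexing. The following is a minimal sketch in base R, not the tutorial's own method: `toy_mat` is made-up data standing in for `alg_aa_mat`, `col_conservation()` is a hypothetical helper, and the 0.75 threshold is arbitrary. Conservation is defined here as the share of sequences carrying the most common non-gap residue at a position, which is only one of several possible definitions.

```r
# Toy alignment matrix (hypothetical data): rows are sequences, columns are positions
toy_mat <- rbind(
  sp1 = c("M", "A", "K", "L", "-"),
  sp2 = c("M", "A", "R", "L", "S"),
  sp3 = c("M", "S", "K", "L", "-"),
  sp4 = c("M", "A", "K", "I", "T")
)

# Hypothetical helper: fraction of sequences that carry the most common
# non-gap residue at one position (gaps count against conservation)
col_conservation <- function(column) {
  residues <- column[column != "-"]
  if (length(residues) == 0) return(0)
  max(table(residues)) / length(column)
}

# One conservation score per alignment position
conservation <- apply(toy_mat, 2, col_conservation)

# Keep only the positions at or above an arbitrary threshold
keep <- conservation >= 0.75
toy_subset <- toy_mat[, keep, drop = FALSE]
```

Using `drop = FALSE` keeps the result a matrix even if a single column survives the filter, so later steps that expect rows and columns still work. Changing the threshold gives subsets with different degrees of conservation.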
We can convert this back into an alignment in seqinr format
as follows:
# Copy the alignment, then replace its sequences with the matrix rows pasted back into strings
alg_aa_seq_2 <- alg_aa_seq
alg_aa_seq_2$seq <- apply(alg_aa_mat, 1, paste0, collapse = "")
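The snippet above pastes the full matrix back together; to work with a subset, the columns are selected first and only then collapsed into strings. A small base R sketch of that step, using a made-up matrix `toy_mat` in place of `alg_aa_mat` and an arbitrary choice of the first three positions:

```r
# Toy alignment matrix (hypothetical data): rows are sequences, columns are positions
toy_mat <- rbind(
  sp1 = c("M", "A", "K", "L", "-"),
  sp2 = c("M", "A", "R", "L", "S"),
  sp3 = c("M", "S", "K", "L", "-"),
  sp4 = c("M", "A", "K", "I", "T")
)

# Select a subset of positions (here simply the first three columns)
toy_sub <- toy_mat[, 1:3, drop = FALSE]

# Collapse each row back into a single string, one per sequence
toy_seqs <- apply(toy_sub, 1, paste0, collapse = "")
```

The result is a named character vector with one string per sequence, all of the same length, which is the shape assigned to `alg_aa_seq_2$seq` in the tutorial code.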
… and, as with the full alignment, we can build phylogenetic trees from these subsets:
d <- dist.alignment(alg_aa_seq_2, "identity")
tree <- bionj(d)
plot(tree)
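To make the distance step less opaque: according to the seqinr documentation, `dist.alignment()` with the identity matrix returns the square root of one minus the pairwise identity. The sketch below reproduces that idea in base R on made-up data. It is an illustration, not a reimplementation: `toy_mat` and `identity_distance()` are hypothetical, and skipping positions where either sequence has a gap is an assumption that may not match how seqinr treats gaps.

```r
# Toy alignment matrix (hypothetical data): rows are sequences, columns are positions
toy_mat <- rbind(
  sp1 = c("M", "A", "K", "L", "-"),
  sp2 = c("M", "A", "R", "L", "S"),
  sp3 = c("M", "S", "K", "L", "-"),
  sp4 = c("M", "A", "K", "I", "T")
)

# Hypothetical helper: sqrt(1 - fraction of identical sites), counting only
# positions where neither sequence has a gap (an assumption of this sketch)
identity_distance <- function(a, b) {
  comparable <- a != "-" & b != "-"
  identity <- mean(a[comparable] == b[comparable])
  sqrt(1 - identity)
}

# Fill a symmetric matrix with all pairwise distances
n <- nrow(toy_mat)
d_mat <- matrix(0, n, n, dimnames = list(rownames(toy_mat), rownames(toy_mat)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d_mat[i, j] <- identity_distance(toy_mat[i, ], toy_mat[j, ])
  }
}

# Convert to a "dist" object, the class that tree-building functions expect
toy_dist <- as.dist(d_mat)
```

A `dist` object like `toy_dist` plays the same role as `d` in the tutorial code, where it is passed to `bionj()`.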
Session information
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=es_CR.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=es_CR.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=es_CR.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=es_CR.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 parallel stats graphics grDevices utils datasets
## [8] methods base
##
## other attached packages:
## [1] ggtree_2.4.1 ape_5.4-1 seqinr_4.2-4
## [4] msa_1.22.0 Biostrings_2.58.0 XVector_0.30.0
## [7] IRanges_2.24.0 S4Vectors_0.28.0 BiocGenerics_0.36.0
## [10] knitr_1.30
##
## loaded via a namespace (and not attached):
## [1] treeio_1.14.3 progress_1.2.2 tidyselect_1.1.0
## [4] xfun_0.19 purrr_0.3.4 lattice_0.20-41
## [7] colorspace_2.0-0 vctrs_0.3.5 generics_0.1.0
## [10] htmltools_0.5.0 yaml_2.2.1 rlang_0.4.8
## [13] pillar_1.4.6 glue_1.4.2 rvcheck_0.1.8
## [16] lifecycle_0.2.0 stringr_1.4.0 zlibbioc_1.36.0
## [19] munsell_0.5.0 gtable_0.3.0 evaluate_0.14
## [22] labeling_0.4.2 Rcpp_1.0.5 scales_1.1.1
## [25] formatR_1.7 BiocManager_1.30.10 jsonlite_1.7.1
## [28] farver_2.0.3 ggplot2_3.3.2 hms_0.5.3
## [31] aplot_0.0.6 digest_0.6.27 stringi_1.5.3
## [34] dplyr_1.0.2 grid_4.0.3 ade4_1.7-16
## [37] tools_4.0.3 magrittr_2.0.1 lazyeval_0.2.2
## [40] patchwork_1.1.0 tibble_3.0.4 crayon_1.3.4
## [43] tidyr_1.1.2 pkgconfig_2.0.3 MASS_7.3-53
## [46] ellipsis_0.3.1 tidytree_0.3.3 prettyunits_1.1.1
## [49] rmarkdown_2.5 R6_2.5.0 nlme_3.1-150
## [52] compiler_4.0.3