Files present in the Analysis directory:

.
├── Analysis
├── all_consensus_assembly.fa                      <- consensus sequence for each tag (read 1 and read 2 assemblies merged where possible)
├── all_consensus_assembly_G10.bam                 <- alignment for each sample against all_consensus_assembly.fa
├── all_consensus_assembly_G1.bam
├── all_consensus_assembly_G2.bam
├── all_consensus_assembly_G3.bam
├── all_consensus_assembly_G5.bam
├── all_consensus_assembly_G6.bam
├── all_consensus_assembly_G7.bam
├── all_consensus_assembly_G8.bam
├── all_consensus_assembly_G9.bam
├── all_consensus_assembly_P1.bam
├── all_consensus_assembly_P2.bam
├── all_consensus_assembly_PR1-1.bam
├── all_consensus_assembly_PR1-2.bam
├── all_consensus_assembly_PR1-3.bam
├── all_consensus_assembly_PR2-1.bam
├── all_consensus_assembly_PR2-2.bam
├── all_consensus_assembly_PR2-3.bam
├── all_consensus_assembly_TestSample-G4.bam
├── all_consensus.bam                              <- alignment of all samples against all_consensus_assembly.fa
├── all_consensus_samtools1_snps_files.vcf.gz      <- raw variants called with samtools (bgzipped)
├── all_consensus_samtools1_snps_files.vcf.gz.tbi  <- index file for raw variants
├── all_consensus_snp_P1_P2_informative.vcf        <- SNPs where the parents have 2 alleles
├── all_consensus_snp_PvsG_informative.vcf         <- SNPs that differ between pedigree and population samples
├── all_consensus_var_P1_P2_informative.vcf        <- variants where the parents have 2 alleles
├── all_consensus_var_PvsG_informative.vcf         <- variants that differ between pedigree and population samples
├── all_summary_stat.txt                           <- summary statistics for all tags and samples
└── Analysis.txt                                   <- this document

0 directories, 29 files

##########

The analysis has been done de novo. The samples have been analysed with our RAD-seq analysis pipeline:
1. clustering of read 1 into stacks using ustacks v1.30 (parameters -t fastq -p 1 -m 2 -M 2 -N 4 -H)
2. calling consensus for read 1 stacks using cstacks v1.30
3. filtering stacks to remove those supported by fewer than 3 samples
4. assembly of read 2 of each stack/RADtag using idba_ud (v1.09)
5. merging read 2 assembled contigs with read 1 where possible
6. mapping of all reads of all samples to the assembled contigs using SMALT (release 0.6.2)
7. calling raw SNPs for each tag using samtools mpileup (v. 1.2_64bit) and bcftools call (v. 1.2)

##########

Selection of informative variants and SNPs in the pedigree

Criteria:
1. variants are present in both parents
2. genotype quality GQ > 20
3. read depth DP > 5
4. genotypes allowed (P1/P2): ab/ab, aa/ab, bb/ab, ab/aa, ab/bb
5. not INDEL <- SNPs only

##########

Selection of variants and SNPs differing between pedigree and population samples

Criteria:
1. variants are present in both parents
2. variants are present in at least 2 population samples
3. genotype quality GQ > 20
4. read depth DP > 5
5. genotypes allowed (ped/pop): hom/homVar, homVar/hom
6. not INDEL <- SNPs only
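
##########

Illustration of the pedigree selection criteria

The sketch below shows one way the pedigree criteria (both parents called, GQ > 20, DP > 5, allowed genotypes ab/ab, aa/ab, bb/ab, ab/aa, ab/bb, optional INDEL exclusion) could be applied to a single VCF record. It is not the script used to produce the delivered files. Assumptions: sites are biallelic; each parent column carries GT, GQ and DP in its FORMAT fields; INDELs are flagged with "INDEL" in the INFO column; "present in both parents" is read as "both parents have a called genotype"; and criterion 5 is read as applying to the snp files only. The function names and the column indices p1_col and p2_col are hypothetical. The five allowed genotype pairs all have at least one heterozygous parent, so the genotype test reduces to that check.

```python
def parse_sample(fmt, sample):
    """Map FORMAT keys (e.g. GT:GQ:DP) to one sample's values."""
    return dict(zip(fmt.split(":"), sample.split(":")))


def genotype_class(gt):
    """Return 'hom_ref', 'het' or 'hom_alt'; None if missing or not biallelic."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return None
    if alleles[0] != alleles[1]:
        return "het"
    return "hom_ref" if alleles[0] == "0" else "hom_alt"


def passes_quality(fields, min_gq=20, min_dp=5):
    """Criteria 2 and 3: GQ > 20 and DP > 5 (strict inequalities)."""
    try:
        return float(fields["GQ"]) > min_gq and int(fields["DP"]) > min_dp
    except (KeyError, ValueError):
        # Field absent or not numeric (e.g. "."): treat as failing.
        return False


def is_parent_informative(line, p1_col, p2_col, snps_only=True):
    """Apply the pedigree criteria to one tab-separated VCF data line.

    p1_col and p2_col are 0-based column indices of the two parents
    (assumed; they depend on the sample order in the VCF header).
    """
    cols = line.rstrip("\n").split("\t")
    info, fmt = cols[7], cols[8]
    # Criterion 5 (assumed to apply to the snp files only).
    if snps_only and "INDEL" in info.split(";"):
        return False
    classes = []
    for col in (p1_col, p2_col):
        fields = parse_sample(fmt, cols[col])
        cls = genotype_class(fields.get("GT", "./."))
        # Criterion 1 (called in both parents) plus criteria 2 and 3.
        if cls is None or not passes_quality(fields):
            return False
        classes.append(cls)
    # Criterion 4: every allowed pair has at least one heterozygous parent.
    return "het" in classes
```

Calling is_parent_informative(line, 9, 10) on a record whose first two sample columns are P1 and P2 returns True for aa/ab and False for aa/bb. Passing snps_only=False keeps INDEL records, which corresponds to the assumed difference between the var and snp files.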