DEMETER2 Data Release

-----------------------------------------------------------
Contents:
-----------------------------------------------------------

***********************************************************
INPUTS
***********************************************************

* Achilles_LFC_data *
Matrices of log fold change (LFC) data per cell line and shRNA generated by the Broad Achilles project (Tsherniak et al., Cell 2017). Separate matrices are provided for each of three 'batches' of data.
Files:
    a) achilles-98k-repcollapsed-lfc.csv: LFC matrix for '98k' data
    b) achilles-55k-batch1-repcollapsed-lfc.csv: LFC matrix for batch 1 of '55k' data
    c) achilles-55k-batch2-repcollapsed-lfc-bd7f: LFC matrix for batch 2 of '55k' data

* DRIVE_LFC_data *
Matrices of LFC data per cell line and shRNA generated by the Novartis DRIVE project (McDonald et al., Cell 2017), reprocessed as described in the DEMETER2 manuscript. Separate matrices are provided for each shRNA pool.
Files:
    a) drive-bgpd-lfc-mat.csv: LFC matrix from 'BGPD' pool
    b) drive-poola-lfc-mat.csv: LFC matrix from poolA
    c) drive-poolb-lfc-mat.csv: LFC matrix from poolB

* Marcotte_LFC_matrix.csv *
Matrix of LFC data per cell line and shRNA generated by Marcotte et al. (2016), processed as described in the DEMETER2 manuscript.

* shRNA_mapping.csv *
Table of mappings between shRNAs and genes used in our analysis. Each shRNA-to-gene mapping is given in a separate row, so an shRNA that maps to multiple genes appears in multiple rows.
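
As a minimal sketch of how the one-row-per-mapping layout of shRNA_mapping.csv could be consumed, the Python snippet below groups rows by shRNA and lists the shRNAs that map to more than one gene. The column names ("shRNA", "gene"), the function name, and the toy rows are assumptions made for illustration only; check the header of the actual file before adapting it.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for shRNA_mapping.csv. The column names and every value below
# are hypothetical; the real file may use different headers.
TOY_MAPPING = """shRNA,gene
AAAAACCCCCGGGGGTTTTTA,GENE_A (1)
AAAAACCCCCGGGGGTTTTTA,GENE_B (2)
CCCCCGGGGGTTTTTAAAAAC,GENE_C (3)
"""


def genes_per_shrna(handle, shrna_col="shRNA", gene_col="gene"):
    """Group a one-row-per-mapping table into {shRNA: [genes]}."""
    groups = defaultdict(list)
    for row in csv.DictReader(handle):
        groups[row[shrna_col]].append(row[gene_col])
    return dict(groups)


mapping = genes_per_shrna(io.StringIO(TOY_MAPPING))

# shRNAs that appear in more than one row, i.e. map to multiple genes.
multi_gene = {s: g for s, g in mapping.items() if len(g) > 1}
print(multi_gene)
```

To run this against the real table, pass an open file handle instead of the io.StringIO object and set shrna_col and gene_col to the actual header names.
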

* Hart_pos_controls.csv *
List of 217 genes used as positive controls in our analysis (taken from Hart et al., 2015).

* Hart_neg_controls.csv *
List of 926 genes used as negative controls in our analysis (taken from Hart et al., 2015).

***********************************************************
OUTPUTS
***********************************************************

Results of the DEMETER2 model applied to the Achilles, DRIVE, and combined (Achilles, DRIVE and Marcotte 2016) datasets are given in files named "D2_Achilles_*", "D2_DRIVE_*", and "D2_combined_*", respectively.

-----------------------------------------------------------
D2 model results contents:
-----------------------------------------------------------

The primary outputs of the model are the gene dependency scores estimated for each gene and cell line, contained in the files labeled "*_gene_dep_scores". The full set of model output files is described below.

Note: genes are indexed using Entrez IDs during analysis, but we provide gene names in the form: "HGNC_symbol (Entrez_ID)".

1) gene_dep_scores: Estimated gene dependency for each cell line and gene (posterior mean estimates).

2) gene_dep_score_SDs: Uncertainty estimate of gene dependency for each cell line and gene (posterior std. dev.).

3) CL_data: Table of model parameters estimated for each cell line. Cell lines are indexed by CCLE names where possible. Note that the tissue type indicated as part of the CCLE name is not necessarily indicative of the annotated tissue type.
   Includes:
    a) gene_slope: "Screen signal" parameter (q_j) for each cell line.
    b) CL_slope: Overall multiplicative scaling term. These are estimated for each cell line j and batch k in the model (a_jk), and are averaged across batches k here.
    c) noise_vars: Average noise variance per cell line. These are estimated for each cell line j and batch k in the model (sigma_jk), and are provided here averaged across batches k (according to sqrt()).
    d) offset_mean: Average posterior mean offset term per cell line. These are estimated per cell line j and batch k in the model (a_jk), and are averaged across batches k here.
    e) offset_sd: Posterior SD of offset terms per cell line (a_jk), averaged across batches k. Averages are computed according to: sqrt().

4) hp_data: Table of model parameters estimated for each shRNA (shRNAs are indexed by their targeting sequence).
   Includes:
    a) Geff: Estimated gene knockdown efficacy of each shRNA (alpha_i).
    b) Seff: Estimated off-target efficacy of each shRNA (beta_i).
    c) unpred_offset_mean: Posterior mean of 'unpredicted' across-cell-line average off-target effect per shRNA (c_i).
    d) unpred_offset_sd: Posterior std. dev. of 'unpredicted' across-cell-line average off-target effect per shRNA (c_i).
    e) hairpin_offset_mean: Posterior mean of additive offset per shRNA and batch (theta_ik), averaged across batches k.
    f) hairpin_offset_sd: Posterior std. dev. of additive offset per shRNA and batch (theta_ik), averaged across batches k (as sqrt()).

5) seed_dep_scores: Estimated seed effects for each cell line and seed sequence (posterior mean estimates).

6) seed_dep_score_SDs: Uncertainty estimate of seed effects for each cell line and seed sequence (posterior std. dev.).

***********************************************************
ADDITIONAL INFO
***********************************************************

* WES_SNP_CN_data.csv *
Gene-level copy number data per cell line, derived from CCLE whole-exome sequencing data, along with CCLE SNP array data. Used for the feature-dependency association analysis presented in the DEMETER2 manuscript.

* RNAseq_lRPKM_data.csv *
log10(RPKM + 0.001) for protein-coding genes, derived from the file CCLE_DepMap_18Q1_RNAseq_RPKM_20180214.gct

* CCLE_mutation_data.csv *
Mutation data taken from the file CCLE_DepMap_18Q1_maf_20180207.txt.

* sample_info.csv *
Table of metadata per cell line.
Columns include:
    disease: Overarching category of the primary disease indication the patient was diagnosed with
    disease_subtype: Refined subcategory of disease type
    disease_sub_subtype: Refined subcategory of disease sub-type
    in_DRIVE: Whether the cell line was included in the DEMETER2 DRIVE dataset
    in_Achilles: Whether the cell line was included in the DEMETER2 Achilles dataset
    in_Marcotte: Whether the cell line was included in the DEMETER2 Marcotte dataset
    Novartis_Primary_site: Primary site annotation downloaded from Novartis web portal (https://oncologynibr.shinyapps.io/drive/)
    Novartis_Pathologist_Annotation: Pathologist annotation downloaded from Novartis web portal
    Marcotte_subtype_three_receptor: subtype_three_receptor taken from Marcotte cell line subtypes file (http://neellab.github.io/bfg/)
    Marcotte_subtype_neve: subtype_neve taken from Marcotte cell line subtypes file
    Marcotte_subtype_intrinsic: subtype_intrinsic taken from Marcotte cell line subtypes file

-----------------------------------------------------------
References
-----------------------------------------------------------

Hart, T., Chandrashekhar, M., Aregger, M., Steinhart, Z., Brown, K.R., MacLeod, G., Mis, M., Zimmermann, M., Fradet-Turcotte, A., Sun, S., Mero, P., Dirks, P., Sidhu, S., Roth, F.P., Rissland, O.S., Durocher, D., Angers, S. and Moffat, J. 2015. High-Resolution CRISPR Screens Reveal Fitness Genes and Genotype-Specific Cancer Liabilities. Cell 163(6), pp. 1515–1526.

Marcotte, R., Sayad, A., Brown, K.R., Sanchez-Garcia, F., Reimand, J., Haider, M., Virtanen, C., Bradner, J.E., Bader, G.D., Mills, G.B., Pe'er, D., Moffat, J. and Neel, B.G. 2016. Functional genomic landscape of human breast cancer drivers, vulnerabilities, and resistance. Cell 164(1–2), pp. 293–309.

McDonald, E.R., de Weck, A., Schlabach, M.R., Billy, E., Mavrakis, K.J., Hoffman, G.R., Belur, D., Castelletti, D., Frias, E., Gampa, K., Golji, J., Kao, I., Li, L., Megel, P., Perkins, T.A., Ramadan, N., Ruddy, D.A., Silver, S.J., Sovath, S., Stump, M. and Sellers, W.R. 2017. Project DRIVE: A Compendium of Cancer Dependencies and Synthetic Lethal Relationships Uncovered by Large-Scale, Deep RNAi Screening. Cell 170(3), pp. 577–592.e10.

Tsherniak, A., Vazquez, F., Montgomery, P.G., Weir, B.A., Kryukov, G., Cowley, G.S., Gill, S., Harrington, W.F., Pantel, S., Krill-Burger, J.M., Meyers, R.M., Ali, L., Goodale, A., Lee, Y., Jiang, G., Hsiao, J., Gerath, W.F.J., Howell, S., Merkel, E., Ghandi, M. and Hahn, W.C. 2017. Defining a cancer dependency map. Cell 170(3), pp. 564–576.e16.

-----------------------------------------------------------
Version history:
-----------------------------------------------------------

v1: Initial data release

v2:
- Removed small number of non-human genes (e.g. GFP, RFP) from shRNA-to-gene mapping
- Updated cell line names to be consistent with DepMap names, according to the following map (old -> new):
***********************************************************
MEL285_EYE -> MEL285_UVEA
921_EYE -> 921_UVEA
SUM102_BREAST -> SUM102PT_BREAST
SUM1315_BREAST -> SUM1315MO2_BREAST
SUM149_BREAST -> SUM149PT_BREAST
SUM159_BREAST -> SUM159PT_BREAST
SUM185_BREAST -> SUM185PE_BREAST
SUM190_BREAST -> SUM190PT_BREAST
SUM225_BREAST -> SUM225CWN_BREAST
SUM229_BREAST -> SUM229PE_BREAST
SUM44_BREAST -> SUM44PE_BREAST
SUM52_BREAST -> SUM52PE_BREAST
HSSYII_SOFT_TISSUE -> HSSYII_LUNG
OMM1_EYE -> OMM1_UVEA
***********************************************************
Note: In version 1 the cell lines "SUM52_BREAST" and "SUM52PE_BREAST" were treated as separate cell lines, but in version 2 they have been merged into a single cell line (SUM52PE_BREAST).

v3: Added seed effect matrices

v4: Added RNAseq and mutation data

v5: Fixed minor bug with Marcotte LFC data that caused hairpins targeting multiple genes to appear multiple times in the LFC matrix. This created bias in the seed effect estimates for those hairpins, causing very minor differences to the resulting model parameters.
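
The v2 cell line renaming listed in the version history can be applied to names taken from a version 1 file with a simple lookup. The Python sketch below shows one way to do this. The dictionary is copied from the old -> new map in the v2 entry; the function name to_v2_name and the example name "EXAMPLE_LUNG" are hypothetical and only serve the illustration.

```python
# Old -> new cell line names, copied from the v2 entry of the version history.
V1_TO_V2 = {
    "MEL285_EYE": "MEL285_UVEA",
    "921_EYE": "921_UVEA",
    "SUM102_BREAST": "SUM102PT_BREAST",
    "SUM1315_BREAST": "SUM1315MO2_BREAST",
    "SUM149_BREAST": "SUM149PT_BREAST",
    "SUM159_BREAST": "SUM159PT_BREAST",
    "SUM185_BREAST": "SUM185PE_BREAST",
    "SUM190_BREAST": "SUM190PT_BREAST",
    "SUM225_BREAST": "SUM225CWN_BREAST",
    "SUM229_BREAST": "SUM229PE_BREAST",
    "SUM44_BREAST": "SUM44PE_BREAST",
    "SUM52_BREAST": "SUM52PE_BREAST",
    "HSSYII_SOFT_TISSUE": "HSSYII_LUNG",
    "OMM1_EYE": "OMM1_UVEA",
}


def to_v2_name(name):
    """Return the v2 name for a v1 name; names not in the map pass through."""
    return V1_TO_V2.get(name, name)


# "EXAMPLE_LUNG" is a made-up name standing in for any unaffected cell line.
old_names = ["SUM52_BREAST", "SUM52PE_BREAST", "OMM1_EYE", "EXAMPLE_LUNG"]
new_names = [to_v2_name(n) for n in old_names]

# Distinct v1 names can collapse onto one v2 name (the SUM52 merge), so
# duplicates should be checked for after renaming.
merged = sorted({n for n in new_names if new_names.count(n) > 1})
print(new_names)
print(merged)
```

How duplicated names should be reconciled after renaming (for example, which column to keep) is not specified here, so the sketch only reports them.
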