
Abstract

Genome-wide association studies (GWASs) have the potential to reveal the genetics of microbial phenotypes such as antibiotic resistance and virulence. Capitalizing on the growing wealth of bacterial sequence data, microbial GWAS methods aim to identify causal genetic variants while ignoring spurious associations. Bacteria reproduce clonally, leading to strong population structure and genome-wide linkage, making it challenging to separate true ‘hits’ (i.e. mutations that cause a phenotype) from non-causal linked mutations. GWAS methods attempt to correct for population structure in different ways, but their performance has not yet been systematically and comprehensively evaluated under a range of evolutionary scenarios. Here, we developed a bacterial GWAS simulator (BacGWASim) to generate bacterial genomes with varying rates of mutation, recombination and other evolutionary parameters, along with a subset of causal mutations underlying a phenotype of interest. We assessed the performance (recall and precision) of three widely used single-locus GWAS approaches (cluster-based, dimensionality-reduction and linear mixed models, implemented in , pyseer and ) and one relatively new multi-locus model implemented in pyseer, across a range of simulated sample sizes, recombination rates and causal mutation effect sizes. As expected, all methods performed better with larger sample sizes and effect sizes. The performance of clustering and dimensionality-reduction approaches to correcting for population structure varied considerably with the choice of parameters. Notably, the multi-locus elastic net (lasso) approach was consistently amongst the highest-performing methods, and had the highest power in identifying causal variants with both low and high effect sizes. Most methods reached the level of good performance (recall >0.75) for identifying causal mutations of strong effect size [log odds ratio (OR) ≥2] with a sample size of 2000 genomes. However, only elastic nets reached the level of reasonable performance (recall=0.35) for detecting markers with weaker effects (log OR ~1) in smaller samples. Elastic nets also showed superior precision and recall in controlling for genome-wide linkage, relative to single-locus models. However, all methods performed relatively poorly on highly clonal (low-recombining) genomes, suggesting room for improvement in method development. These findings show the potential for multi-locus models to improve bacterial GWAS performance. BacGWASim code and simulated data are publicly available to allow further comparisons and benchmarking of new methods.
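The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that recall and precision are computed by comparing the set of variants a method reports against the set of simulated causal variants, and that the log odds ratio uses the natural logarithm; the variant identifiers, counts and function names below are hypothetical and are not taken from BacGWASim.

```python
import math


def recall_precision(called, causal):
    """Recall and precision of reported variants against the known causal set."""
    called, causal = set(called), set(causal)
    true_pos = len(called & causal)
    # Recall: fraction of causal variants that were reported.
    recall = true_pos / len(causal) if causal else 0.0
    # Precision: fraction of reported variants that are causal.
    precision = true_pos / len(called) if called else 0.0
    return recall, precision


def log_odds_ratio(a, b, c, d):
    """Natural-log odds ratio from a 2x2 table (assumed convention):
    a = cases with variant,    b = cases without variant,
    c = controls with variant, d = controls without variant."""
    return math.log((a * d) / (b * c))


# Hypothetical example: four simulated causal variants, three reported hits.
causal = {"snp_12", "snp_40", "snp_77", "snp_91"}
called = {"snp_12", "snp_40", "snp_55"}
recall, precision = recall_precision(called, causal)
```

With these hypothetical inputs, two of the four causal variants are recovered and two of the three reported hits are causal, so a method scoring like this would fall between the "reasonable" (recall=0.35) and "good" (recall >0.75) thresholds quoted above.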

Funding
This study was supported by:
  • Génome Québec (Award BCB)
    • Principal Award Recipient: B. Jesse Shapiro
  • Genome Canada (Award BCB)
    • Principal Award Recipient: B. Jesse Shapiro
  • This is an open-access article distributed under the terms of the Creative Commons Attribution License.
/content/journal/mgen/10.1099/mgen.0.000337
2020-02-25
2023-05-10

Supplements

Supplementary material 1
