
Genome-wide association studies (GWASs) have the potential to reveal the genetics of microbial phenotypes such as antibiotic resistance and virulence. Capitalizing on the growing wealth of bacterial sequence data, microbial GWAS methods aim to identify causal genetic variants while ignoring spurious associations. Bacteria reproduce clonally, leading to strong population structure and genome-wide linkage, which makes it challenging to separate true ‘hits’ (i.e. mutations that cause a phenotype) from non-causal linked mutations. GWAS methods attempt to correct for population structure in different ways, but their performance has not yet been systematically and comprehensively evaluated under a range of evolutionary scenarios. Here, we developed a bacterial GWAS simulator (BacGWASim) to generate bacterial genomes with varying rates of mutation, recombination and other evolutionary parameters, along with a subset of causal mutations underlying a phenotype of interest. We measured the performance (recall and precision) of three widely used single-locus GWAS approaches (cluster-based, dimensionality-reduction and linear mixed models, implemented in plink, pyseer and gemma) and one relatively new multi-locus model implemented in pyseer, across a range of simulated sample sizes, recombination rates and causal mutation effect sizes. As expected, all methods performed better with larger sample sizes and effect sizes. The performance of the clustering and dimensionality-reduction approaches to correct for population structure varied considerably depending on the choice of parameters. Notably, the multi-locus elastic net (lasso) approach was consistently amongst the highest-performing methods, and had the highest power in identifying causal variants with both low and high effect sizes.
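To make the two performance measures concrete, the following is a minimal sketch of how recall and precision can be computed once a GWAS method's reported hits are compared against the causal variants planted by a simulator. The function name `recall_precision` and the variant IDs are hypothetical, and this version counts only exact matches; the study's own evaluation may credit hits that are in linkage with a causal variant differently.

```python
def recall_precision(detected, causal):
    """Return (recall, precision) for detected variant IDs versus the
    set of simulated causal variant IDs (exact-match counting only)."""
    detected = set(detected)
    causal = set(causal)
    true_positives = len(detected & causal)
    # Recall: fraction of the planted causal variants that were recovered.
    recall = true_positives / len(causal) if causal else 0.0
    # Precision: fraction of reported hits that are truly causal.
    precision = true_positives / len(detected) if detected else 0.0
    return recall, precision


# Hypothetical example: 4 simulated causal variants, 5 reported hits.
causal = {"snp_10", "snp_250", "snp_731", "snp_902"}
hits = {"snp_10", "snp_250", "snp_731", "snp_11", "snp_400"}
r, p = recall_precision(hits, causal)
print(f"recall={r:.2f}, precision={p:.2f}")  # recall=0.75, precision=0.60
```

In this toy case, three of the four causal variants are found (recall 0.75), but two of the five hits are spurious, for example linked non-causal neighbours, which pulls precision down to 0.60.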
Most methods reached the level of good performance (recall >0.75) for identifying causal mutations of strong effect size [log odds ratio (OR) ≥2] using a sample size of 2000 genomes. However, only elastic nets reached the level of reasonable performance (recall=0.35) for detecting markers with weaker effects (log OR ~1) in smaller samples. Elastic nets also showed superior precision and recall in controlling for genome-wide linkage, relative to single-locus models. However, all methods performed relatively poorly on highly clonal (low-recombining) genomes, suggesting room for improvement in method development. These findings show the potential of multi-locus models to improve bacterial GWAS performance. BacGWASim code and simulated data are publicly available to allow further comparisons and benchmarking of new methods.
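The effect-size tiers above are easier to interpret on the odds-ratio scale. The sketch below assumes natural logarithms and a simple additive logistic model, logit(p) = intercept + Σ β_j·g_j, in which each causal variant's β_j is its log OR. This is a generic illustration and not necessarily how BacGWASim simulates phenotypes; the function names, intercept and variant IDs are hypothetical.

```python
import math
import random


def odds_ratio(log_or):
    """Convert a log odds ratio (natural log assumed) to an odds ratio."""
    return math.exp(log_or)


def case_probability(genotype, effects, intercept=0.0):
    """Probability of the 'case' phenotype under an additive logistic model.

    genotype: dict variant_id -> 0/1 (allele absent/present)
    effects:  dict causal variant_id -> log odds ratio (beta)
    """
    logit = intercept + sum(beta * genotype.get(v, 0) for v, beta in effects.items())
    return 1.0 / (1.0 + math.exp(-logit))


def simulate_phenotype(genotype, effects, intercept=0.0, rng=random):
    """Draw a binary phenotype (1 = case) from the model above."""
    return 1 if rng.random() < case_probability(genotype, effects, intercept) else 0


# Hypothetical single causal variant in the "strong" tier (log OR = 2).
effects = {"snp_10": 2.0}
print(round(odds_ratio(2.0), 2))  # 7.39
print(round(odds_ratio(1.0), 2))  # 2.72
carrier = case_probability({"snp_10": 1}, effects, intercept=-1.0)
non_carrier = case_probability({"snp_10": 0}, effects, intercept=-1.0)
print(round(carrier, 3), round(non_carrier, 3))  # 0.731 0.269
```

Under these assumptions, a log OR of 2 multiplies a carrier's odds of the phenotype by about 7.4, while a log OR of about 1 multiplies them by only about 2.7. This smaller signal is harder to separate from linked, non-causal variants in small samples.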