- Publication date : 2016-01-09
D. Torkamaneh, J. Laroche, F. Belzile, 2016. Fast-GBS: a new pipeline for the efficient and accurate calling of SNPs from genotyping-by-sequencing data. Plant and Animal Genome Conference. San Diego.
Genotyping-by-sequencing (GBS) has been demonstrated to be a robust and cost-effective genotyping method capable of producing thousands to millions of SNPs across a wide range of species. Undoubtedly, the greatest barrier to its broader use is the challenge of data analysis. We describe a new bioinformatics pipeline, Fast-GBS, that efficiently processes raw GBS sequence data into SNP genotypes. Fast-GBS can call SNPs, MNPs, and Indels from reads of variable length obtained using different sequencing platforms (Illumina and Ion Torrent). It requires a reference genome but only modest computing resources. We compare the efficiency and accuracy of this pipeline to five other existing pipelines (IGST, TASSEL-GBSv1, TASSEL-GBSv2, UNEAK and Stacks). Using Illumina sequence data from a set of 24 re-sequenced soybean lines, we performed SNP calling with these pipelines and compared the GBS SNP calls with the re-sequencing data to assess their accuracy. The number of SNPs called ranged from 13K (Stacks) to 54K (TASSEL-GBSv1), while accuracy ranged from 76.4% (TASSEL-GBSv1) to 98.9% (Fast-GBS). Among pipelines offering a high accuracy (>95%), Fast-GBS called the largest number of polymorphisms (close to 40,000 SNPs + Indels). Using Ion Torrent sequence data for the same 24 lines, we compared the performance of Fast-GBS with that of its closest competitor (TASSEL-GBSv2). Fast-GBS again called more polymorphisms (25.8K vs 22.9K) and these proved more accurate (96.4% vs 91.3%). Overall, 89% of the SNPs called by TASSEL-GBSv2 were also called by Fast-GBS. We conclude that Fast-GBS provides a rapid, efficient and reliable tool for calling SNPs from GBS data.
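
The accuracy assessment described above, in which GBS genotype calls are checked against re-sequencing data for the same lines, can be illustrated with a minimal Python sketch. This is not the authors' code: the `concordance` function, the dictionary layout, and the choice to skip missing genotypes and sites absent from the re-sequencing set are all assumptions made for illustration, and the abstract does not state how accuracy was actually computed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical data layout (an assumption, not taken from the abstract):
# site -> sample -> genotype string, with None for a missing call.
Site = Tuple[str, int]  # (chromosome, position)
Calls = Dict[Site, Dict[str, Optional[str]]]


def concordance(gbs: Calls, reseq: Calls) -> Tuple[int, int, float]:
    """Return (matching, compared, rate) for genotypes present in both sets."""
    compared = 0
    matching = 0
    for site, gbs_samples in gbs.items():
        ref_samples = reseq.get(site)
        if ref_samples is None:
            continue  # site not covered by the re-sequencing data
        for sample, gbs_genotype in gbs_samples.items():
            ref_genotype = ref_samples.get(sample)
            if gbs_genotype is None or ref_genotype is None:
                continue  # assumption: missing calls are not counted
            compared += 1
            if gbs_genotype == ref_genotype:
                matching += 1
    rate = matching / compared if compared else float("nan")
    return matching, compared, rate


# Toy example with made-up genotypes for two samples.
gbs_calls = {
    ("Gm01", 100): {"lineA": "0/0", "lineB": "1/1"},
    ("Gm01", 200): {"lineA": "0/1", "lineB": None},
    ("Gm02", 50): {"lineA": "1/1"},
}
reseq_calls = {
    ("Gm01", 100): {"lineA": "0/0", "lineB": "0/0"},
    ("Gm01", 200): {"lineA": "0/1", "lineB": "1/1"},
}
matching, compared, rate = concordance(gbs_calls, reseq_calls)
print(f"{matching}/{compared} concordant genotypes ({rate:.1%})")
```

Under these assumptions, accuracy is the share of concordant genotypes among all genotype pairs that could be compared; a different treatment of missing data or heterozygous calls would change the figure.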