A complete pipeline of free bioinformatics tools for de novo transcriptome assembly and SSR primer design

Naranpanawa, D. N. U.; Chandrasekara, C. H. W. M. R. B.; Bandaranayake, P. C. G.; Bandaranayake, A. U.

Files

iPURSE 2019 Proceedings-1 [44].pdf (78.8 KB)

Date

2019-09-12

Authors

Naranpanawa, D. N. U.

Chandrasekara, C. H. W. M. R. B.

Bandaranayake, P. C. G.

Bandaranayake, A. U.

Publisher

University of Peradeniya

Abstract

During the past few decades, next-generation sequencing technologies have improved exponentially in throughput and speed, while sequencing costs have fallen. This has revolutionized the field of genomics, allowing the production of vast datasets. However, the methods and software needed to analyze these data and draw correct biological conclusions have not kept pace. One limitation is the unaffordable price of commercially available bioinformatics software. Hence, only a small fraction of genomes and transcriptomes have been completely assembled and annotated. The lack of reference genomes for comparative assembly leads to de novo assembly, which is computationally more challenging. In addition, obtaining an assembly is a complex process that requires many steps and several complex tools. As a result, beginners in bioinformatics may find analysis procedures too complicated and time-consuming, given the associated learning curve. Therefore, to aid novice biologists in assembling sequence data, and to ease this bottleneck in computational biology and bioinformatics, we present a complete pipeline of freely available bioinformatics software for de novo transcriptome assembly. The pipeline was developed by combining several individual software tools through user-friendly shell scripts. To test the pipeline, we used Illumina HiSeq paired-end RNA-seq reads from four oil-producing Santalum album (sandalwood) tree samples from a published study. The raw data were first filtered to remove low-quality reads, trimmed to remove adapters, and normalized. Assembly was performed with the Trinity de novo assembler. Assembly quality was assessed with BUSCO, Bowtie2 and TransRate, which indicated a high-quality assembly. To further validate the accuracy of the assembly, we used the assembled transcriptome to identify gene-specific Simple Sequence Repeat (SSR) markers. Primers were designed for eight S. album oil biosynthetic genes and two control genes, and were validated in the laboratory with the respective samples. All primers amplified successfully, confirming the designed workflow. Furthermore, five SSR markers were polymorphic among the tested sandalwood accessions and are potential markers for use in sandalwood breeding programs. To the best of our knowledge, this is the first attempt at developing a user-friendly, validated assembly pipeline with free bioinformatics software and tools, provided with detailed documentation.
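
The SSR identification step mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not name the SSR detection tool or its settings, so this is not the authors' method: the function name `find_ssrs`, the motif lengths (2 to 6 bp), and the minimum repeat counts are all assumptions chosen only for illustration. The sketch scans a single transcript sequence for perfect tandem repeats and reports their positions, which is the kind of information needed before designing flanking primers.

```python
import re

# Assumed, illustrative thresholds: motif length -> minimum number of tandem copies.
# These values are NOT taken from the abstract.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, end, motif, copies) for perfect SSRs in seq.

    Coordinates are 0-based with an exclusive end, as in Python slicing.
    """
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are themselves built from a shorter unit
            # (e.g. "AA" or "ATAT"), so each repeat is reported once
            # under its shortest motif, and homopolymers are ignored.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            copies = (m.end() - m.start()) // motif_len
            hits.append((m.start(), m.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence, not real sandalwood data.
    toy = "GGC" + "AT" * 7 + "CCGT" + "GAA" * 5 + "TTC"
    for start, end, motif, copies in find_ssrs(toy):
        print(start, end, motif, copies)
```

In practice the same scan would be run over every contig in the assembled transcriptome, and only repeats with enough flanking sequence on both sides would be passed on to primer design.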