Estimating species richness from virome data accounting for variations within the virus population

Publisher

University of Peradeniya, Sri Lanka

Abstract

Species richness, the number of species in an environmental sample, is a key measure of species diversity. Estimating the species richness of a metagenome of viruses (i.e., a virome) from reference data is challenging because reference databases contain only a limited amount of viral sequence data. Methods that estimate species richness from the contig spectrum avoid relying on reference sequences, but they assume that all species in the sample have the same genome length. This work aims to formulate a mathematical model that estimates species richness from a virome while accounting for the variability of genome lengths among the species in the sample. A model is derived for the expected contig spectrum, and its parameters, including the species richness, are estimated by minimizing the error between the expected and observed contig spectra. A genetic algorithm is used as the optimization algorithm for parameter estimation. Results with simulated data show the optimization procedure incorporated in the proposed approach to be robust. In addition to estimating species richness, this work enables inference of the genome length distribution from metagenomic sequence data, and it can be applied to a virome originating from any environmental sample.
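
The fitting step described in the abstract can be illustrated with a minimal sketch. It is not the model derived in this work: `expected_spectrum` is a hypothetical toy that assumes a known mean genome length, genome lengths spread evenly around that mean, an equal number of reads per species, and Lander-Waterman-style contig counts with the minimum overlap ignored. The genetic algorithm (truncation selection, blend crossover, Gaussian mutation, elitism) and all names, bounds and default values are likewise assumptions chosen for illustration.

```python
import math
import random


def expected_spectrum(richness, spread, n_reads=10_000, read_len=100,
                      mean_genome_len=50_000, max_q=5, n_bins=21):
    """Toy expected contig spectrum (hypothetical, NOT the model of this work).

    Entry q-1 is the expected number of contigs built from q reads.
    Genome lengths are spread evenly over
    mean_genome_len * [1 - spread, 1 + spread].
    """
    reads_per_species = n_reads / richness  # assumption: equal reads per species
    spectrum = [0.0] * max_q
    for k in range(n_bins):
        rel = 1.0 - spread + 2.0 * spread * k / (n_bins - 1)
        coverage = reads_per_species * read_len / (mean_genome_len * rel)
        grow = 1.0 - math.exp(-coverage)  # chance that a contig gains one more read
        for q in range(1, max_q + 1):
            spectrum[q - 1] += ((n_reads / n_bins)
                                * math.exp(-2.0 * coverage) * grow ** (q - 1))
    return spectrum


def squared_error(params, observed):
    """Sum of squared differences between expected and observed spectra."""
    expected = expected_spectrum(*params, max_q=len(observed))
    return sum((e - o) ** 2 for e, o in zip(expected, observed))


def fit_by_genetic_algorithm(observed, bounds, pop_size=40, generations=60, seed=0):
    """Search for the parameters with the least error inside the given bounds."""
    rng = random.Random(seed)
    population = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda p: squared_error(p, observed))
        history.append(squared_error(population[0], observed))
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()                   # blend crossover weight
            child = tuple(
                max(lo, min(hi, w * x + (1 - w) * y
                            + rng.gauss(0, 0.05 * (hi - lo))))  # Gaussian mutation
                for x, y, (lo, hi) in zip(a, b, bounds)
            )
            children.append(child)
        population = children
    best = min(population, key=lambda p: squared_error(p, observed))
    history.append(squared_error(best, observed))
    return best, history


# Simulated "observed" spectrum generated from known toy parameters.
observed = expected_spectrum(richness=300, spread=0.4)
best, history = fit_by_genetic_algorithm(observed, bounds=[(10, 2000), (0.0, 0.9)])
print("estimated richness and spread:", best)
print("final squared error:", history[-1])
```

Because the best individual is carried over unchanged each generation, the recorded error never increases. How closely the estimates match the simulated parameters depends on how well the toy spectrum separates them, so the sketch should be read as an outline of the optimization loop rather than as a validated estimator.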

Citation

Proceedings of the Peradeniya University International Research Sessions (iPURSE) – 2023, University of Peradeniya, P 105