RAREFAN: RAyt/REpin Finder and ANalyzer

Introduction

The RAREFAN webserver aims to identify and analyze RAYT transposases and their associated REPINs in bacterial species. The input to the server is a set of closely related strains (<5% divergence). Three example datasets are provided, from Neisseria, E. coli, and Pseudomonas chlororaphis. Neisseria and E. coli both contain Group 2 RAYTs (mainly found in enterobacteria), whereas P. chlororaphis contains Group 3 RAYTs (mainly found in P. putida, P. fluorescens and P. syringae). For further explanation, see Bertels, Gallie and Rainey (2017). The service analyzes REPIN population size, its relationship to REPIN replication rate, and the presence or absence of RAYTs across all submitted genomes.
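To make the "<5% divergence" input requirement concrete, the following minimal Python sketch computes the fraction of differing positions between two pre-aligned sequences and compares it with a 5% threshold. It is an illustration only: the function names `divergence` and `closely_related` are hypothetical, the sequences are assumed to be already aligned to equal length, and nothing here is claimed to reflect how RAREFAN itself measures divergence.

```python
def divergence(seq_a: str, seq_b: str) -> float:
    """Fraction of mismatching positions between two aligned sequences.

    Assumption: both sequences come from the same alignment (equal length).
    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore gap columns
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable (gap-free) positions")
    return mismatches / compared


def closely_related(seq_a: str, seq_b: str, threshold: float = 0.05) -> bool:
    """True if the divergence is strictly below the threshold (default 5%)."""
    return divergence(seq_a, seq_b) < threshold


if __name__ == "__main__":
    reference = "A" * 100
    variant = "A" * 97 + "TTT"  # 3 substitutions in 100 positions
    print(divergence(reference, variant))
    print(closely_related(reference, variant))
```

For whole genomes, an alignment-free distance estimator is the more practical choice; the sketch only illustrates what the threshold means for a pair of sequences.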

More information about the RAREFAN algorithm, example applications, and usage instructions is provided in our article in Peer Community Journal (DOI:10.24072/pcjournal.244).

Third party tools

RAREFAN uses the following tools:

  • andi for tree building:
    B Haubold, F Klötzl, and P Pfaffelhuber. andi: fast and accurate estimation of evolutionary distances between closely related genomes. Bioinformatics, 2015 vol. 31 (8) pp. 1169-1175. DOI:10.1093/bioinformatics/btu815

  • MCL for REPIN population clustering:
    A J Enright, S Van Dongen, and C A Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research, 2002 vol. 30 (7) pp. 1575-1584. DOI:10.1093/nar/30.7.1575

  • BLAST+ for identifying RAYT relatives in the different genomes:
    C Camacho, G Coulouris, V Avagyan, N Ma, J Papadopoulos, K Bealer, and T L Madden. BLAST+: architecture and applications. BMC Bioinformatics, 2009 vol. 10 (1) pp. 421-9. DOI:10.1186/1471-2105-10-421

  • MUSCLE for the alignment of identified RAYT sequences:
    R C Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004 vol. 32 (5). DOI:10.1093/nar/gkh340

  • PhyML3 for generating RAYT phylogenies:
    S Guindon, J -F Dufayard, V Lefort, M Anisimova, W Hordijk, and O Gascuel. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 2010 vol. 59 (3). DOI:10.1093/sysbio/syq010

Example datasets

The following datasets are publicly available from Zenodo. Follow the links below to visualize the results obtained either with the current version of RAREFAN or with the first released version, which corresponds to the Zenodo deposit.

Code availability

The source code for all RAREFAN components is available in our GitHub repository. A system-independent package to run RAREFAN from the command line can be downloaded from the GitHub releases page.

Citing RAREFAN

Please use the following citation to refer to RAREFAN:
Carsten Fortmann-Grote, Julia v. Irmer, & Frederic Bertels. RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes. Peer Community Journal, 2023, vol. 3. DOI:10.24072/pcjournal.244


Note: RAREFAN is still under development. Results may be erroneous or may not be displayed correctly. If you encounter issues or have questions, please contact us at rarefan@evolbio.mpg.de.