RAREFAN: RAyt/REpin Finder and ANalyzer

Introduction

The RAREFAN webserver identifies and analyzes RAYT transposases and their associated REPINs in bacterial species. The input to the server is a selection of closely related strains (<5% divergence). There are three example datasets from Neisseria, E. coli, and Pseudomonas chlororaphis. Neisseria and E. coli both contain Group 2 RAYTs (mainly found in enterobacteria), whereas P. chlororaphis contains Group 3 RAYTs (mainly found in P. putida, P. fluorescens and P. syringae). For details, see Bertels, Gallie and Rainey (2017). The service analyzes REPIN population size, its relationship to REPIN replication rate, and the presence or absence of RAYTs across all submitted genomes.
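The divergence requirement on the input strains can be checked before submission. The following is a minimal sketch, not part of RAREFAN itself: it assumes you already have a square PHYLIP-style distance matrix for your genomes (for example, the kind of matrix that andi, listed under third party tools, writes), and it assumes that "<5% divergence" can be read as a pairwise distance below 0.05. The function names, the example strain names, and the example distances are all hypothetical.

```python
from itertools import combinations


def parse_phylip_distances(text):
    """Parse a square PHYLIP-style distance matrix into (names, rows)."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    n = int(lines[0].split()[0])  # first line holds the number of taxa
    names, rows = [], []
    for ln in lines[1:n + 1]:
        fields = ln.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:n + 1]])
    return names, rows


def pairs_above_threshold(names, rows, threshold=0.05):
    """Return (name_a, name_b, distance) for every pair above the threshold."""
    return [
        (names[i], names[j], rows[i][j])
        for i, j in combinations(range(len(names)), 2)
        if rows[i][j] > threshold
    ]


# Hypothetical distances, for illustration only.
EXAMPLE = """\
3
strainA  0.0000 0.0120 0.0810
strainB  0.0120 0.0000 0.0790
strainC  0.0810 0.0790 0.0000
"""

names, rows = parse_phylip_distances(EXAMPLE)
too_divergent = pairs_above_threshold(names, rows)
for a, b, d in too_divergent:
    print(f"{a} vs {b}: distance {d:.3f} exceeds 0.05")
```

In this made-up example, strainC would be flagged against both other strains, suggesting it should be left out of a submission or analyzed separately.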

Third party tools

RAREFAN uses the following tools:

  • andi for tree building:
    B Haubold, F Klötzl, and P Pfaffelhuber. andi: fast and accurate estimation of evolutionary distances between closely related genomes. Bioinformatics, 2015 vol. 31 (8) pp. 1169-1175. DOI:10.1093/bioinformatics/btu815

  • MCL for REPIN population clustering:
    A J Enright, S Van Dongen, and C A Ouzounis. An efficient algorithm for large-scale detection of protein families. Nucleic Acids Research, 2002 vol. 30 (7) pp. 1575-1584. DOI:10.1093/nar/30.7.1575

  • BLAST+ for identifying RAYT relatives in the different genomes:
    C Camacho, G Coulouris, V Avagyan, N Ma, J Papadopoulos, K Bealer, and T L Madden. BLAST+: architecture and applications. BMC Bioinformatics, 2009 vol. 10 (1) pp. 421-9. DOI:10.1186/1471-2105-10-421

  • MUSCLE for the alignment of identified RAYT sequences:
    R C Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004 vol. 32 (5). DOI:10.1093/nar/gkh340.

  • PhyML3 for generating RAYT phylogenies:
    S Guindon, J -F Dufayard, V Lefort, M Anisimova, W Hordijk, and O Gascuel. New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0. Systematic Biology, 2010 vol. 59 (3). DOI:10.1093/sysbio/syq010

Example datasets

Four example datasets are publicly available from Zenodo:

  • Frederic Bertels, Carsten Fortmann-Grote, & Paul Rainey. REPIN population analysis in 4 Dokdonia genomes. Zenodo, 2020 DOI:10.5281/zenodo.4117576.

  • Frederic Bertels, Carsten Fortmann-Grote, & Paul Rainey. REPIN population analysis in 130 Neisseria meningitidis and N. gonorrhoeae genomes. Zenodo, 2020 DOI:10.5281/zenodo.4049437.

  • Frederic Bertels, Carsten Fortmann-Grote, & Paul B. Rainey. REPIN population analysis in 42 Pseudomonas chlororaphis genomes. Zenodo, 2020 DOI:10.5281/zenodo.4043614.

  • Julia Balk, Carsten Fortmann-Grote, & Frederic Bertels. REPIN population analysis in 49 Stenotrophomonas maltophilia genomes. Zenodo, 2020 DOI:10.5281/zenodo.6504123.
    Follow these links to visualize the results for various Stenotrophomonas maltophilia reference strains: Sm53, AA1, FDAARGOS_649, AB550, ISMMS3

Code availability

The source code for all RAREFAN components is available in our GitHub repository. A system-independent package for running RAREFAN from the command line can be downloaded from the GitHub releases page.

Note: RAREFAN is still under development. Results may be erroneous or may not be displayed correctly. If you have issues or questions, please do not hesitate to contact us at rarefan@evolbio.mpg.de.

© 2020 - 2022 Max Planck Institute for Evolutionary Biology