The RAREFAN webserver aims to identify and analyze RAYT transposases and their associated REPINs in bacterial species. The input to the server is a selection of closely related strains (<5% divergence). There are three example datasets from Neisseria, E. coli, and Pseudomonas chlororaphis. Neisseria and E. coli both contain Group 2 RAYTs (mainly found in enterobacteria), whereas P. chlororaphis contains Group 3 RAYTs (mainly found in P. putida, P. fluorescens and P. syringae). For more explanation, see Bertels, Gallie and Rainey (2017). Our service provides an analysis of REPIN population size, how it relates to REPIN replication rate and the presence and absence of RAYTs across all submitted genomes.
More information about the RAREFAN algorithm, example applications, and usage instructions is provided in our article in Peer Community Journal (DOI:10.24072/pcjournal.244).
RAREFAN uses the following tools:
andi for tree building:
B Haubold, F Klötzl, and P Pfaffelhuber. andi: fast and accurate estimation of evolutionary distances between
closely related genomes. Bioinformatics, 2015 vol. 31 (8) pp. 1169-1175.
DOI:10.1093/bioinformatics/btu815
MCL for REPIN population clustering:
A J Enright, S Van Dongen, and C A Ouzounis. An efficient algorithm for large-scale detection of protein
families. Nucleic Acids Research, 2002 vol. 30 (7) pp. 1575-1584.
DOI:10.1093/nar/30.7.1575
BLAST+ for identifying RAYT relatives in the different genomes:
C Camacho, G Coulouris, V Avagyan, N Ma, J Papadopoulos, K Bealer, and T L Madden. BLAST+: architecture and
applications. BMC Bioinformatics, 2009 vol. 10 (1) pp. 421-9.
DOI:10.1186/1471-2105-10-421
MUSCLE for the alignment of identified RAYT sequences:
R C Edgar. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 2004
vol. 32 (5).
DOI:10.1093/nar/gkh340
PhyML3 for generating RAYT phylogenies:
S Guindon, J -F Dufayard, V Lefort, M Anisimova, W Hordijk, and O Gascuel.
New Algorithms and Methods to Estimate Maximum-Likelihood Phylogenies: Assessing the Performance of PhyML 3.0.
Systematic Biology, 2010 vol. 59 (3).
DOI:10.1093/sysbio/syq010
The following datasets are publicly available from Zenodo. Follow the links below to view the results obtained with the current version of RAREFAN or with the first released version, which corresponds to the Zenodo deposit.
Frederic Bertels, Carsten Fortmann-Grote, & Paul Rainey.
REPIN population analysis in 4 Dokdonia genomes.
Zenodo, 2020 DOI:10.5281/zenodo.4117576.
View results: current version -- first version
Frederic Bertels, Carsten Fortmann-Grote, & Paul Rainey.
REPIN population analysis in 130 Neisseria meningitidis and N. gonorrhoeae genomes.
Zenodo, 2020 DOI:10.5281/zenodo.4049437.
View results: current version -- first version
Frederic Bertels, Carsten Fortmann-Grote, & Paul B. Rainey.
REPIN population analysis in 42 Pseudomonas chlororaphis genomes.
Zenodo, 2020 DOI:10.5281/zenodo.4043614.
View results: current version -- first version
Julia Balk, Carsten Fortmann-Grote, & Frederic Bertels.
REPIN population analysis in 49 Stenotrophomonas maltophilia genomes.
Zenodo, 2020 DOI:10.5281/zenodo.6504123.
View results for various reference strains: current version: Sm53, AA1, FDAARGOS_649, AB550, ISMMS3 -- first version: Sm53, AA1, FDAARGOS_649, AB550, ISMMS3
The source code for all RAREFAN components is available in our GitHub repository. A system-independent package to run RAREFAN from the command line can be downloaded from the GitHub releases page.
Please use the following citation to refer to RAREFAN:
Carsten Fortmann-Grote, Julia v. Irmer, & Frederic Bertels. RAREFAN: A webservice to identify REPINs and RAYTs in bacterial genomes.
Peer Community Journal, 2023, 3 DOI:10.24072/pcjournal.244