Rate4Site (VERSION 2.01)

Rate4Site is a program for detecting conserved amino-acid sites by computing the relative evolutionary rate for each site in the multiple sequence alignment (MSA).

Download the program (for WINDOWS):
The current version is from November 2006.
For support and questions please email me: itaymay@post.tau.ac.il
rate4site (version 2.01; you may need to add the extension .exe before running)
seq.aln (Clustal file format).
tree.txt (Newick tree file format).

You can try the program by typing: rate4site.exe -s seq.aln -t tree.txt

Source code and copyrights:
The use of the Rate4Site program is free for academic end-users solely for non-commercial research purposes.
Other users - please see the terms and conditions.

The most updated installation package and source files can be obtained through ftp://rostlab.org/rate4site/.
Follow the instructions in the provided INSTALL file.

Alternatively, source code (C++) for UNIX and LINUX is also available for download here: [rate4site.3.2.source.zip].
Note that these may be somewhat less up to date than the files on the above FTP site.
The makefile within can be used to compile the executable (by typing the make command). Alternatively, type: g++ -o rate4site.exe -O3 *.cpp.
The Rate4Site program may suffer from underflow problems when a large number of sequences are used (typically more than 200). To bypass this problem Rate4Site can be compiled using the Makefile_slow file. The executable produced when compiling with this option can handle a larger number of sequences but is slower.
Windows executable: rate4site_slow.exe (You may need to add the extension .exe before running)
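
The underflow problem mentioned above can be illustrated with a minimal Python sketch. It is not taken from the Rate4Site source and does not describe what Makefile_slow actually changes; it only shows the general phenomenon that a product of many small probabilities collapses to zero in double precision, while a sum of logarithms stays finite. The function names and the numbers (400 factors of 0.01) are hypothetical.

```python
import math


def site_likelihood_naive(factors):
    """Multiply probability factors directly; the product may underflow to 0.0."""
    product = 1.0
    for factor in factors:
        product *= factor
    return product


def site_log_likelihood(factors):
    """Sum the logarithms instead; this stays finite for positive factors."""
    return sum(math.log(factor) for factor in factors)


# Hypothetical input: 400 factors of 0.01 each (true product is 1e-800,
# far below the smallest positive double, so the naive product is lost).
factors = [0.01] * 400
naive = site_likelihood_naive(factors)
log_value = site_log_likelihood(factors)

print("naive product:", naive)
print("log-likelihood:", log_value)
```

Working in log space (or rescaling intermediate values) is a common way to avoid this kind of underflow, usually at some cost in speed.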

If there are problems with the compilation (these occur occasionally with old versions of g++), please email me and I'll try to help. Permission is required to modify the code or to use parts of it for other purposes. Please contact Tal Pupko at talp@post.tau.ac.il

Overview:
The rate of evolution is not constant among amino acid sites: some positions evolve slowly and are commonly referred to as "conserved", while others evolve rapidly and are referred to as "variable". The rate variations correspond to different levels of purifying selection acting on these sites. The purifying selection can be the result of geometrical constraints on the folding of the protein into its 3D structure, constraints at amino acid sites involved in enzymatic activity or in ligand binding or, alternatively, constraints at amino acid sites that take part in protein-protein interactions. Rate4Site calculates the relative evolutionary rate at each site using a probabilistic evolutionary model. This makes it possible to account for the stochastic process underlying sequence evolution within protein families and for the phylogenetic tree of the proteins in the family. The conservation score at a site corresponds to the site's evolutionary rate.

Methodology:
The sole obligatory input to Rate4Site is an MSA file. The program then computes a phylogenetic tree that is consistent with the available MSA (the user can also input a pre-calculated tree). It then calculates the relative conservation score for each site in the MSA. This is carried out using either an empirical Bayesian method or a maximum likelihood method (Pupko et al., 2002). The differences between the two methods are explained in detail in Mayrose et al. (2004).
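
As a rough illustration of how the two kinds of estimate can differ, the Python sketch below works on a single site with a handful of discrete rate categories. It is a toy under stated assumptions, not the Rate4Site implementation: the rates, the equal prior weights and the per-category likelihoods are invented numbers, and the maximum likelihood step is simplified to picking the best rate from the same discrete grid rather than optimizing over a continuous rate. The Bayesian estimate is taken here as the posterior mean of the rate.

```python
def empirical_bayes_rate(rates, priors, likelihoods):
    """Posterior mean rate: sum of rate * P(rate | site data)."""
    weights = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(weights)
    posteriors = [w / total for w in weights]
    return sum(r * post for r, post in zip(rates, posteriors))


def grid_ml_rate(rates, likelihoods):
    """Simplified ML estimate: the grid rate with the highest likelihood."""
    best_index = max(range(len(rates)), key=lambda i: likelihoods[i])
    return rates[best_index]


# Hypothetical values for one site (assumptions, not Rate4Site output).
rates = [0.1, 0.5, 1.0, 2.5]            # candidate relative rates
priors = [0.25, 0.25, 0.25, 0.25]       # equal-probability categories
likelihoods = [0.02, 0.05, 0.03, 0.01]  # P(site data | rate)

eb_rate = empirical_bayes_rate(rates, priors, likelihoods)
ml_rate = grid_ml_rate(rates, likelihoods)

print("empirical Bayes (posterior mean):", eb_rate)
print("grid maximum likelihood:", ml_rate)
```

In this toy case the posterior mean lies between the candidate rates because it averages over all categories, whereas the maximum likelihood pick commits to a single one.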

In citing the Rate4Site program please refer to:

Mayrose, I., Graur, D., Ben-Tal, N., and Pupko, T. 2004. Comparison of site-specific rate-inference methods: Bayesian methods are superior. Mol Biol Evol 21: 1781-1791.

Usage:

Flag	Description	Default
-s [MSA file]	The input sequence file name. The following formats are supported: Mase, Molphy, Phylip, Clustal, Fasta	Obligatory
-t [tree file]	The input tree file name (in Newick format)	An NJ tree is constructed
-o [output file]	The results output file	r4s.res
-a [sequence name]	Reference sequence name in the MSA. The conservation scores are printed based on the amino acids in this sequence.	First sequence in the MSA
-i [rate inference method]	Rate inference method flag: -Im = rates are inferred using the maximum likelihood method; -Ib = rates are inferred using the empirical Bayes method	-Ib
-k [categories number]	The number of discrete Gamma categories	16
-m [evolutionary model]	The following amino-acid models are supported: DAY (-md), JTT (-mj), REV (-mr), aaJC (-ma), LG (-Ml), WAG (-Mw). For nucleotides, the following models are supported: JC (-mn), HKY (-Mh), Tamura92 (-Mt), GTR (-Mg).	-mj
-b [branch-lengths optimization]	Branch-length optimization flag: -bn = no branch-length optimization; -bh = optimization using a homogeneous model (no among-site rate variation); -bg = optimization using a Gamma model	-bg
-z [tree constructing method]	-zj = Neighbor-joining tree with Jukes-Cantor distances; -zn = Neighbor-joining tree with maximum likelihood distances	-zn
-h	help
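
For scripted runs, the flags in the table above can be assembled programmatically. The following Python sketch is one possible way to do so; the helper name build_rate4site_command and its parameters are hypothetical and not part of Rate4Site. It uses only the -s, -t, -o and -a flags from the table, assumes each value follows its flag as a separate argument (as in the example command rate4site.exe -s seq.aln -t tree.txt), and passes any other flags through verbatim.

```python
def build_rate4site_command(msa, tree=None, output=None, reference=None,
                            extra_flags=(), executable="rate4site.exe"):
    """Return an argument list for a Rate4Site run (hypothetical helper)."""
    if not msa:
        raise ValueError("an MSA file (-s) is obligatory")
    command = [executable, "-s", msa]
    if tree is not None:
        command += ["-t", tree]        # otherwise an NJ tree is constructed
    if output is not None:
        command += ["-o", output]      # default output file is r4s.res
    if reference is not None:
        command += ["-a", reference]   # default is the first sequence
    command += list(extra_flags)       # e.g. ["-bn"], copied as given
    return command


command = build_rate4site_command("seq.aln", tree="tree.txt")
print(" ".join(command))
```

The resulting list can be handed to Python's subprocess.run(command, check=True) once the executable is on the search path; building a list rather than a single string avoids quoting problems with file names that contain spaces.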