LGT3STATE

A C++ shell program for the detection of LGT events

Overview

LGT3State is a program for inferring lateral gene transfer (LGT) in cases where there are two non-homologous genes, each of which performs the same function (one that is essential to the cell). Thus, there are three states which may characterize an organism: (a) it encodes gene A, (b) it encodes gene B, or (c) it encodes both genes A and B.

The existence of organisms which encode both types of genes is thought to be evidence for lateral gene transfer, since this is thought to be a transition phase before loss of one of the genes (Doolittle et al. 2003). Alternatively, if the species tree displays a disordered pattern of organisms coding gene A and those coding gene B, this is also an indication that lateral gene transfer may have occurred.

To test the hypothesis of LGT in a given dataset, LGT3State fits two models to the data: (a) a model enabling LGT, and (b) a null model which does not enable LGT, and hence assumes that only gene loss characterized the evolution of the two genes. By comparing the maximum log-likelihoods of the data under both models, it can be decided whether to retain or reject the "loss only" null model.
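The comparison described above can be sketched in a few lines of C++. This is only an illustration and is not taken from the LGT3State source: the function names are hypothetical, and the cutoff is left as a parameter that the caller supplies (see the section "Inferring LGT and interpreting the results file" for the differences recommended by the authors).

```cpp
// Illustrative sketch only; these helpers are hypothetical and are not
// part of the LGT3State source code.

// Difference in maximum log-likelihood between the LGT model and the
// "loss only" null model. Positive values favor the LGT model.
double logLikelihoodGain(double logLikLGT, double logLikNull) {
    return logLikLGT - logLikNull;
}

// Returns true if the LGT model fits better than the null model by more
// than `cutoff` log-likelihood points. The cutoff is chosen by the caller.
bool rejectLossOnlyModel(double logLikLGT, double logLikNull, double cutoff) {
    return logLikelihoodGain(logLikLGT, logLikNull) > cutoff;
}
```

For instance, with made-up log-likelihoods of -40.0 (LGT model) and -45.0 (null model), rejectLossOnlyModel(-40.0, -45.0, 3.0) returns true, because the difference of 5 points exceeds the cutoff of 3.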

Downloading and compiling the program:

Current version is from 23.12.2008.
For support and questions please email: talp@post.tau.ac.il or sternadi@post.tau.ac.il

Windows

Download and save to your computer the LGT3State.exe executable file. This is a simple command line application that may be run from MS-DOS. Do not double click on the program from Windows Explorer since this will not work. See the section below on "Usage" for a list of the program's arguments.

Linux, Unix and other operating systems

1. Download the LGT3State source code.
2. In order to unzip and untar the files please type:

unzip LGT3State.tar.zip
tar -xvf LGT3State.tar

This will create the following directories:

libs/phylogeny
programs/LGT3State

3. In some operating systems, you may use the makefiles to compile the program. If this does not work, skip to item 4.
Make sure you are in the directory where you unzipped the files, and type

cd libs/phylogeny

Type

make

in order to run the Makefile.
Now, type

cd ../../programs/LGT3State

to get to the LGT3State directory. Type

make

in order to run the Makefile.
This will result in an executable file called LGT3State which will reside in the programs/LGT3State directory.
4. In some systems (such as Unix), the makefiles will not be operable. Thus, follow steps 1-2 and compile directly using g++:

  1. cd to the directory where you unzipped the files.
  2. Type

    mv libs/phylogeny/* programs/LGT3State/

  3. cd to the LGT3State directory

    cd programs/LGT3State

  4. To compile, type
    g++ -O3 -o LGT3State *.cpp
This will result in an executable file called LGT3State, which will reside in the programs/LGT3State directory.

Important note:
The usage of the source code should be only for compiling the LGT3State program.
To modify the code, or to use parts of it for other purposes, permission is required. Please contact Tal Pupko at talp@post.tau.ac.il or Adi Stern at sternadi@post.tau.ac.il

Usage:


LGT3State is controlled by command-line flags.

For a full list of arguments available, type "LGT3State.exe -h"
The essential arguments are:
-t the species tree file name (Newick format).
-s the alignment (one position) file name, where 0, 1, or 2 is specified for each taxon in the species tree.
-r name of the node for the tree to be rooted at (to obtain node names, run the program once without this option, and the program will print a tree with node names to the results file).
-n runs the null model
(see the Thymidylate synthase data below for an example species tree file and an alignment file)
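As an illustration of how the essential flags above fit together, the following C++ helper assembles a command line for a single run. It is a sketch and not part of LGT3State: the function name and the file names in the usage note are hypothetical, and it assumes that each flag is separated from its value by a space.

```cpp
#include <string>

// Hypothetical helper, not part of LGT3State: builds a command line from
// the essential flags (-t, -s, -r, -n) listed above.
// An empty rootNode omits -r; nullModel == true appends -n.
// File names are inserted as given (no quoting of spaces is attempted).
std::string buildCommand(const std::string& exe,
                         const std::string& treeFile,
                         const std::string& stateFile,
                         const std::string& rootNode,
                         bool nullModel) {
    std::string cmd = exe + " -t " + treeFile + " -s " + stateFile;
    if (!rootNode.empty()) {
        cmd += " -r " + rootNode;
    }
    if (nullModel) {
        cmd += " -n";
    }
    return cmd;
}
```

With placeholder file names, buildCommand("./LGT3State", "tree.txt", "states.txt", "", true) gives "./LGT3State -t tree.txt -s states.txt -n" for the null model, and passing false as the last argument gives the corresponding LGT model run.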

Inferring LGT and interpreting the results file

In order to infer LGT for a certain dataset, we recommend performing the following stages:
  1. Contrasting the LGT model against the null model, by comparing the likelihood of both models. To do this, run the program twice (once with the -n flag, and once without). The likelihood values will reside in the results file of each run. The null and the LGT models are not nested. A parametric bootstrap approach was used to assess when rejection of the null model should occur. Accordingly, a log-likelihood difference of over 0.3 or 3 points should result in rejection of the null model with a P-value of 0.05 or 0.01, respectively.
  2. If the data significantly support LGT: infer LGT at lineages of the tree. The first line of the results file contains the input species tree. Information regarding the inference of the method is printed as bootstrap values at each node. This information can be viewed with phylogeny viewers such as NJplot. At each node, the posterior probabilities are printed in the following order: gain gene A//gain gene B//loss gene A//loss gene B[nodeName]. Simulation studies show that posterior probabilities above 0.5 lead to a false positive rate of 0.008.
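The node annotation described in stage 2 can be split into its four probabilities and node name with a short C++ routine. This is a sketch under assumptions, not code from LGT3State: the struct and function names are hypothetical, and it assumes a label consists of exactly four numbers separated by "//" followed by the node name in square brackets, with no extra characters around the numbers.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical container for the values attached to one node, in the
// order given above: gain A, gain B, loss A, loss B, then the node name.
struct NodePosteriors {
    double gainA;
    double gainB;
    double lossA;
    double lossB;
    std::string nodeName;
};

// Parses a label such as "0.91//0.02//0.10//0.00[N5]" (assumed layout).
// Returns false if the label does not contain exactly four numeric fields
// followed by a bracketed node name; `out` is left unchanged in that case.
bool parseNodeLabel(const std::string& label, NodePosteriors& out) {
    const std::size_t open = label.find('[');
    const std::size_t close = label.rfind(']');
    if (open == std::string::npos || close == std::string::npos || close < open) {
        return false;
    }
    const std::string numbers = label.substr(0, open);
    double values[4] = {0.0, 0.0, 0.0, 0.0};
    std::size_t start = 0;
    for (int i = 0; i < 4; ++i) {
        const bool last = (i == 3);
        const std::size_t sep = numbers.find("//", start);
        // The last field must have no separator after it; the others must.
        if (last != (sep == std::string::npos)) {
            return false;
        }
        const std::string field =
            last ? numbers.substr(start) : numbers.substr(start, sep - start);
        try {
            std::size_t used = 0;
            values[i] = std::stod(field, &used);
            if (used != field.size()) {
                return false;  // trailing characters after the number
            }
        } catch (const std::exception&) {
            return false;  // empty or non-numeric field
        }
        if (!last) {
            start = sep + 2;
        }
    }
    out.gainA = values[0];
    out.gainB = values[1];
    out.lossA = values[2];
    out.lossB = values[3];
    out.nodeName = label.substr(open + 1, close - open - 1);
    return true;
}
```

After a successful parse, each of the four values can be compared against the 0.5 cutoff mentioned above to list the nodes at which a gain or loss is supported.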

Thymidylate synthase dataset

Species tree
Alignment file
Results file

Citing the program:

Stern A., Mayrose I., Penn O., Shaul S., Gophna U., Pupko T. An Evolutionary Analysis of Lateral Gene Transfer in Thymidylate Synthase Enzymes. Resubmitted to Sys Biol.

References:

Doolittle, W. F., Y. Boucher, C. L. Nesbo, C. J. Douady, J. O. Andersson, and A. J. Roger. 2003. How big is the iceberg of which organellar genes in nuclear genomes are but the tip? Philos Trans R Soc Lond B Biol Sci 358:39-57; discussion 57-8.