Thank you for using our codon-based evolutionary model, which accounts for variable degrees of DNA/RNA level selection forces. For assistance or suggestions, please contact Prof. Tal Pupko at talp@post.tau.ac.il.

The following points explain how to compile and use the program, and how to interpret its results.

1. Downloading and compiling

Download the compressed directory multilayer.tar.gz from:

    http://www.tau.ac.il/~talp/multilayer.tar.gz

Unzip and untar it:

    tar -zxvf multilayer.tar.gz

Inside you will find two directories: libs and programs. Within libs there is a directory named phylogeny; within programs there is a directory named kaki and a file named Makefile.generic. phylogeny is a static library, and kaki, which depends on phylogeny, builds the executable.

If you are using Windows, you will have to create a project and define the dependency of kaki on phylogeny. If you are using Unix/Linux, do not change the structure of the multilayer directory, because the dependencies between kaki and phylogeny rely on it. cd to phylogeny and run make to compile it. When that compilation is done, cd to kaki and run make again. Once the compilation has finished, an executable named kaKi will have been created.

2. Usage

Parameters are passed to kaKi through a parameters file, which is given as the argument when the executable is invoked (e.g., ./kaKi ./params.txt). Inside the kaki directory you will find a file named templete.params. Edit it according to the explanations provided within it.

3. Results and their interpretation

During execution some messages are printed to the standard output. In addition, a log file is created, into which additional information is printed. Keep the log file, as it facilitates debugging.
Two additional files are created at the end of a successful execution:

(1) TheTree.txt - the output tree. It has the same topology as the input tree, but its branch lengths will differ if you chose to optimize them in the parameters file.

(2) kaks4s.res - this file holds all the results (an example named example.kaks4s.res is included in the kaki directory).

Within kaks4s.res the important items are:

- A line that starts with "#Tree log-likelihood". This is the maximum log-likelihood of the data given the model and tree.
- Columns 4, 5, 6 and 8 (under the headers Position_1_Rate, Position_2_Rate, Position_3_Rate and Post>1, respectively, in line 19). These are the codon-site substitution rate values and the posterior probability of an amino-acid site having Ka/Ks > 1. See below for further explanations.

4. Testing for DNA/RNA level rate variability and positive selection

To test for DNA/RNA level rate variability, the maximum likelihoods obtained under two models need to be compared using the likelihood ratio test (LRT). First run the null model, which does not account for DNA/RNA level rate variability, by setting the parameters _numberOfPositionRateCategories and _inferenceNumPositionRateCategories to 1. Then run the alternative model, which accounts for DNA/RNA level rate variability, by setting the parameters _numberOfPositionRateCategories and _inferenceNumPositionRateCategories to 4 (recommended). In both cases, do not allow for positive selection: set the _bOptimizeOmega parameter to 0.

To perform the LRT, find the chi-square probability of 2*(log_likelihood_of_alternative_model - log_likelihood_of_null_model) with one degree of freedom. The result is the p-value; if it is lower than 0.05, the null hypothesis of DNA/RNA level rate homogeneity may be rejected.
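The LRT arithmetic described above can be sketched in a few lines of Python. This is an illustrative helper and not part of the kaKi distribution: the function name lrt_p_value and the log-likelihood values in the usage lines are hypothetical. It relies on the fact that, for one degree of freedom, the upper-tail chi-square probability of x equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def lrt_p_value(loglik_null: float, loglik_alt: float) -> float:
    """Upper-tail chi-square probability (1 degree of freedom) of the LRT statistic."""
    statistic = 2.0 * (loglik_alt - loglik_null)
    # Optimization noise can make the statistic slightly negative; treat that as zero.
    statistic = max(statistic, 0.0)
    # For one degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical log-likelihoods, for illustration only. In practice, take them
    # from the "#Tree log-likelihood" lines of the two kaks4s.res files.
    p = lrt_p_value(loglik_null=-10234.7, loglik_alt=-10228.1)
    print(f"p-value = {p:.3g}")
    print("reject the null model" if p < 0.05 else "cannot reject the null model")
```

The same function applies to every LRT in this section, since each one uses one degree of freedom; only the pair of log-likelihoods changes.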
If your aim is to analyze selection forces operating on synonymous sites, the position rate columns in the kaks4s.res file hold codon-site-specific rate values. The lower the rate, the stronger the purifying selection operating on that site.

If your aim is to test for positive selection, you will need to run the model that accounts for DNA/RNA level rate variability and allows for positive selection, by setting the parameters _numberOfPositionRateCategories and _inferenceNumPositionRateCategories to 4 (recommended) and the _bOptimizeOmega parameter to 1. Once this execution has completed successfully, perform an LRT against the corresponding run without positive selection (4 rate categories and _bOptimizeOmega set to 0, as described above) by obtaining the chi-square probability of 2*(log_likelihood_of_DNA/RNA_rate_variability_with_positive_selection_model - log_likelihood_of_DNA/RNA_rate_variability_with_no_positive_selection_model) with one degree of freedom. A p-value lower than 0.05 indicates that positive selection is operating on this gene. The specific sites on which positive selection is operating are those for which the value in the Post>1 column in the kaks4s.res file is high (higher than 0.95 as a recommendation).

If DNA/RNA level rate homogeneity cannot be rejected, you will need to use the models that do not account for DNA/RNA level rate variability in your test for positive selection. This requires two executions, in both of which the parameters _numberOfPositionRateCategories and _inferenceNumPositionRateCategories are set to 1. In the execution that does not allow for positive selection, set the _bOptimizeOmega parameter to 0; in the execution that allows for positive selection, set it to 1. Once these executions have completed successfully, perform an LRT by obtaining the chi-square probability of 2*(log_likelihood_of_positive_selection_model - log_likelihood_of_no_positive_selection_model) with one degree of freedom. A p-value lower than 0.05 indicates that positive selection is operating on this gene.
The specific sites on which positive selection is operating are those for which the value in the Post>1 column in the kaks4s.res file is high (higher than 0.95 as a recommendation).
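Pulling the log-likelihood and the candidate sites out of kaks4s.res can be automated. The following is a minimal sketch, not a tool shipped with the program. It rests on several assumptions that should be checked against your own output file: that columns are whitespace-separated, that the header row contains the token Post>1, that the first column identifies the site, and that the log-likelihood is the last number on the "#Tree log-likelihood" line. The name parse_results is hypothetical. The position of Post>1 (column 8) and the 0.95 cutoff are taken from the text above.

```python
import re
from typing import List, Tuple

POST_COLUMN = 8  # 1-based column of Post>1, as stated in this README
_NUMBER = re.compile(r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")


def parse_results(lines: List[str], threshold: float = 0.95) -> Tuple[float, List[str]]:
    """Return the tree log-likelihood and the first-column labels of rows
    whose Post>1 value exceeds the threshold."""
    loglik = None
    header_seen = False
    hits = []
    for line in lines:
        if line.startswith("#Tree log-likelihood"):
            # Assumption: the value is the last number on this line.
            numbers = _NUMBER.findall(line)
            if numbers:
                loglik = float(numbers[-1])
            continue
        fields = line.split()
        if not header_seen:
            # Assumption: data rows start after the row holding the column headers.
            header_seen = "Post>1" in fields
            continue
        if len(fields) < POST_COLUMN:
            continue  # blank or short line, not a data row
        try:
            posterior = float(fields[POST_COLUMN - 1])
        except ValueError:
            continue  # non-numeric entry, skip the row
        if posterior > threshold:
            hits.append(fields[0])  # assumption: column 1 identifies the site
    if loglik is None:
        raise ValueError("no '#Tree log-likelihood' line found")
    return loglik, hits
```

To use it, read the file into a list of lines, for example parse_results(open("kaks4s.res").read().splitlines()), and compare the returned site labels with the Post>1 column by eye before relying on them.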