Genome-Scale Identification of Legionella pneumophila
Effectors using a Machine Learning Approach

David Burstein, Tal Zusman, Elena Degtyar, Ram Viner, Gil Segal, and Tal Pupko

Computer source files used for executing the machine-learning scheme described in the manuscript



Main archive file: codeDistribution.zip

This zip file contains the following:

buildSets.sh

The main shell script; it creates the CSV file that is used as input to the Weka software package

*.pl

Perl script files (required by buildSets.sh)

all.data.csv

Data collected for all L. pneumophila ORFs (data that do not depend on which ORFs are effectors)

*.blastp

All-against-all BLAST results for the L. pneumophila ORFs

*.lpg

Example input files for buildSets.sh: lists of ORFs (lpg annotation)

*.sample.csv

Sample output files
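
To illustrate how a per-ORF data table and a list of ORFs could be combined into a labeled CSV file, here is a minimal Python sketch. It is not the logic of buildSets.sh or of the Perl scripts, which are not reproduced here. The function name label_orfs, the column names "orf" and "is_effector", and the toy rows are all hypothetical placeholders and do not reflect the actual layout of all.data.csv or of the *.lpg files.

```python
import csv
import io


def label_orfs(feature_csv_text, effector_ids,
               id_column="orf", label_column="is_effector"):
    """Append a yes/no label column to a per-ORF feature table.

    feature_csv_text: CSV text with a header row (hypothetical layout).
    effector_ids: iterable of ORF identifiers, one per entry, such as
                  the lines of a plain-text ORF list.
    """
    # Strip whitespace and newlines and ignore empty lines in the list.
    effectors = {orf.strip() for orf in effector_ids if orf.strip()}

    reader = csv.DictReader(io.StringIO(feature_csv_text))
    out = io.StringIO()
    fieldnames = list(reader.fieldnames) + [label_column]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The label depends only on membership in the supplied ORF list.
        row[label_column] = "yes" if row[id_column] in effectors else "no"
        writer.writerow(row)
    return out.getvalue()


# Toy data: identifiers and values are invented for illustration only.
features = "orf,length\norf_a,450\norf_b,1200\norf_c,780\n"
effector_list = ["orf_b\n", "\n"]
print(label_orfs(features, effector_list))
```

Keeping the ORF-independent data in one table and the ORF list in a separate file means the same table can be relabeled with a different list without recomputing any of the data.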



Auxiliary code archive file (33 Perl scripts; 5 packages): auxiliaryCode.zip

This archive file contains additional Perl scripts that were used to generate and collect the data. These scripts are not required to run the scripts listed above, but were used in the course of the research (many of them were applied to generate the all.data.csv file supplied above).



Note: This code was designed to run on Linux with BLAST installed.
In addition, Perl must be installed on the machine running the scripts.
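
The requirements in the note above can be checked before running anything. The following is a minimal Python sketch, not part of the distributed code. The executable names for BLAST are an assumption: "blastall" is the legacy NCBI BLAST program and "blastp" is the BLAST+ program, and this document does not state which one the scripts call. The function name missing_tools and the REQUIREMENTS table are hypothetical.

```python
import platform
import shutil


def missing_tools(candidates):
    """Return the requirement names for which no executable is on PATH.

    candidates maps a requirement name to a list of acceptable
    executable names; finding any one of them satisfies the requirement.
    """
    missing = []
    for name, executables in candidates.items():
        if not any(shutil.which(exe) for exe in executables):
            missing.append(name)
    return missing


# Assumption: either the legacy or the BLAST+ executable counts as BLAST.
REQUIREMENTS = {
    "Perl": ["perl"],
    "BLAST": ["blastall", "blastp"],
}

if __name__ == "__main__":
    if platform.system() != "Linux":
        print("Warning: the code was designed to run on Linux.")
    for name in missing_tools(REQUIREMENTS):
        print(f"Missing requirement: {name}")
```

The script prints nothing when it runs on Linux and both requirements are found on PATH.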