Power Cluster

Power is a Linux cluster running CentOS (at the moment, most of the nodes run version 7.3 and the rest run 6.8). The cluster consists of a single head node (power8) and more than 230 compute nodes with 8 to 32 cores each; some nodes have 16GB of memory, others 36GB or even 250GB. Users belonging to the netgroup 'power' can log in and run their batch jobs on it. The Faculty Computer Coordinators can change a user's netgroup from general to power.

User jobs are executed on the compute nodes (compute0-0 to compute0-235) under the control of a queuing system (PBS Pro). Users log on to the head node, power8, via ssh (their home directory is mounted there from the CC filer, the same as on the other CC servers) and submit their jobs to the batch system.

  1. Start by logging in to the head node:
    ssh power8 -l username
  2. Create a batch job script, for example, file named script that contains the following lines:
    cd executables
  3. Submit the script to one of the existing queues, for example:
    qsub -q short script
  4. To see the status of your own jobs, run:
    qstat -u your-username
  5. To see the status of all jobs, run:
    qstat
  6. To see the current available queues and their cputime and memory limits, execute:
    qstat -q

    Some of the queues are private, accessible only to a predefined group of users; others are public, open to all users of power.
    The public queues are: short, inf, hugemem, parallel, bigmem.
    More detailed information on the limits of any queue can be viewed with:
    qmgr -c "list queue queuename"
    For example:
    qmgr -c "list queue short"
    The default queue limits apply unless you request other values (up to the queue's maximum values) on the 'qsub' command line, for example:
    qsub -q hugemem -lpmem=2000mb,pvmem=3000mb script-name
  7. The standard output and standard error files will be written at the end of the execution to files in your home directory: script.o#n and script.e#n (where #n is the job number given to your job by the batch queueing system).
  8. Interactive sessions (line mode) are also supported. The command
    qsub -I
    runs an interactive job under the PBS queuing system.
  9. Interactive sessions with X:
    qsub -I -X -q 'queuename'
    However, if you want to run MATLAB in an interactive session, -X slows MATLAB down. The workaround (necessary only for MATLAB) is to replace -X with -v DISPLAY=...:
    qsub -I -v DISPLAY=your.machine.name:0
    xhost +power (on your.machine.name)
  10. To delete a job from the queue, use:
    qdel job_id
  11. Parallel jobs can be executed on the cluster, using up to 8 cores for a job. For example, jobs compiled with MPICH can be submitted with the following command:
    qsub -l nodes=2:ppn=8 -q parallel your-script-filename
  12. Multithreaded MATLAB jobs can be submitted with the following command:
    qsub -l nodes=1:ppn=8 -q parallel your-matlab-script-name
  13. The Environment Modules package provides for the dynamic modification of a user's environment via modulefiles. Typically, modulefiles instruct the module command to alter or set shell environment variables such as PATH, MANPATH, etc. Modules are useful in managing different versions of applications.
    Useful commands:
    1. module avail (lists the available modules on the system)
    2. module load intel/ifort10 (loads the module, so that ifort version 10 can be used without specifying the path to its binaries and libraries)
    3. module list (lists the loaded modules)
    4. module unload intel/ifort10 (unloads the loaded module)
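
  Steps 2 and 3 above can be sketched as one small shell session. This is only an illustration under stated assumptions: the program name my_program is hypothetical, and the queue is placed inside the script with a #PBS directive line (which qsub reads as if the option had been given on the command line) instead of using -q on the qsub command itself.

```shell
#!/bin/sh
# Write a job script named "script" (the file name used in step 2).
# my_program is a hypothetical placeholder for your own executable.
cat > script <<'EOF'
#!/bin/sh
# Lines starting with "#PBS" are treated by qsub as command-line options.
#PBS -q short

# The job starts in your home directory, so the path is relative to it.
cd executables || exit 1

# Run the program; its output ends up in script.o#n and script.e#n.
./my_program
EOF

# Submit only where qsub is installed (that is, on the head node).
if command -v qsub >/dev/null 2>&1; then
    qsub script
else
    echo "qsub not found; run this on the head node" >&2
fi
```

  Because the queue is named in the script, a plain "qsub script" is enough. Resource requests such as those shown in step 6 can be added in the same way, as further #PBS lines.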

  Tel Aviv University