Installation Summary for Grid Engine



To install Grid Engine and verify that it is set up correctly, proceed with the following tasks:

General Overview

Create a Grid Engine administrator account and set up a service port

An administrator account must be specified. The administrator can be an existing user, or a new user may be created for this task. This account owns all of the Grid Engine files and is used to configure and maintain the cluster once the software is installed.
The administrator account must exist before installation. We recommend 'sgeadmin' as the administrator account, belonging to the 'adm' group.

The software uses a TCP port for communication. All hosts in the cluster must use the same port number. The port number can be set in either of the following places:

  • NIS (Yellow Pages) services or NIS+ database
    Add the following to the services database (the exact port number does not matter, but it must be unused on your system and should be a reserved port):
    sge_commd          536/tcp     # communication port for Grid Engine
  • Or, add the above entry manually to the /etc/services file on each machine
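
Before adding the entry, it can help to check whether the service name or the port is already taken. The following is a minimal sketch for a local services-format file only; it does not query NIS or NIS+. The function names lookup_service and port_owners are made up for this illustration and are not part of Grid Engine.

```shell
#!/bin/sh
# lookup_service NAME [FILE]
# Print the "port/protocol" field of the first entry called NAME in a
# services-format file (default: /etc/services); return non-zero if absent.
lookup_service() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    ' "${2:-/etc/services}"
}

# port_owners PORT/PROTO [FILE]
# Print the names of all non-comment entries that already use PORT/PROTO,
# one per line.
port_owners() {
    awk -v pp="$1" '$1 !~ /^#/ && $2 == pp { print $1 }' "${2:-/etc/services}"
}

# Example (hypothetical session):
#   lookup_service sge_commd       # prints 536/tcp if the entry exists
#   port_owners 536/tcp            # any name other than sge_commd is a clash
```

If port_owners reports another service for the chosen port, pick a different unused port and use it consistently on all hosts.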

Create a directory and unpack the distribution

As the Grid Engine administrator, do the following:

  • If you received the distribution in "pkgadd" format:
    % mkdir <your_gridengine_root_directory>
    Install the packages with "pkgadd" on your file server (all files will have the correct permissions and ownership).

  • Or, if you received the distribution in "tar.gz" format:
    • Create a directory for Grid Engine (e.g. /share/gridengine). This directory must be accessible to all Grid Engine clients and execution hosts.
      % mkdir <sge_root>
      % cd <sge_root>

    • Unpack the distribution into this directory:
      % gzip -dc sge_<version>_common.tar.gz | tar xvf -
      % gzip -dc sge_<version>_<arch>.tar.gz | tar xvf -      (repeat for all architectures you need)

    • Verify the file permissions with the script
      <sge_root>/util/setfileperm.sh
      All Grid Engine directories and files should be owned by the administrator; some files need to be installed suid root. The script must be run on a machine where user root has appropriate permissions to chown/chmod files. It does not necessarily need to run on the qmaster machine.
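
When several architectures are needed, the unpack step for the "tar.gz" format can be wrapped in a small loop. This is only a sketch: the function name unpack_sge and its argument order are assumptions made for illustration, and it expects the archives to follow the sge_<version>_<part>.tar.gz naming shown above.

```shell
#!/bin/sh
# unpack_sge DIST_DIR SGE_ROOT VERSION ARCH...
# Unpack the common archive plus one archive per ARCH from DIST_DIR
# into SGE_ROOT, creating SGE_ROOT if necessary.
unpack_sge() {
    dist_dir=$(cd "$1" && pwd) || return 1   # absolute path survives the cd below
    sge_root=$2
    version=$3
    shift 3
    mkdir -p "$sge_root" || return 1
    for part in common "$@"; do
        archive="$dist_dir/sge_${version}_${part}.tar.gz"
        if [ ! -f "$archive" ]; then
            echo "missing archive: $archive" >&2
            return 1
        fi
        # Same gzip | tar pipeline as in the text, run inside the target
        # directory. Only the exit status of tar is checked here.
        (cd "$sge_root" && gzip -dc "$archive" | tar xf -) || return 1
    done
}

# Example (hypothetical paths; <version> and <arch> as in the text):
#   unpack_sge /tmp/download /share/gridengine <version> <arch1> <arch2>
```

Run it as the Grid Engine administrator, then check the permissions with the setfileperm.sh script as described.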

Additional information before installing

  • Grid Engine must be installed as root
    The Grid Engine installation program needs to be run as root in order to start the daemons. Root does NOT need write permission on the file server. Once Grid Engine is installed, the administrator can handle all day-to-day operations.

  • Machine rebooting
    The machines DO NOT need to be rebooted as part of the Grid Engine installation.

  • It may be convenient to prepare a file listing the hosts that will be installed. The format of this file is one hostname per line. The names may also be typed in manually when the installation prompts for them.

  • If any stty commands exist in the users' startup scripts, jobs submitted to Grid Engine may fail, because no terminal is associated with a Grid Engine batch job. If there are stty commands, do one of the following:
    • Remove all stty commands (and other commands that access a tty, such as "biff") from the login files
    • Bracket the stty commands with an 'if' statement that checks for a terminal before executing them. For example:

      #!/bin/csh
      # tty -s sets $status to 0 if a terminal is present
      tty -s
      if ($status == 0) then
          <place all stty commands in here>
      endif
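
For users whose login files are read by a Bourne-style shell, the same guard can be sketched as follows. The helper name if_tty is made up for this example; the check itself uses the standard "[ -t 0 ]" test, which is true only when standard input is a terminal.

```shell
#!/bin/sh
# if_tty COMMAND [ARG...]
# Run COMMAND only when standard input is attached to a terminal;
# otherwise do nothing and succeed, so batch jobs are not disturbed.
if_tty() {
    if [ -t 0 ]; then
        "$@"
    fi
}

# Example for a login file such as ~/.profile:
#   if_tty stty erase '^H'
```

In a batch job there is no terminal, so the guarded command is simply skipped.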

Install Grid Engine

The installation is a two-step process. First, the Grid Engine files are installed and configured on the master host. Then, a small installation is done on each execution host to configure and start the daemons and to add automatic daemon startup to the init area. This requires logging on to each execution host as root and manually running the install program. Alternatively, if there is a secure machine with root rsh access to all machines, the execution host installation can be done from that single machine.
  • Step One - Install the master host
    As root, on the master host, run:
    % ./install_qmaster            (This is a shortcut for ./inst-sge -fast -m)
    This will install the Grid Engine master.

  • Step Two - Install execution hosts
    As root, on each execution host, run:
    % ./install_execd                (This is a shortcut for ./inst-sge -fast -x)
    This will install the Grid Engine execution daemon.
The installation programs start the Grid Engine daemons, so when an install completes successfully, Grid Engine is up and running. If the master host will also be an execution host, perform Step Two on the master machine as well.
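
If the hosts are kept in a file with one hostname per line, a loop over that file can help keep track of Step Two. The sketch below only prints a candidate command for each host as a checklist; it does not run anything remotely. The names for_each_host and print_install_cmd are hypothetical, and the printed rsh command line is an assumption about how a site might start the install, not something the installation program requires.

```shell
#!/bin/sh
# for_each_host HOSTS_FILE COMMAND [ARG...]
# Call "COMMAND ARG... HOST" once for every non-empty line of HOSTS_FILE.
for_each_host() {
    hosts_file=$1
    shift
    while read -r host || [ -n "$host" ]; do
        [ -n "$host" ] || continue
        # Redirect stdin so the command cannot swallow the rest of the list
        # (remote shells such as rsh read standard input by default).
        "$@" "$host" </dev/null
    done < "$hosts_file"
}

# Dry run: print what could be run for one host. <sge_root> is the same
# placeholder used in the text and must be replaced by the real directory.
print_install_cmd() {
    echo "rsh $1 'cd <sge_root> && ./install_execd'"
}

# Example (hypothetical file name):
#   for_each_host hosts.txt print_install_cmd
```

Printing the commands first makes it easy to review the host list before anything is run as root.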

Verify installation

After the installation is completed, it can be verified. There are some sample scripts in $SGE_ROOT/examples/jobs.
First, source the appropriate settings file to set up the Grid Engine environment:

  • C-shell
    % source $SGE_ROOT/default/common/settings.csh
  • Bourne shell
    $ . $SGE_ROOT/default/common/settings.sh
Then, to verify Grid Engine is accepting jobs, execute the following:
% qsub $SGE_ROOT/examples/jobs/sleeper.sh
You should see output similar to the following:
% qsub $SGE_ROOT/examples/jobs/sleeper.sh
your job 1 ("Sleeper") has been submitted
Verify that all of the queues have been installed properly by running the following:
% qstat -f   (full listing of the queues)
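
When the verification is scripted, the job number can be picked out of the qsub message. The sketch below assumes the message has the form shown in the sample output above; other versions may word it differently, in which case the pattern needs adjusting. The function name job_id_from_qsub is made up for this example.

```shell
#!/bin/sh
# job_id_from_qsub
# Read qsub output on standard input and print the job number from a line
# of the form: your job 1 ("Sleeper") has been submitted
job_id_from_qsub() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*has been submitted.*$/\1/p'
}

# Example (requires a working Grid Engine environment):
#   id=$(qsub $SGE_ROOT/examples/jobs/sleeper.sh | job_id_from_qsub)
#   echo "submitted job $id"
```

If nothing is printed, the submission either failed or produced a message in a different format.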

Using Grid Engine

The main submit commands are qsub, qrsh and qtcsh. See the man pages for submit(1) and qtcsh(1) for more details.
  • qsub
    In general, qsub is used for traditional batch submission, that is, where I/O is directed to a file. Note that qsub only accepts shell scripts, not executable files. There is an application script, qs, which allows qsub to accept executable files directly.

  • qrsh
    qrsh behaves like the rsh command, except that a host name is not given. Instead, a shell script or an executable file is run, potentially on any node in the cluster. I/O is directed back to the submitter's terminal window. By default, if the job cannot be run immediately, qrsh will not queue the job. Using the '-now no' flag to qrsh allows jobs to queue. Note that I/O can be redirected with the shell redirect operators. For example, to run the uname -a command:
    % qrsh uname -a
    The uname output of whichever machine the scheduler selects in the cluster will then be displayed on the submitting terminal. To redirect the output:
    % qrsh uname -a > /tmp/myfile
    The output from uname will be written to /tmp/myfile on the submitting host. To allow the command to queue:
    % qrsh -now no uname -a
    If a suitable host is not immediately available, the command will block until one becomes available. At that time, the command output will be displayed on the submitting terminal. See the qrsh(1) man page for more details.

  • qtcsh
    Grid Engine contains a modified tcsh, qtcsh, which automatically submits jobs listed in a task file to the cluster. See the qtcsh(1) man page for more details.
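
Because qsub only accepts shell scripts, one way to submit an executable is to wrap it in a small generated script. The following is a sketch of that idea and not the qs script mentioned above; the function name make_job_script and the file name in the example are hypothetical.

```shell
#!/bin/sh
# make_job_script OUTFILE COMMAND [ARG...]
# Write a Bourne shell script to OUTFILE that runs COMMAND with its ARGs.
make_job_script() {
    out=$1
    shift
    {
        echo '#!/bin/sh'
        echo '# Generated wrapper so an executable can be handed to qsub as a script.'
        printf 'exec'
        for word in "$@"; do
            # Single-quote each word; embedded single quotes become '\''.
            printf " '%s'" "$(printf '%s' "$word" | sed "s/'/'\\\\''/g")"
        done
        printf '\n'
    } > "$out"
}

# Example (hypothetical file name):
#   make_job_script uname_job.sh uname -a
#   qsub uname_job.sh
```

The generated file contains only the interpreter line, a comment, and a single exec line, so it is easy to inspect before submitting it.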