Using Modulefiles



What are Modulefiles

Modulefiles are files written in the Tool Command Language (Tcl) that the module command uses to set up the environment for particular applications. Whenever a user on a Unix/Linux system runs applications, compiles code, and so on, what the user can successfully do is determined by the current shell environment. The most common example is the PATH environment variable. When a user types a command, either the command is given by its full path name (e.g. /usr/bin/some_dir/another_dir/command_utility) or the directory containing it must be listed in the user's PATH. Another example is building and running application code: several libraries often need to be accessible, and there are environment variables that say where to look for them (e.g. LD_LIBRARY_PATH, which the dynamic linker searches for shared libraries at run time).

On a moderately sized or large cluster, keeping track of all the libraries and commands available on the system becomes complicated. Compile-time libraries are one example: an HPC system often has several versions of the same library to fulfill all users' needs. If all of these libraries were available to all users by default, there is a good chance that application code would pick up the wrong version and fail. Modulefiles can be loaded and unloaded to modify specific shell environment variables, making an application available to the user without interfering with other applications.
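The role of PATH described above can be tried out without any modulefiles. The following is a minimal sketch, assuming a POSIX shell with mktemp available; demo_tool and the temporary directory are made-up names used only for illustration and are not part of our systems.

```shell
# Create a throwaway directory holding one executable (demo_tool is a made-up name).
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from demo_tool"\n' > "$demo_dir/demo_tool"
chmod +x "$demo_dir/demo_tool"

# The directory is not on PATH yet, so the bare name cannot be resolved.
command -v demo_tool >/dev/null 2>&1 || echo "before: demo_tool not found"

# Prepend the directory to PATH: now the bare name resolves.
old_path=$PATH
PATH="$demo_dir:$PATH"
with_path=$(demo_tool)
echo "after prepending: $with_path"

# Restore PATH: the command is unavailable again.
PATH=$old_path
command -v demo_tool >/dev/null 2>&1 || echo "after restoring: demo_tool not found"

# Clean up the temporary directory.
rm -rf "$demo_dir"
```

Loading and unloading a modulefile automates this kind of change for every variable an application needs, so users do not have to edit PATH by hand.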

On RC HPC managed systems we use modulefiles to control the shell environment for any libraries, compilers, and software that are installed locally (that is, not in a default system-wide location). In addition, modulefiles contain useful information such as version numbers, URLs for webpages and manuals, and man page information. Modulefiles are also a convenient way to see what software, libraries, and compilers are available on our systems.

Listing available modulefiles

To list the modulefiles currently available on the system, use the module command with the available subcommand:

$> module available

--------------------------------------------- /usr/share/Modules/software ----------------------------------------------
chemistry/autodock_vina genomics/clumpp         genomics/muscle         genomics/tabix
chemistry/gamess        genomics/cufflinks      genomics/phase          genomics/tablet
chemistry/gaussian/g03  genomics/distruct       genomics/phrap          genomics/tophat2
chemistry/gromacs       genomics/eigensoft      genomics/picard-tools   genomics/trf
data/cdf/               genomics/emboss         genomics/plink          genomics/trinity
data/hdf5/1.8.12        genomics/fasta36        genomics/qiime          genomics/vcftools
genomics/abyss          genomics/fastqc         genomics/repeatexplorer gnu/parallel
genomics/allpaths       genomics/fastx_toolkit  genomics/rsem           mae/elk/2.3.16
genomics/beagle         genomics/gatk           genomics/samtools       statistics/matlab
genomics/bedtools       genomics/hmmer          genomics/shapeit
genomics/blast          genomics/igv            genomics/soapdenovo
genomics/bowtie2        genomics/lastz          genomics/structure

-------------------------------------------- /usr/share/Modules/development --------------------------------------------
compilers/gcc/4.8.2     libraries/DFS/myhadoop  libraries/hdf5/1.8.12   mpi/mpich2/1.2.1
compilers/gcc/4.9.0     libraries/bupc/2.2      libraries/mkl/          mpi/mvapich2/1.9
compilers/intel/14.0    libraries/cdf/          mpi/intel/              mpi/openmpi/1.6.5

The output of the available subcommand shows the full path where the modulefiles are located (e.g. /usr/share/Modules/software), followed by the modulefiles in that location (e.g. gnu/parallel). Importantly, all modulefiles are grouped by category. For instance, all compiler modulefiles start with 'compilers/' (e.g. compilers/gcc/4.8.2). The module available command also accepts a word to match and returns only the modulefiles whose names start with that word. For instance, to list all MPI modulefiles:

$> module available mpi

-------------------------------------------- /usr/share/Modules/development --------------------------------------------
mpi/intel/          mpi/mpich2/1.2.1    mpi/mvapich2/1.9    mpi/openmpi/1.6.5

Similarly, to list all gcc compilers:

$> module available compilers/gcc

-------------------------------------------- /usr/share/Modules/development --------------------------------------------
compilers/gcc/4.8.2 compilers/gcc/4.9.0

Showing what a modulefile does

Viewing the contents of a modulefile can be done using the module command with the show subcommand:

$> module show compilers/intel/14.0


module-whatis	 Name: Intel Compiler 
module-whatis	 Version: 14.0 
module-whatis	 Category: compiler, runtime support 
module-whatis	 Description: Intel Compiler Family (C/C++/Fortran for x86_64) 
module-whatis	 URL: 
prepend-path	 PATH /shared/software/intel/bin 
prepend-path	 MANPATH /shared/software/intel/man 
prepend-path	 LD_LIBRARY_PATH /shared/software/intel/lib/intel64  

The output starts with the location of the modulefile (/usr/share/Modules/development/compilers/intel/14.0). The module-whatis lines that follow give information such as version numbers, tool categories, and URLs for more information. If you only want to see these lines, use the whatis subcommand:

$> module whatis compilers/intel/14.0

compilers/intel/14.0 : Name: Intel Compiler
compilers/intel/14.0 : Version: 14.0
compilers/intel/14.0 : Category: compiler, runtime support
compilers/intel/14.0 : Description: Intel Compiler Family (C/C++/Fortran for x86_64)
compilers/intel/14.0 : URL:

Returning to the modulefile itself: after the whatis lines come the Tcl commands that modify the environment. This modulefile uses two types of commands: 1) setenv, which creates a shell environment variable (INTEL_LICENSE_FILE in this case), and 2) prepend-path, which modifies an existing shell environment variable. Here the first prepend-path modifies the PATH environment variable by adding /shared/software/intel/bin to the front of it, which makes the executables in that directory available to the user. More information about the Tcl commands used in modulefiles can be found in the modulefile manpage (man modulefile).
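To make the effect of prepend-path concrete, here is a rough shell equivalent of the three prepend-path lines in the listing above. This is only a sketch: prepend_path is a hypothetical helper written for this illustration, not part of the module command, and the real implementation also records what it changed so that the change can be undone on unload.

```shell
# Hypothetical helper that mimics prepend-path for a colon-separated variable.
# $1 = variable name, $2 = directory to put at the front.
prepend_path() {
    # Read the current value of the variable named in $1 (empty if unset).
    eval "old_value=\${$1:-}"
    if [ -z "$old_value" ]; then
        eval "$1=\$2"
    else
        eval "$1=\$2:\$old_value"
    fi
    export "$1"
}

# Roughly what loading compilers/intel/14.0 does, per the prepend-path lines shown.
prepend_path PATH /shared/software/intel/bin
prepend_path MANPATH /shared/software/intel/man
prepend_path LD_LIBRARY_PATH /shared/software/intel/lib/intel64

# A setenv line corresponds to a plain "export NAME=value".

# Show the first entry of each variable.
echo "PATH starts with:            ${PATH%%:*}"
echo "MANPATH starts with:         ${MANPATH%%:*}"
echo "LD_LIBRARY_PATH starts with: ${LD_LIBRARY_PATH%%:*}"
```

Because the new directory goes at the front, its executables and libraries take precedence over any same-named ones found later in the search path.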

Loading and unloading modulefiles

Modulefiles are loaded and unloaded using the module command with the load and unload subcommands. In this example the command being tested is icc, the Intel C compiler:

$> icc
-bash: icc: command not found

$> module load compilers/intel
$> icc
icc: command line error: no files specified; for help type "icc -help"

$> module unload compilers/intel
$> icc
-bash: icc: command not found

As shown in the commands above, icc initially could not be found. After loading the module, icc was found (it reports an error only because it was given no input files), and, as expected, after unloading the module icc could no longer be found. For more information about module subcommands, see the module manpage (man module).

Using modulefiles through qsub

Modulefiles can be used through the scheduler just as on the command line. In your batch shell script (see Running Jobs for details) you can place module load, module unload, or any other module subcommand directly before the commands you want to execute.
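As an illustration, the snippet below writes a small batch script that loads the Intel compiler module before using icc. It is a sketch under assumptions: job.sh and hello.c are hypothetical file names, and scheduler directives (queue, resources, job name) are left out because they are site-specific.

```shell
# Write a minimal batch script to job.sh (hypothetical file name).
cat > job.sh <<'EOF'
#!/bin/bash
# Scheduler directives are site-specific and omitted from this sketch.

# Set up the environment first, exactly as on the command line.
module load compilers/intel

# hello.c is a placeholder for your own source file.
icc -o hello hello.c
./hello

# Optional: restore the environment once the work is done.
module unload compilers/intel
EOF

echo "wrote job.sh"
```

The script would then be submitted with qsub job.sh. Loading the module inside the script keeps the job independent of whatever modules happen to be loaded in the shell that submits it.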