Reserving memory for your MATLAB job

Even if you are careful not to submit a job that will blow through the entire memory on a machine, you can run into problems if your job's memory usage fluctuates and other jobs are running on the same node. One solution is to reserve an appropriate amount of memory up front.

Here is an example bsub script (dct_bsub.job) that runs the dct_example script from the earlier example, using only 4 workers:

#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:30
#BSUB -q debug
#BSUB -n 4
#
### Run job
# No cd is needed if you submit this script from the directory containing dct_example.m;
# otherwise, cd to that directory here.
# Note: you DO need to "module load matlab" before submitting this script.
matlab < dct_example.m >& dct_example.log

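Since the job script does not load MATLAB itself, load the module in your shell before submitting. A minimal sketch (the module name is taken from the comment above and may differ on your system):

module load matlab      # put matlab on your PATH; LSF passes your environment to the job
bsub < dct_bsub.job     # submit the job script to LSF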
Submit the job, use bjobs to see which node it is running on, and then use bhosts to look at how much memory is in use on that node:

login4 381% bsub < dct_bsub.job 
Job <1084192> is submitted to queue <debug>.

login4 383% bjobs
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1084192   agleaso RUN   debug      login4.pega 4*n204.pega DCTexample Jul 16 16:15

login4 385% bhosts -l n204
HOST  n204.pegasus.edu
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -     16      4      4      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
                r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots
 Total           0.0   0.0   0.0    0%   0.0     0    0 86336  199M    0M 28.2G     12
 Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M    0M      -

...and more after this that I am cutting out...

This job uses hardly any memory, so you can see that 28.2G is still available even while it is running. That can become a problem if the early part of your job uses little memory but you try to allocate more later on: if another job starts on the node after yours, but before yours needs the extra memory, the machine may start paging or even crash. To avoid this, reserve the amount of RAM you will need. For example, say each of my 4 workers needs 4 GB of RAM, or 16 GB total for the job. I can reserve this memory when the job starts (and guarantee that my job is not dispatched to a node without that much free RAM) by using the -R flag to BSUB, as follows:

#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:30
#BSUB -q debug
#BSUB -n 4
#BSUB -R "rusage[mem=16384]"
#
### Run job
# No cd is needed if you submit this script from the directory containing dct_example.m;
# otherwise, cd to that directory here.
# Note: you DO need to "module load matlab" before submitting this script.
matlab < dct_example.m >& dct_example.log

Now the bhosts command shows that 16 GB of RAM is reserved and only 12.3 GB is free:

login4 392% bsub < dct_bsub.job
Job <1084199> is submitted to queue <debug>.

login4 393% bjobs
JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1084199   agleaso RUN   debug      login4.pega 4*n202.pega DCTexample Jul 16 16:23

login4 395% bhosts -l n202
HOST  n202.pegasus.edu
STATUS           CPUF  JL/U    MAX  NJOBS    RUN  SSUSP  USUSP    RSV DISPATCH_WINDOW
ok              60.00     -     16      4      4      0      0      0      -

 CURRENT LOAD USED FOR SCHEDULING:
                r15s   r1m  r15m    ut    pg    io   ls    it   tmp   swp   mem  slots
 Total           0.0   0.0   0.0    0%   0.0     0    0 86336  199M    0M 12.3G     12
 Reserved        0.0   0.0   0.0    0%   0.0     0    0     0    0M    0M   16G      -

...and other things we don't need....
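If you are not sure how much memory to reserve, you can check what a running job is actually using with the long job listing. A sketch (the exact fields shown depend on your LSF version):

# Replace 1084199 with your own job ID from bjobs.
# The RESOURCE USAGE section of the long listing reports the job's current MEM and SWAP usage.
bjobs -l 1084199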

By using this technique, perhaps in combination with limiting the number of workers or spreading the job over many nodes (sketched below), you should have fewer problems with memory conflicts on pegasus2.
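As a rough sketch of the spreading idea: span[ptile=1] asks LSF to place one of the 4 slots on each of 4 different nodes, and rusage[mem=4096] reserves about 4 GB on each of those hosts. Whether the mem reservation is counted per host or per slot depends on the cluster's LSF configuration, so check bhosts -l after submitting, as above, to confirm what was actually reserved.

#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:30
#BSUB -q debug
#BSUB -n 4
### One worker per node, with roughly 4 GB of memory reserved on each host
#BSUB -R "span[ptile=1] rusage[mem=4096]"
#
### Run job
matlab < dct_example.m >& dct_example.log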