Distributed Computing Toolbox runs only on one physical machine

Earlier I showed an example of how to use the Distributed Computing Toolbox (DCT) to run a job on pegasus2 using up to 12 workers. That works fine, but one important limitation of this method is that all of the workers run on the same physical node, whether or not that is what you want. Here is the evidence:

Here is the code we'll run for our simple job (I'll name this file dct_example_nwkr.m). Note that for the purpose of illustration I have limited the number of workers to Nworkers = 4, and the loop prints the hostname of the machine that runs each iteration.

%=====================================================================
% DCT Example: Do nothing on Nworkers. Print datestamp to show
% how it's going.
%
% Art Gleason July 14, 2014
%=====================================================================
Nworkers = 4; % make sure this matches the number in dct_bsub.job
              % on the line #BSUB -n 
              % e.g. if Nworkers = 4 then use #BSUB -n 4
              % if Nworkers = 0 or 1 then use #BSUB -n 1

N = 200;       % bump this up if you want a longer job with many workers

%---open Nworkers, 0=serial, 12=max number possible---
if( Nworkers > 12 )
  Nworkers = 12;
elseif( Nworkers < 0 )
  Nworkers = 0;
end

if( Nworkers > 0 )
  matlabpool('open',Nworkers);
end

parfor(ix=1:N, Nworkers)
  [~, thehostname] = system('hostname');
  ixstamp = sprintf('Iteration %d at %s on %s\n', ix, datestr(now), thehostname);
  disp(ixstamp);
  donothing(5);
end

if( Nworkers > 0 )
  matlabpool('close');
end

If we submit this job with the following bsub script, which forces all four slots onto one node (span[ptile=4] together with -n 4), we get the expected output.

#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:15
#BSUB -q general
#BSUB -n 4
#BSUB -R "rusage[mem=4096]"
#BSUB -R "span[ptile=4]"
#BSUB -M 4096
#
### Run job
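# No cd is needed here if the current working directory is already the
# right one at submission time. In other words, cd to the directory that
# holds dct_example_nwkr.m before running bsub.
# Note: you DO need to run "module load matlab" before submitting this.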
matlab < dct_example_nwkr.m >& dct_example.log

Here is the bjobs output (note that the job is running on n018):

JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1120934   agleaso RUN   general    login4.pega 4*n018.pega DCTexample Aug  6 13:09

And here is the output (dct_example.log). Note that every line was printed from host n018, as expected.

Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: No window system found.  Java option 'MWT' ignored.

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013a (8.1.0.604) 64-bit (glnxa64)
                             February 15, 2013

No window system found.  Java option 'MWT' ignored.
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Starting matlabpool using the 'local' profile ... Warning: Found 8 pre-existing
 communicating job(s) created by matlabpool that
are running. You can use 'matlabpool close force local' to remove all jobs
created by matlabpool. 
> In InteractiveClient>InteractiveClient.pRemoveOldJobs at 426
  In InteractiveClient>InteractiveClient.start at 260
  In MatlabpoolHelper>MatlabpoolHelper.doOpen at 363
  In MatlabpoolHelper>MatlabpoolHelper.doMatlabpool at 137
  In matlabpool at 139 
connected to 4 workers.
>> >> Iteration 68 at 06-Aug-2014 13:10:09 on n018


Iteration 126 at 06-Aug-2014 13:10:09 on n018


Iteration 101 at 06-Aug-2014 13:10:09 on n018


Iteration 34 at 06-Aug-2014 13:10:09 on n018


Iteration 67 at 06-Aug-2014 13:10:14 on n018


Iteration 125 at 06-Aug-2014 13:10:14 on n018


Iteration 100 at 06-Aug-2014 13:10:14 on n018


Iteration 33 at 06-Aug-2014 13:10:14 on n018


Iteration 66 at 06-Aug-2014 13:10:19 on n018


Iteration 124 at 06-Aug-2014 13:10:19 on n018


Iteration 99 at 06-Aug-2014 13:10:19 on n018


Iteration 32 at 06-Aug-2014 13:10:19 on n018


Iteration 65 at 06-Aug-2014 13:10:25 on n018


Iteration 123 at 06-Aug-2014 13:10:25 on n018


Iteration 98 at 06-Aug-2014 13:10:25 on n018


Iteration 31 at 06-Aug-2014 13:10:25 on n018


Iteration 64 at 06-Aug-2014 13:10:30 on n018


Iteration 122 at 06-Aug-2014 13:10:30 on n018


Iteration 97 at 06-Aug-2014 13:10:30 on n018


Iteration 30 at 06-Aug-2014 13:10:30 on n018


Iteration 63 at 06-Aug-2014 13:10:35 on n018


Iteration 121 at 06-Aug-2014 13:10:35 on n018


Iteration 96 at 06-Aug-2014 13:10:35 on n018


Iteration 29 at 06-Aug-2014 13:10:35 on n018


Iteration 62 at 06-Aug-2014 13:10:40 on n018


Iteration 120 at 06-Aug-2014 13:10:40 on n018


Iteration 95 at 06-Aug-2014 13:10:40 on n018


Iteration 28 at 06-Aug-2014 13:10:40 on n018


Iteration 61 at 06-Aug-2014 13:10:45 on n018


Iteration 119 at 06-Aug-2014 13:10:45 on n018


Iteration 94 at 06-Aug-2014 13:10:45 on n018


Iteration 27 at 06-Aug-2014 13:10:45 on n018


Iteration 60 at 06-Aug-2014 13:10:50 on n018


Iteration 118 at 06-Aug-2014 13:10:50 on n018


Iteration 93 at 06-Aug-2014 13:10:50 on n018


Iteration 26 at 06-Aug-2014 13:10:50 on n018


Iteration 59 at 06-Aug-2014 13:10:55 on n018


Iteration 117 at 06-Aug-2014 13:10:55 on n018


Iteration 92 at 06-Aug-2014 13:10:55 on n018


Iteration 25 at 06-Aug-2014 13:10:55 on n018


Iteration 58 at 06-Aug-2014 13:11:00 on n018


Iteration 116 at 06-Aug-2014 13:11:00 on n018


Iteration 91 at 06-Aug-2014 13:11:00 on n018


Iteration 24 at 06-Aug-2014 13:11:00 on n018


Iteration 57 at 06-Aug-2014 13:11:05 on n018


Iteration 115 at 06-Aug-2014 13:11:05 on n018


Iteration 90 at 06-Aug-2014 13:11:05 on n018


Iteration 23 at 06-Aug-2014 13:11:05 on n018


Iteration 56 at 06-Aug-2014 13:11:10 on n018


Iteration 114 at 06-Aug-2014 13:11:11 on n018


Iteration 89 at 06-Aug-2014 13:11:10 on n018


Iteration 22 at 06-Aug-2014 13:11:11 on n018


Iteration 55 at 06-Aug-2014 13:11:16 on n018


Iteration 113 at 06-Aug-2014 13:11:16 on n018


Iteration 88 at 06-Aug-2014 13:11:16 on n018


Iteration 21 at 06-Aug-2014 13:11:16 on n018


Iteration 54 at 06-Aug-2014 13:11:21 on n018


Iteration 112 at 06-Aug-2014 13:11:21 on n018


Iteration 87 at 06-Aug-2014 13:11:21 on n018


Iteration 20 at 06-Aug-2014 13:11:21 on n018


Iteration 53 at 06-Aug-2014 13:11:26 on n018


Iteration 111 at 06-Aug-2014 13:11:26 on n018


Iteration 86 at 06-Aug-2014 13:11:26 on n018


Iteration 19 at 06-Aug-2014 13:11:26 on n018


Iteration 52 at 06-Aug-2014 13:11:31 on n018


Iteration 110 at 06-Aug-2014 13:11:31 on n018


Iteration 85 at 06-Aug-2014 13:11:31 on n018


Iteration 18 at 06-Aug-2014 13:11:31 on n018


Iteration 51 at 06-Aug-2014 13:11:36 on n018


Iteration 109 at 06-Aug-2014 13:11:36 on n018


Iteration 84 at 06-Aug-2014 13:11:36 on n018


Iteration 17 at 06-Aug-2014 13:11:36 on n018


Iteration 50 at 06-Aug-2014 13:11:41 on n018


Iteration 83 at 06-Aug-2014 13:11:41 on n018


Iteration 108 at 06-Aug-2014 13:11:41 on n018


Iteration 16 at 06-Aug-2014 13:11:41 on n018


Iteration 49 at 06-Aug-2014 13:11:46 on n018


Iteration 107 at 06-Aug-2014 13:11:46 on n018


Iteration 82 at 06-Aug-2014 13:11:46 on n018


Iteration 15 at 06-Aug-2014 13:11:46 on n018


Iteration 48 at 06-Aug-2014 13:11:51 on n018


Iteration 106 at 06-Aug-2014 13:11:51 on n018


Iteration 81 at 06-Aug-2014 13:11:51 on n018


Iteration 14 at 06-Aug-2014 13:11:51 on n018


Iteration 47 at 06-Aug-2014 13:11:56 on n018


Iteration 105 at 06-Aug-2014 13:11:56 on n018


Iteration 80 at 06-Aug-2014 13:11:56 on n018


Iteration 13 at 06-Aug-2014 13:11:56 on n018


Iteration 46 at 06-Aug-2014 13:12:02 on n018


Iteration 104 at 06-Aug-2014 13:12:02 on n018


Iteration 79 at 06-Aug-2014 13:12:01 on n018


Iteration 12 at 06-Aug-2014 13:12:02 on n018


Iteration 45 at 06-Aug-2014 13:12:07 on n018


Iteration 103 at 06-Aug-2014 13:12:07 on n018


Iteration 78 at 06-Aug-2014 13:12:07 on n018


Iteration 11 at 06-Aug-2014 13:12:07 on n018


Iteration 44 at 06-Aug-2014 13:12:12 on n018


Iteration 102 at 06-Aug-2014 13:12:12 on n018


Iteration 77 at 06-Aug-2014 13:12:12 on n018


Iteration 10 at 06-Aug-2014 13:12:12 on n018


Iteration 76 at 06-Aug-2014 13:12:17 on n018


Iteration 43 at 06-Aug-2014 13:12:17 on n018


Iteration 145 at 06-Aug-2014 13:12:17 on n018


Iteration 9 at 06-Aug-2014 13:12:17 on n018


Iteration 75 at 06-Aug-2014 13:12:22 on n018


Iteration 42 at 06-Aug-2014 13:12:22 on n018


Iteration 144 at 06-Aug-2014 13:12:22 on n018


Iteration 8 at 06-Aug-2014 13:12:22 on n018


Iteration 41 at 06-Aug-2014 13:12:27 on n018


Iteration 143 at 06-Aug-2014 13:12:27 on n018


Iteration 74 at 06-Aug-2014 13:12:27 on n018


Iteration 7 at 06-Aug-2014 13:12:27 on n018


Iteration 40 at 06-Aug-2014 13:12:32 on n018


Iteration 142 at 06-Aug-2014 13:12:32 on n018


Iteration 73 at 06-Aug-2014 13:12:32 on n018


Iteration 6 at 06-Aug-2014 13:12:32 on n018


Iteration 39 at 06-Aug-2014 13:12:37 on n018


Iteration 141 at 06-Aug-2014 13:12:37 on n018


Iteration 72 at 06-Aug-2014 13:12:37 on n018


Iteration 5 at 06-Aug-2014 13:12:37 on n018


Iteration 38 at 06-Aug-2014 13:12:42 on n018


Iteration 140 at 06-Aug-2014 13:12:42 on n018


Iteration 71 at 06-Aug-2014 13:12:42 on n018


Iteration 4 at 06-Aug-2014 13:12:42 on n018


Iteration 37 at 06-Aug-2014 13:12:47 on n018


Iteration 139 at 06-Aug-2014 13:12:48 on n018


Iteration 70 at 06-Aug-2014 13:12:47 on n018


Iteration 3 at 06-Aug-2014 13:12:48 on n018


Iteration 36 at 06-Aug-2014 13:12:53 on n018


Iteration 138 at 06-Aug-2014 13:12:53 on n018


Iteration 69 at 06-Aug-2014 13:12:52 on n018


Iteration 2 at 06-Aug-2014 13:12:53 on n018


Iteration 35 at 06-Aug-2014 13:12:58 on n018


Iteration 137 at 06-Aug-2014 13:12:58 on n018


Iteration 159 at 06-Aug-2014 13:12:57 on n018


Iteration 1 at 06-Aug-2014 13:12:58 on n018


Iteration 158 at 06-Aug-2014 13:13:03 on n018


Iteration 170 at 06-Aug-2014 13:13:03 on n018


Iteration 136 at 06-Aug-2014 13:13:03 on n018


Iteration 178 at 06-Aug-2014 13:13:03 on n018


Iteration 157 at 06-Aug-2014 13:13:08 on n018


Iteration 169 at 06-Aug-2014 13:13:08 on n018


Iteration 135 at 06-Aug-2014 13:13:08 on n018


Iteration 177 at 06-Aug-2014 13:13:08 on n018


Iteration 156 at 06-Aug-2014 13:13:13 on n018


Iteration 168 at 06-Aug-2014 13:13:13 on n018


Iteration 134 at 06-Aug-2014 13:13:13 on n018


Iteration 176 at 06-Aug-2014 13:13:13 on n018


Iteration 167 at 06-Aug-2014 13:13:18 on n018


Iteration 133 at 06-Aug-2014 13:13:18 on n018


Iteration 155 at 06-Aug-2014 13:13:18 on n018


Iteration 175 at 06-Aug-2014 13:13:18 on n018


Iteration 166 at 06-Aug-2014 13:13:23 on n018


Iteration 132 at 06-Aug-2014 13:13:23 on n018


Iteration 154 at 06-Aug-2014 13:13:23 on n018


Iteration 174 at 06-Aug-2014 13:13:23 on n018


Iteration 165 at 06-Aug-2014 13:13:28 on n018


Iteration 131 at 06-Aug-2014 13:13:28 on n018


Iteration 153 at 06-Aug-2014 13:13:28 on n018


Iteration 173 at 06-Aug-2014 13:13:28 on n018


Iteration 164 at 06-Aug-2014 13:13:33 on n018


Iteration 130 at 06-Aug-2014 13:13:33 on n018


Iteration 152 at 06-Aug-2014 13:13:33 on n018


Iteration 172 at 06-Aug-2014 13:13:33 on n018


Iteration 163 at 06-Aug-2014 13:13:39 on n018


Iteration 129 at 06-Aug-2014 13:13:39 on n018


Iteration 151 at 06-Aug-2014 13:13:38 on n018


Iteration 171 at 06-Aug-2014 13:13:39 on n018


Iteration 150 at 06-Aug-2014 13:13:43 on n018


Iteration 162 at 06-Aug-2014 13:13:44 on n018


Iteration 128 at 06-Aug-2014 13:13:44 on n018


Iteration 184 at 06-Aug-2014 13:13:44 on n018


Iteration 149 at 06-Aug-2014 13:13:48 on n018


Iteration 161 at 06-Aug-2014 13:13:49 on n018


Iteration 127 at 06-Aug-2014 13:13:49 on n018


Iteration 183 at 06-Aug-2014 13:13:49 on n018


Iteration 148 at 06-Aug-2014 13:13:54 on n018


Iteration 160 at 06-Aug-2014 13:13:54 on n018


Iteration 189 at 06-Aug-2014 13:13:54 on n018


Iteration 182 at 06-Aug-2014 13:13:54 on n018


Iteration 147 at 06-Aug-2014 13:13:59 on n018


Iteration 194 at 06-Aug-2014 13:13:59 on n018


Iteration 188 at 06-Aug-2014 13:13:59 on n018


Iteration 181 at 06-Aug-2014 13:13:59 on n018


Iteration 146 at 06-Aug-2014 13:14:04 on n018


Iteration 193 at 06-Aug-2014 13:14:04 on n018


Iteration 187 at 06-Aug-2014 13:14:04 on n018


Iteration 180 at 06-Aug-2014 13:14:04 on n018


Iteration 192 at 06-Aug-2014 13:14:09 on n018


Iteration 186 at 06-Aug-2014 13:14:09 on n018


Iteration 199 at 06-Aug-2014 13:14:09 on n018


Iteration 179 at 06-Aug-2014 13:14:09 on n018


Iteration 191 at 06-Aug-2014 13:14:14 on n018


Iteration 198 at 06-Aug-2014 13:14:14 on n018


Iteration 185 at 06-Aug-2014 13:14:14 on n018


Iteration 200 at 06-Aug-2014 13:14:14 on n018


Iteration 197 at 06-Aug-2014 13:14:19 on n018


Iteration 190 at 06-Aug-2014 13:14:19 on n018


Iteration 196 at 06-Aug-2014 13:14:24 on n018


Iteration 195 at 06-Aug-2014 13:14:29 on n018


>> >> Sending a stop signal to all the workers ... stopped.



This is all fine so far, but say you want to (or LSF decides to) assign your job to cpus on different physical nodes. Then what happens? We can force this situation by using span[ptile=1]:

#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:15
#BSUB -q general
#BSUB -n 4
#BSUB -R "rusage[mem=4096]"
#BSUB -R "span[ptile=1]"
#BSUB -M 4096
#
### Run job
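# No cd is needed here if the current working directory is already the
# right one at submission time. In other words, cd to the directory that
# holds dct_example_nwkr.m before running bsub.
# Note: you DO need to run "module load matlab" before submitting this.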
matlab < dct_example_nwkr.m >& dct_example.log

The bjobs output shows 1 cpu assigned on each of n178, n096, n025, and n210:

JOBID     USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1120938   agleaso RUN   general    login4.pega 1*n178.pega DCTexample Aug  6 13:19
                                               1*n096.pegasus.edu
                                               1*n025.pegasus.edu
                                               1*n210.pegasus.edu


HOWEVER, the job output ALL COMES from node n178:

Warning: No display specified.  You will not be able to display graphics on the screen.
Warning: No window system found.  Java option 'MWT' ignored.

                            < M A T L A B (R) >
                  Copyright 1984-2013 The MathWorks, Inc.
                    R2013a (8.1.0.604) 64-bit (glnxa64)
                             February 15, 2013

No window system found.  Java option 'MWT' ignored.
 
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
 
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Starting matlabpool using the 'local' profile ... Warning: Found 8 pre-existing
 communicating job(s) created by matlabpool that
are running. You can use 'matlabpool close force local' to remove all jobs
created by matlabpool. 
> In InteractiveClient>InteractiveClient.pRemoveOldJobs at 426
  In InteractiveClient>InteractiveClient.start at 260
  In MatlabpoolHelper>MatlabpoolHelper.doOpen at 363
  In MatlabpoolHelper>MatlabpoolHelper.doMatlabpool at 137
  In matlabpool at 139 
connected to 4 workers.
>> >> Iteration 68 at 06-Aug-2014 13:19:37 on n178


Iteration 101 at 06-Aug-2014 13:19:37 on n178


Iteration 34 at 06-Aug-2014 13:19:37 on n178


Iteration 126 at 06-Aug-2014 13:19:37 on n178


Iteration 67 at 06-Aug-2014 13:19:43 on n178


Iteration 100 at 06-Aug-2014 13:19:43 on n178


Iteration 33 at 06-Aug-2014 13:19:43 on n178


Iteration 125 at 06-Aug-2014 13:19:43 on n178


Iteration 66 at 06-Aug-2014 13:19:48 on n178


Iteration 99 at 06-Aug-2014 13:19:48 on n178


Iteration 32 at 06-Aug-2014 13:19:48 on n178


Iteration 124 at 06-Aug-2014 13:19:48 on n178


Iteration 65 at 06-Aug-2014 13:19:53 on n178


Iteration 98 at 06-Aug-2014 13:19:53 on n178


Iteration 31 at 06-Aug-2014 13:19:53 on n178


Iteration 123 at 06-Aug-2014 13:19:53 on n178


Iteration 64 at 06-Aug-2014 13:19:58 on n178


Iteration 97 at 06-Aug-2014 13:19:58 on n178


Iteration 30 at 06-Aug-2014 13:19:58 on n178


Iteration 122 at 06-Aug-2014 13:19:58 on n178


Iteration 63 at 06-Aug-2014 13:20:03 on n178


Iteration 96 at 06-Aug-2014 13:20:03 on n178


Iteration 29 at 06-Aug-2014 13:20:03 on n178


Iteration 121 at 06-Aug-2014 13:20:03 on n178


Iteration 62 at 06-Aug-2014 13:20:08 on n178


Iteration 95 at 06-Aug-2014 13:20:08 on n178


Iteration 28 at 06-Aug-2014 13:20:08 on n178


Iteration 120 at 06-Aug-2014 13:20:08 on n178


Iteration 61 at 06-Aug-2014 13:20:13 on n178


Iteration 94 at 06-Aug-2014 13:20:13 on n178


Iteration 27 at 06-Aug-2014 13:20:13 on n178


Iteration 119 at 06-Aug-2014 13:20:13 on n178


Iteration 60 at 06-Aug-2014 13:20:18 on n178


Iteration 93 at 06-Aug-2014 13:20:18 on n178


Iteration 26 at 06-Aug-2014 13:20:18 on n178


Iteration 118 at 06-Aug-2014 13:20:18 on n178


Iteration 59 at 06-Aug-2014 13:20:23 on n178


Iteration 92 at 06-Aug-2014 13:20:23 on n178


Iteration 25 at 06-Aug-2014 13:20:23 on n178


Iteration 117 at 06-Aug-2014 13:20:23 on n178


Iteration 58 at 06-Aug-2014 13:20:29 on n178


Iteration 91 at 06-Aug-2014 13:20:29 on n178


Iteration 24 at 06-Aug-2014 13:20:29 on n178


Iteration 116 at 06-Aug-2014 13:20:28 on n178


Iteration 57 at 06-Aug-2014 13:20:34 on n178


Iteration 90 at 06-Aug-2014 13:20:34 on n178


Iteration 23 at 06-Aug-2014 13:20:34 on n178


Iteration 115 at 06-Aug-2014 13:20:34 on n178


Iteration 56 at 06-Aug-2014 13:20:39 on n178


Iteration 89 at 06-Aug-2014 13:20:39 on n178


Iteration 22 at 06-Aug-2014 13:20:39 on n178


Iteration 114 at 06-Aug-2014 13:20:39 on n178


Iteration 55 at 06-Aug-2014 13:20:44 on n178


Iteration 88 at 06-Aug-2014 13:20:44 on n178


Iteration 21 at 06-Aug-2014 13:20:44 on n178


Iteration 113 at 06-Aug-2014 13:20:44 on n178


Iteration 54 at 06-Aug-2014 13:20:49 on n178


Iteration 87 at 06-Aug-2014 13:20:49 on n178


Iteration 20 at 06-Aug-2014 13:20:49 on n178


Iteration 112 at 06-Aug-2014 13:20:49 on n178


Iteration 53 at 06-Aug-2014 13:20:54 on n178


Iteration 86 at 06-Aug-2014 13:20:54 on n178


Iteration 19 at 06-Aug-2014 13:20:54 on n178


Iteration 111 at 06-Aug-2014 13:20:54 on n178


Iteration 85 at 06-Aug-2014 13:20:59 on n178


Iteration 110 at 06-Aug-2014 13:20:59 on n178


Iteration 52 at 06-Aug-2014 13:20:59 on n178


Iteration 18 at 06-Aug-2014 13:20:59 on n178


Iteration 109 at 06-Aug-2014 13:21:04 on n178


Iteration 51 at 06-Aug-2014 13:21:04 on n178


Iteration 84 at 06-Aug-2014 13:21:04 on n178


Iteration 17 at 06-Aug-2014 13:21:04 on n178


Iteration 50 at 06-Aug-2014 13:21:09 on n178


Iteration 83 at 06-Aug-2014 13:21:09 on n178


Iteration 16 at 06-Aug-2014 13:21:09 on n178


Iteration 108 at 06-Aug-2014 13:21:09 on n178


Iteration 49 at 06-Aug-2014 13:21:14 on n178


Iteration 82 at 06-Aug-2014 13:21:14 on n178


Iteration 15 at 06-Aug-2014 13:21:14 on n178


Iteration 107 at 06-Aug-2014 13:21:14 on n178


Iteration 48 at 06-Aug-2014 13:21:20 on n178


Iteration 81 at 06-Aug-2014 13:21:20 on n178


Iteration 14 at 06-Aug-2014 13:21:20 on n178


Iteration 106 at 06-Aug-2014 13:21:19 on n178


Iteration 47 at 06-Aug-2014 13:21:25 on n178


Iteration 80 at 06-Aug-2014 13:21:25 on n178


Iteration 13 at 06-Aug-2014 13:21:25 on n178


Iteration 105 at 06-Aug-2014 13:21:24 on n178


Iteration 46 at 06-Aug-2014 13:21:30 on n178


Iteration 79 at 06-Aug-2014 13:21:30 on n178


Iteration 12 at 06-Aug-2014 13:21:30 on n178


Iteration 104 at 06-Aug-2014 13:21:30 on n178


Iteration 45 at 06-Aug-2014 13:21:35 on n178


Iteration 78 at 06-Aug-2014 13:21:35 on n178


Iteration 11 at 06-Aug-2014 13:21:35 on n178


Iteration 103 at 06-Aug-2014 13:21:35 on n178


Iteration 44 at 06-Aug-2014 13:21:40 on n178


Iteration 77 at 06-Aug-2014 13:21:40 on n178


Iteration 10 at 06-Aug-2014 13:21:40 on n178


Iteration 102 at 06-Aug-2014 13:21:40 on n178


Iteration 43 at 06-Aug-2014 13:21:45 on n178


Iteration 76 at 06-Aug-2014 13:21:45 on n178


Iteration 9 at 06-Aug-2014 13:21:45 on n178


Iteration 145 at 06-Aug-2014 13:21:45 on n178


Iteration 42 at 06-Aug-2014 13:21:50 on n178


Iteration 75 at 06-Aug-2014 13:21:50 on n178


Iteration 8 at 06-Aug-2014 13:21:50 on n178


Iteration 144 at 06-Aug-2014 13:21:50 on n178


Iteration 41 at 06-Aug-2014 13:21:55 on n178


Iteration 74 at 06-Aug-2014 13:21:55 on n178


Iteration 7 at 06-Aug-2014 13:21:55 on n178


Iteration 143 at 06-Aug-2014 13:21:55 on n178


Iteration 40 at 06-Aug-2014 13:22:00 on n178


Iteration 73 at 06-Aug-2014 13:22:00 on n178


Iteration 6 at 06-Aug-2014 13:22:00 on n178


Iteration 142 at 06-Aug-2014 13:22:00 on n178


Iteration 39 at 06-Aug-2014 13:22:06 on n178


Iteration 72 at 06-Aug-2014 13:22:05 on n178


Iteration 5 at 06-Aug-2014 13:22:05 on n178


Iteration 141 at 06-Aug-2014 13:22:05 on n178


Iteration 38 at 06-Aug-2014 13:22:11 on n178


Iteration 71 at 06-Aug-2014 13:22:11 on n178


Iteration 4 at 06-Aug-2014 13:22:11 on n178


Iteration 140 at 06-Aug-2014 13:22:10 on n178


Iteration 37 at 06-Aug-2014 13:22:16 on n178


Iteration 70 at 06-Aug-2014 13:22:16 on n178


Iteration 3 at 06-Aug-2014 13:22:16 on n178


Iteration 139 at 06-Aug-2014 13:22:16 on n178


Iteration 36 at 06-Aug-2014 13:22:21 on n178


Iteration 69 at 06-Aug-2014 13:22:21 on n178


Iteration 2 at 06-Aug-2014 13:22:21 on n178


Iteration 138 at 06-Aug-2014 13:22:21 on n178


Iteration 137 at 06-Aug-2014 13:22:26 on n178


Iteration 35 at 06-Aug-2014 13:22:26 on n178


Iteration 159 at 06-Aug-2014 13:22:26 on n178


Iteration 1 at 06-Aug-2014 13:22:26 on n178


Iteration 158 at 06-Aug-2014 13:22:31 on n178


Iteration 136 at 06-Aug-2014 13:22:31 on n178


Iteration 178 at 06-Aug-2014 13:22:31 on n178


Iteration 170 at 06-Aug-2014 13:22:31 on n178


Iteration 135 at 06-Aug-2014 13:22:36 on n178


Iteration 177 at 06-Aug-2014 13:22:36 on n178


Iteration 157 at 06-Aug-2014 13:22:36 on n178


Iteration 169 at 06-Aug-2014 13:22:36 on n178


Iteration 176 at 06-Aug-2014 13:22:41 on n178


Iteration 156 at 06-Aug-2014 13:22:41 on n178


Iteration 168 at 06-Aug-2014 13:22:41 on n178


Iteration 134 at 06-Aug-2014 13:22:41 on n178


Iteration 175 at 06-Aug-2014 13:22:46 on n178


Iteration 155 at 06-Aug-2014 13:22:46 on n178


Iteration 167 at 06-Aug-2014 13:22:46 on n178


Iteration 133 at 06-Aug-2014 13:22:46 on n178


Iteration 174 at 06-Aug-2014 13:22:51 on n178


Iteration 154 at 06-Aug-2014 13:22:51 on n178


Iteration 166 at 06-Aug-2014 13:22:51 on n178


Iteration 132 at 06-Aug-2014 13:22:51 on n178


Iteration 173 at 06-Aug-2014 13:22:57 on n178


Iteration 153 at 06-Aug-2014 13:22:56 on n178


Iteration 165 at 06-Aug-2014 13:22:56 on n178


Iteration 131 at 06-Aug-2014 13:22:56 on n178


Iteration 172 at 06-Aug-2014 13:23:02 on n178


Iteration 152 at 06-Aug-2014 13:23:01 on n178


Iteration 164 at 06-Aug-2014 13:23:02 on n178


Iteration 130 at 06-Aug-2014 13:23:01 on n178


Iteration 171 at 06-Aug-2014 13:23:07 on n178


Iteration 151 at 06-Aug-2014 13:23:07 on n178


Iteration 163 at 06-Aug-2014 13:23:07 on n178


Iteration 129 at 06-Aug-2014 13:23:06 on n178


Iteration 150 at 06-Aug-2014 13:23:12 on n178


Iteration 128 at 06-Aug-2014 13:23:12 on n178


Iteration 184 at 06-Aug-2014 13:23:12 on n178


Iteration 162 at 06-Aug-2014 13:23:12 on n178


Iteration 149 at 06-Aug-2014 13:23:17 on n178


Iteration 127 at 06-Aug-2014 13:23:17 on n178


Iteration 183 at 06-Aug-2014 13:23:17 on n178


Iteration 161 at 06-Aug-2014 13:23:17 on n178


Iteration 182 at 06-Aug-2014 13:23:22 on n178


Iteration 148 at 06-Aug-2014 13:23:22 on n178


Iteration 160 at 06-Aug-2014 13:23:22 on n178


Iteration 189 at 06-Aug-2014 13:23:22 on n178


Iteration 147 at 06-Aug-2014 13:23:27 on n178


Iteration 188 at 06-Aug-2014 13:23:27 on n178


Iteration 181 at 06-Aug-2014 13:23:27 on n178


Iteration 194 at 06-Aug-2014 13:23:27 on n178


Iteration 187 at 06-Aug-2014 13:23:32 on n178


Iteration 180 at 06-Aug-2014 13:23:32 on n178


Iteration 146 at 06-Aug-2014 13:23:32 on n178


Iteration 193 at 06-Aug-2014 13:23:32 on n178


Iteration 186 at 06-Aug-2014 13:23:37 on n178


Iteration 179 at 06-Aug-2014 13:23:37 on n178


Iteration 199 at 06-Aug-2014 13:23:37 on n178


Iteration 192 at 06-Aug-2014 13:23:37 on n178


Iteration 198 at 06-Aug-2014 13:23:42 on n178


Iteration 185 at 06-Aug-2014 13:23:42 on n178


Iteration 200 at 06-Aug-2014 13:23:42 on n178


Iteration 191 at 06-Aug-2014 13:23:42 on n178


Iteration 197 at 06-Aug-2014 13:23:47 on n178


Iteration 190 at 06-Aug-2014 13:23:47 on n178


Iteration 196 at 06-Aug-2014 13:23:52 on n178


Iteration 195 at 06-Aug-2014 13:23:58 on n178


>> >> Sending a stop signal to all the workers ... stopped.
>> >> 
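
Rather than scanning a log like this by eye, one could tally the hostnames it mentions. The following is a minimal sketch, assuming the log lines keep the "Iteration N at DATE on HOST" shape shown above; the function name tally_hosts is made up for this illustration.

```shell
#!/bin/bash
# tally_hosts (hypothetical helper): read a MATLAB log on stdin and print
# "COUNT HOST" for every distinct host that ends an
# "Iteration N at DATE on HOST" line. Leading ">> " prompts are tolerated.
tally_hosts() {
  grep -E '^(>> )*Iteration [0-9]+ at .* on [^[:space:]]+[[:space:]]*$' \
    | awk '{ print $NF }' \
    | sort \
    | uniq -c \
    | awk '{ print $1, $2 }'
}

# Example use (not run here):
#   tally_hosts < dct_example.log
```

If the result is a single line, every iteration ran on one host. More than one line would mean the workers really were spread across nodes.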


So MATLAB started all 4 workers on n178, even though LSF assigned the job to n178, n096, n025, and n210! The log shows why: matlabpool started with the 'local' profile, which launches its workers on the same machine as the MATLAB client. This is bad because if something else is running on n178 and expects your job to take only 1 cpu there, while it actually takes 4, then performance will suffer at the very least. Even worse, if your job takes a lot of memory but you haven't reserved enough, because you were expecting each worker to be on a different node, then this mistake could cause swapping or even crash the compute node.
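
To put a number on the memory risk, here is a small sketch of the arithmetic, assuming every worker needs roughly the same amount of memory. The function name node_mem_needed_mb and the 4096 MB per-worker figure in the example are illustrative assumptions, not measurements from the job above.

```shell
#!/bin/bash
# node_mem_needed_mb WORKERS MB_PER_WORKER (hypothetical helper)
# Print the total memory, in MB, that ONE node must supply when all
# WORKERS land on it. Both arguments are assumed to be non-negative integers.
node_mem_needed_mb() {
  local workers=$1 mb_per_worker=$2
  echo $(( workers * mb_per_worker ))
}

# Example use (not run here):
#   node_mem_needed_mb 4 4096
```

With 4 workers at an assumed 4096 MB each, one node has to supply 16384 MB, four times what you would budget per node if the workers were really spread out. Whether LSF counts rusage[mem=...] per slot or per job depends on how the site is configured, so check that before choosing a reservation.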

The lesson learned here is to make sure your DCT jobs request ALL cpus on the same node (for example span[ptile=4] with -n 4, as in the first job script), AND to make sure that node will have enough memory for however many workers you are requesting.
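
One way to enforce this lesson is to have the job script refuse to start MATLAB when the allocation spans more than one host. The sketch below assumes LSF has set the LSB_HOSTS environment variable (a space-separated list with one hostname per slot) inside the job; the function names are hypothetical, and for very large allocations LSF may not set LSB_HOSTS at all, so treat this as a starting point only.

```shell
#!/bin/bash
# count_distinct_hosts "host1 host2 ..." (hypothetical helper)
# Print how many different hostnames appear in a space-separated list.
count_distinct_hosts() {
  # $1 is deliberately unquoted so the list splits into one host per line.
  printf '%s\n' $1 | sort -u | awk 'NF { n++ } END { print n + 0 }'
}

# require_single_host [HOSTLIST] (hypothetical helper)
# Return 0 only if every slot is on the same host. HOSTLIST defaults to
# $LSB_HOSTS, which is assumed to be set by LSF inside a running job.
require_single_host() {
  local hosts="${1:-${LSB_HOSTS:-}}"
  local n
  n=$(count_distinct_hosts "$hosts")
  if [ "$n" -ne 1 ]; then
    echo "Refusing to start: slots span $n hosts ($hosts)" >&2
    return 1
  fi
}

# Example use inside the job script (not run here):
#   require_single_host && matlab < dct_example_nwkr.m >& dct_example.log
```

LSF also accepts span[hosts=1] in the -R resource string, which asks for all slots on a single host without having to keep ptile in step with -n.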