Distributed Computing Toolbox runs only on one physical machine
Earlier I showed an example of how to use the Distributed Computing Toolbox (DCT) to run a job on pegasus2 with up to 12 workers. That works, but it has one important limitation. With this method, all of the workers run on the same physical node, whether or not that is what you want. Here is the evidence:
Here is the code we'll run for our simple job (I'll name this file dct_example_nwkr.m). For the purpose of illustration, I have limited the number of workers to Nworkers = 4. Each loop iteration prints the hostname of the machine running it.
%=====================================================================
% DCT Example: Do nothing on Nworkers. Print datestamp to show
% how it's going.
%
% Art Gleason July 14, 2014
%=====================================================================
Nworkers = 4; % make sure this matches the number in dct_bsub.job
              % on the line #BSUB -n
              % e.g. if Nworkers = 4 then use #BSUB -n 4
              % if Nworkers = 0 or 1 then use #BSUB -n 1
N = 200;      % bump this up if you want a longer job with many workers

%---open Nworkers, 0=serial, 12=max number possible---
if( Nworkers > 12 )
    Nworkers = 12;
elseif( Nworkers < 0 )
    Nworkers = 0;
end
if( Nworkers > 0 )
    matlabpool('open',Nworkers);   % uses the 'local' profile (see the log below)
end

parfor(ix=1:N, Nworkers)
    % hostname of the machine running this iteration; strtrim drops the
    % trailing newline that system() returns, so each stamp is one line
    [~, thehostname] = system('hostname');
    ixstamp = sprintf('Iteration %d at %s on %s', ix, datestr(now), strtrim(thehostname));
    disp(ixstamp);
    donothing(5);   % helper not shown here; judging by the log, ~5 s per call
end

if( Nworkers > 0 )
    matlabpool('close');
end
If we submit this job with the following bsub script, which forces all four slots onto one node (span[ptile=4]), we get the expected output.
#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:15
#BSUB -q general
#BSUB -n 4
#BSUB -R "rusage[mem=4096]"
#BSUB -R "span[ptile=4]"
#BSUB -M 4096
#
### Run job
# cd is not needed here if the current working directory is the right one
# when this is submitted; in other words, cd to the directory containing
# dct_example_nwkr.m before submitting.
# Note: you DO need to "module load matlab" before submitting this.
matlab < dct_example_nwkr.m >& dct_example.log
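The comment at the top of dct_example_nwkr.m asks you to keep Nworkers and #BSUB -n in sync by hand. One way to avoid that is to let the job script read the allocation from LSF and pass it to MATLAB. The sketch below assumes that your LSF sets LSB_MCPU_HOSTS in the usual "host1 n1 host2 n2 ..." form. It also assumes, as the bjobs output in this post suggests, that the first host listed is the one that runs the job script. The function name slots_on_first_host and the NWORKERS variable mentioned after the block are hypothetical.

```shell
#!/bin/bash
# Sketch (assumption: LSB_MCPU_HOSTS looks like "host1 n1 host2 n2 ...").
# Print the number of slots LSF allocated on the FIRST execution host,
# i.e. the host that runs this script and therefore the MATLAB client.
# slots_on_first_host is a hypothetical helper name.
slots_on_first_host() {
    # Use the argument if given, otherwise LSF's variable (empty if unset).
    local spec="${1:-${LSB_MCPU_HOSTS:-}}"
    # Split into words: $1 = first host name, $2 = its slot count.
    set -- $spec
    # Print 0 if the allocation string is empty or malformed.
    echo "${2:-0}"
}
```

In the job script, export NWORKERS=$(slots_on_first_host) could go just before the matlab line. The MATLAB script could then read the value with str2double(getenv('NWORKERS')) and fall back to a default if the variable is empty. With this design, if LSF does spread the job over several hosts, MATLAB asks only for the slots reserved on the host it actually runs on.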
Here is the bjobs output (note that the job is running on n018):
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1120934 agleaso RUN   general    login4.pega 4*n018.pega DCTexample Aug 6 13:09
And here is the output (dct_example.log). Note that everything is printed from host n018, as expected.
Warning: No display specified. You will not be able to display graphics on the screen.
Warning: No window system found. Java option 'MWT' ignored.
< M A T L A B (R) >
Copyright 1984-2013 The MathWorks, Inc.
R2013a (8.1.0.604) 64-bit (glnxa64)
February 15, 2013
No window system found. Java option 'MWT' ignored.
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Starting matlabpool using the 'local' profile ... Warning: Found 8 pre-existing
communicating job(s) created by matlabpool that
are running. You can use 'matlabpool close force local' to remove all jobs
created by matlabpool.
> In InteractiveClient>InteractiveClient.pRemoveOldJobs at 426
In InteractiveClient>InteractiveClient.start at 260
In MatlabpoolHelper>MatlabpoolHelper.doOpen at 363
In MatlabpoolHelper>MatlabpoolHelper.doMatlabpool at 137
In matlabpool at 139
connected to 4 workers.
>> >> Iteration 68 at 06-Aug-2014 13:10:09 on n018
Iteration 126 at 06-Aug-2014 13:10:09 on n018
Iteration 101 at 06-Aug-2014 13:10:09 on n018
Iteration 34 at 06-Aug-2014 13:10:09 on n018
Iteration 67 at 06-Aug-2014 13:10:14 on n018
Iteration 125 at 06-Aug-2014 13:10:14 on n018
Iteration 100 at 06-Aug-2014 13:10:14 on n018
Iteration 33 at 06-Aug-2014 13:10:14 on n018
Iteration 66 at 06-Aug-2014 13:10:19 on n018
Iteration 124 at 06-Aug-2014 13:10:19 on n018
Iteration 99 at 06-Aug-2014 13:10:19 on n018
Iteration 32 at 06-Aug-2014 13:10:19 on n018
Iteration 65 at 06-Aug-2014 13:10:25 on n018
Iteration 123 at 06-Aug-2014 13:10:25 on n018
Iteration 98 at 06-Aug-2014 13:10:25 on n018
Iteration 31 at 06-Aug-2014 13:10:25 on n018
Iteration 64 at 06-Aug-2014 13:10:30 on n018
Iteration 122 at 06-Aug-2014 13:10:30 on n018
Iteration 97 at 06-Aug-2014 13:10:30 on n018
Iteration 30 at 06-Aug-2014 13:10:30 on n018
Iteration 63 at 06-Aug-2014 13:10:35 on n018
Iteration 121 at 06-Aug-2014 13:10:35 on n018
Iteration 96 at 06-Aug-2014 13:10:35 on n018
Iteration 29 at 06-Aug-2014 13:10:35 on n018
Iteration 62 at 06-Aug-2014 13:10:40 on n018
Iteration 120 at 06-Aug-2014 13:10:40 on n018
Iteration 95 at 06-Aug-2014 13:10:40 on n018
Iteration 28 at 06-Aug-2014 13:10:40 on n018
Iteration 61 at 06-Aug-2014 13:10:45 on n018
Iteration 119 at 06-Aug-2014 13:10:45 on n018
Iteration 94 at 06-Aug-2014 13:10:45 on n018
Iteration 27 at 06-Aug-2014 13:10:45 on n018
Iteration 60 at 06-Aug-2014 13:10:50 on n018
Iteration 118 at 06-Aug-2014 13:10:50 on n018
Iteration 93 at 06-Aug-2014 13:10:50 on n018
Iteration 26 at 06-Aug-2014 13:10:50 on n018
Iteration 59 at 06-Aug-2014 13:10:55 on n018
Iteration 117 at 06-Aug-2014 13:10:55 on n018
Iteration 92 at 06-Aug-2014 13:10:55 on n018
Iteration 25 at 06-Aug-2014 13:10:55 on n018
Iteration 58 at 06-Aug-2014 13:11:00 on n018
Iteration 116 at 06-Aug-2014 13:11:00 on n018
Iteration 91 at 06-Aug-2014 13:11:00 on n018
Iteration 24 at 06-Aug-2014 13:11:00 on n018
Iteration 57 at 06-Aug-2014 13:11:05 on n018
Iteration 115 at 06-Aug-2014 13:11:05 on n018
Iteration 90 at 06-Aug-2014 13:11:05 on n018
Iteration 23 at 06-Aug-2014 13:11:05 on n018
Iteration 56 at 06-Aug-2014 13:11:10 on n018
Iteration 114 at 06-Aug-2014 13:11:11 on n018
Iteration 89 at 06-Aug-2014 13:11:10 on n018
Iteration 22 at 06-Aug-2014 13:11:11 on n018
Iteration 55 at 06-Aug-2014 13:11:16 on n018
Iteration 113 at 06-Aug-2014 13:11:16 on n018
Iteration 88 at 06-Aug-2014 13:11:16 on n018
Iteration 21 at 06-Aug-2014 13:11:16 on n018
Iteration 54 at 06-Aug-2014 13:11:21 on n018
Iteration 112 at 06-Aug-2014 13:11:21 on n018
Iteration 87 at 06-Aug-2014 13:11:21 on n018
Iteration 20 at 06-Aug-2014 13:11:21 on n018
Iteration 53 at 06-Aug-2014 13:11:26 on n018
Iteration 111 at 06-Aug-2014 13:11:26 on n018
Iteration 86 at 06-Aug-2014 13:11:26 on n018
Iteration 19 at 06-Aug-2014 13:11:26 on n018
Iteration 52 at 06-Aug-2014 13:11:31 on n018
Iteration 110 at 06-Aug-2014 13:11:31 on n018
Iteration 85 at 06-Aug-2014 13:11:31 on n018
Iteration 18 at 06-Aug-2014 13:11:31 on n018
Iteration 51 at 06-Aug-2014 13:11:36 on n018
Iteration 109 at 06-Aug-2014 13:11:36 on n018
Iteration 84 at 06-Aug-2014 13:11:36 on n018
Iteration 17 at 06-Aug-2014 13:11:36 on n018
Iteration 50 at 06-Aug-2014 13:11:41 on n018
Iteration 83 at 06-Aug-2014 13:11:41 on n018
Iteration 108 at 06-Aug-2014 13:11:41 on n018
Iteration 16 at 06-Aug-2014 13:11:41 on n018
Iteration 49 at 06-Aug-2014 13:11:46 on n018
Iteration 107 at 06-Aug-2014 13:11:46 on n018
Iteration 82 at 06-Aug-2014 13:11:46 on n018
Iteration 15 at 06-Aug-2014 13:11:46 on n018
Iteration 48 at 06-Aug-2014 13:11:51 on n018
Iteration 106 at 06-Aug-2014 13:11:51 on n018
Iteration 81 at 06-Aug-2014 13:11:51 on n018
Iteration 14 at 06-Aug-2014 13:11:51 on n018
Iteration 47 at 06-Aug-2014 13:11:56 on n018
Iteration 105 at 06-Aug-2014 13:11:56 on n018
Iteration 80 at 06-Aug-2014 13:11:56 on n018
Iteration 13 at 06-Aug-2014 13:11:56 on n018
Iteration 46 at 06-Aug-2014 13:12:02 on n018
Iteration 104 at 06-Aug-2014 13:12:02 on n018
Iteration 79 at 06-Aug-2014 13:12:01 on n018
Iteration 12 at 06-Aug-2014 13:12:02 on n018
Iteration 45 at 06-Aug-2014 13:12:07 on n018
Iteration 103 at 06-Aug-2014 13:12:07 on n018
Iteration 78 at 06-Aug-2014 13:12:07 on n018
Iteration 11 at 06-Aug-2014 13:12:07 on n018
Iteration 44 at 06-Aug-2014 13:12:12 on n018
Iteration 102 at 06-Aug-2014 13:12:12 on n018
Iteration 77 at 06-Aug-2014 13:12:12 on n018
Iteration 10 at 06-Aug-2014 13:12:12 on n018
Iteration 76 at 06-Aug-2014 13:12:17 on n018
Iteration 43 at 06-Aug-2014 13:12:17 on n018
Iteration 145 at 06-Aug-2014 13:12:17 on n018
Iteration 9 at 06-Aug-2014 13:12:17 on n018
Iteration 75 at 06-Aug-2014 13:12:22 on n018
Iteration 42 at 06-Aug-2014 13:12:22 on n018
Iteration 144 at 06-Aug-2014 13:12:22 on n018
Iteration 8 at 06-Aug-2014 13:12:22 on n018
Iteration 41 at 06-Aug-2014 13:12:27 on n018
Iteration 143 at 06-Aug-2014 13:12:27 on n018
Iteration 74 at 06-Aug-2014 13:12:27 on n018
Iteration 7 at 06-Aug-2014 13:12:27 on n018
Iteration 40 at 06-Aug-2014 13:12:32 on n018
Iteration 142 at 06-Aug-2014 13:12:32 on n018
Iteration 73 at 06-Aug-2014 13:12:32 on n018
Iteration 6 at 06-Aug-2014 13:12:32 on n018
Iteration 39 at 06-Aug-2014 13:12:37 on n018
Iteration 141 at 06-Aug-2014 13:12:37 on n018
Iteration 72 at 06-Aug-2014 13:12:37 on n018
Iteration 5 at 06-Aug-2014 13:12:37 on n018
Iteration 38 at 06-Aug-2014 13:12:42 on n018
Iteration 140 at 06-Aug-2014 13:12:42 on n018
Iteration 71 at 06-Aug-2014 13:12:42 on n018
Iteration 4 at 06-Aug-2014 13:12:42 on n018
Iteration 37 at 06-Aug-2014 13:12:47 on n018
Iteration 139 at 06-Aug-2014 13:12:48 on n018
Iteration 70 at 06-Aug-2014 13:12:47 on n018
Iteration 3 at 06-Aug-2014 13:12:48 on n018
Iteration 36 at 06-Aug-2014 13:12:53 on n018
Iteration 138 at 06-Aug-2014 13:12:53 on n018
Iteration 69 at 06-Aug-2014 13:12:52 on n018
Iteration 2 at 06-Aug-2014 13:12:53 on n018
Iteration 35 at 06-Aug-2014 13:12:58 on n018
Iteration 137 at 06-Aug-2014 13:12:58 on n018
Iteration 159 at 06-Aug-2014 13:12:57 on n018
Iteration 1 at 06-Aug-2014 13:12:58 on n018
Iteration 158 at 06-Aug-2014 13:13:03 on n018
Iteration 170 at 06-Aug-2014 13:13:03 on n018
Iteration 136 at 06-Aug-2014 13:13:03 on n018
Iteration 178 at 06-Aug-2014 13:13:03 on n018
Iteration 157 at 06-Aug-2014 13:13:08 on n018
Iteration 169 at 06-Aug-2014 13:13:08 on n018
Iteration 135 at 06-Aug-2014 13:13:08 on n018
Iteration 177 at 06-Aug-2014 13:13:08 on n018
Iteration 156 at 06-Aug-2014 13:13:13 on n018
Iteration 168 at 06-Aug-2014 13:13:13 on n018
Iteration 134 at 06-Aug-2014 13:13:13 on n018
Iteration 176 at 06-Aug-2014 13:13:13 on n018
Iteration 167 at 06-Aug-2014 13:13:18 on n018
Iteration 133 at 06-Aug-2014 13:13:18 on n018
Iteration 155 at 06-Aug-2014 13:13:18 on n018
Iteration 175 at 06-Aug-2014 13:13:18 on n018
Iteration 166 at 06-Aug-2014 13:13:23 on n018
Iteration 132 at 06-Aug-2014 13:13:23 on n018
Iteration 154 at 06-Aug-2014 13:13:23 on n018
Iteration 174 at 06-Aug-2014 13:13:23 on n018
Iteration 165 at 06-Aug-2014 13:13:28 on n018
Iteration 131 at 06-Aug-2014 13:13:28 on n018
Iteration 153 at 06-Aug-2014 13:13:28 on n018
Iteration 173 at 06-Aug-2014 13:13:28 on n018
Iteration 164 at 06-Aug-2014 13:13:33 on n018
Iteration 130 at 06-Aug-2014 13:13:33 on n018
Iteration 152 at 06-Aug-2014 13:13:33 on n018
Iteration 172 at 06-Aug-2014 13:13:33 on n018
Iteration 163 at 06-Aug-2014 13:13:39 on n018
Iteration 129 at 06-Aug-2014 13:13:39 on n018
Iteration 151 at 06-Aug-2014 13:13:38 on n018
Iteration 171 at 06-Aug-2014 13:13:39 on n018
Iteration 150 at 06-Aug-2014 13:13:43 on n018
Iteration 162 at 06-Aug-2014 13:13:44 on n018
Iteration 128 at 06-Aug-2014 13:13:44 on n018
Iteration 184 at 06-Aug-2014 13:13:44 on n018
Iteration 149 at 06-Aug-2014 13:13:48 on n018
Iteration 161 at 06-Aug-2014 13:13:49 on n018
Iteration 127 at 06-Aug-2014 13:13:49 on n018
Iteration 183 at 06-Aug-2014 13:13:49 on n018
Iteration 148 at 06-Aug-2014 13:13:54 on n018
Iteration 160 at 06-Aug-2014 13:13:54 on n018
Iteration 189 at 06-Aug-2014 13:13:54 on n018
Iteration 182 at 06-Aug-2014 13:13:54 on n018
Iteration 147 at 06-Aug-2014 13:13:59 on n018
Iteration 194 at 06-Aug-2014 13:13:59 on n018
Iteration 188 at 06-Aug-2014 13:13:59 on n018
Iteration 181 at 06-Aug-2014 13:13:59 on n018
Iteration 146 at 06-Aug-2014 13:14:04 on n018
Iteration 193 at 06-Aug-2014 13:14:04 on n018
Iteration 187 at 06-Aug-2014 13:14:04 on n018
Iteration 180 at 06-Aug-2014 13:14:04 on n018
Iteration 192 at 06-Aug-2014 13:14:09 on n018
Iteration 186 at 06-Aug-2014 13:14:09 on n018
Iteration 199 at 06-Aug-2014 13:14:09 on n018
Iteration 179 at 06-Aug-2014 13:14:09 on n018
Iteration 191 at 06-Aug-2014 13:14:14 on n018
Iteration 198 at 06-Aug-2014 13:14:14 on n018
Iteration 185 at 06-Aug-2014 13:14:14 on n018
Iteration 200 at 06-Aug-2014 13:14:14 on n018
Iteration 197 at 06-Aug-2014 13:14:19 on n018
Iteration 190 at 06-Aug-2014 13:14:19 on n018
Iteration 196 at 06-Aug-2014 13:14:24 on n018
Iteration 195 at 06-Aug-2014 13:14:29 on n018
>> >> Sending a stop signal to all the workers ... stopped.
This is all fine so far. Suppose, though, that you want to assign your job to CPUs on different physical nodes, or that LSF decides to. What happens then? We can force this situation with span[ptile=1], which places one slot on each of four hosts:
#!/bin/bash
#BSUB -J DCTexample
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 00:15
#BSUB -q general
#BSUB -n 4
#BSUB -R "rusage[mem=4096]"
#BSUB -R "span[ptile=1]"
#BSUB -M 4096
#
### Run job
# cd is not needed here if the current working directory is the right one
# when this is submitted; in other words, cd to the directory containing
# dct_example_nwkr.m before submitting.
# Note: you DO need to "module load matlab" before submitting this.
matlab < dct_example_nwkr.m >& dct_example.log
The bjobs output shows one CPU assigned on each of n178, n096, n025, and n210:
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1120938 agleaso RUN   general    login4.pega 1*n178.pega DCTexample Aug 6 13:19
                                             1*n096.pegasus.edu
                                             1*n025.pegasus.edu
                                             1*n210.pegasus.edu
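To record the same information that bjobs shows in the EXEC_HOST column in the job's own output, the job script can print the allocation before it starts MATLAB. The sketch below again assumes LSF's LSB_MCPU_HOSTS "host1 n1 host2 n2 ..." format. The helper name print_allocation is hypothetical.

```shell
#!/bin/bash
# Sketch (assumption: LSB_MCPU_HOSTS looks like "host1 n1 host2 n2 ...").
# Print one "host slots" line per allocated host.
# print_allocation is a hypothetical helper name.
print_allocation() {
    local spec="${1:-${LSB_MCPU_HOSTS:-}}"
    set -- $spec
    # Walk the list two words (host name, slot count) at a time.
    while [ "$#" -ge 2 ]; do
        printf '%s %s\n' "$1" "$2"
        shift 2
    done
}
```

Calling print_allocation and hostname near the top of the job script sends both to the job's standard output file (%J.out here). For a span[ptile=1] job, that output would make the mismatch with the MATLAB log visible side by side.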
HOWEVER, the job output ALL COMES from node n178:
Warning: No display specified. You will not be able to display graphics on the screen.
Warning: No window system found. Java option 'MWT' ignored.
< M A T L A B (R) >
Copyright 1984-2013 The MathWorks, Inc.
R2013a (8.1.0.604) 64-bit (glnxa64)
February 15, 2013
No window system found. Java option 'MWT' ignored.
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
>> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Starting matlabpool using the 'local' profile ... Warning: Found 8 pre-existing
communicating job(s) created by matlabpool that
are running. You can use 'matlabpool close force local' to remove all jobs
created by matlabpool.
> In InteractiveClient>InteractiveClient.pRemoveOldJobs at 426
In InteractiveClient>InteractiveClient.start at 260
In MatlabpoolHelper>MatlabpoolHelper.doOpen at 363
In MatlabpoolHelper>MatlabpoolHelper.doMatlabpool at 137
In matlabpool at 139
connected to 4 workers.
>> >> Iteration 68 at 06-Aug-2014 13:19:37 on n178
Iteration 101 at 06-Aug-2014 13:19:37 on n178
Iteration 34 at 06-Aug-2014 13:19:37 on n178
Iteration 126 at 06-Aug-2014 13:19:37 on n178
Iteration 67 at 06-Aug-2014 13:19:43 on n178
Iteration 100 at 06-Aug-2014 13:19:43 on n178
Iteration 33 at 06-Aug-2014 13:19:43 on n178
Iteration 125 at 06-Aug-2014 13:19:43 on n178
Iteration 66 at 06-Aug-2014 13:19:48 on n178
Iteration 99 at 06-Aug-2014 13:19:48 on n178
Iteration 32 at 06-Aug-2014 13:19:48 on n178
Iteration 124 at 06-Aug-2014 13:19:48 on n178
Iteration 65 at 06-Aug-2014 13:19:53 on n178
Iteration 98 at 06-Aug-2014 13:19:53 on n178
Iteration 31 at 06-Aug-2014 13:19:53 on n178
Iteration 123 at 06-Aug-2014 13:19:53 on n178
Iteration 64 at 06-Aug-2014 13:19:58 on n178
Iteration 97 at 06-Aug-2014 13:19:58 on n178
Iteration 30 at 06-Aug-2014 13:19:58 on n178
Iteration 122 at 06-Aug-2014 13:19:58 on n178
Iteration 63 at 06-Aug-2014 13:20:03 on n178
Iteration 96 at 06-Aug-2014 13:20:03 on n178
Iteration 29 at 06-Aug-2014 13:20:03 on n178
Iteration 121 at 06-Aug-2014 13:20:03 on n178
Iteration 62 at 06-Aug-2014 13:20:08 on n178
Iteration 95 at 06-Aug-2014 13:20:08 on n178
Iteration 28 at 06-Aug-2014 13:20:08 on n178
Iteration 120 at 06-Aug-2014 13:20:08 on n178
Iteration 61 at 06-Aug-2014 13:20:13 on n178
Iteration 94 at 06-Aug-2014 13:20:13 on n178
Iteration 27 at 06-Aug-2014 13:20:13 on n178
Iteration 119 at 06-Aug-2014 13:20:13 on n178
Iteration 60 at 06-Aug-2014 13:20:18 on n178
Iteration 93 at 06-Aug-2014 13:20:18 on n178
Iteration 26 at 06-Aug-2014 13:20:18 on n178
Iteration 118 at 06-Aug-2014 13:20:18 on n178
Iteration 59 at 06-Aug-2014 13:20:23 on n178
Iteration 92 at 06-Aug-2014 13:20:23 on n178
Iteration 25 at 06-Aug-2014 13:20:23 on n178
Iteration 117 at 06-Aug-2014 13:20:23 on n178
Iteration 58 at 06-Aug-2014 13:20:29 on n178
Iteration 91 at 06-Aug-2014 13:20:29 on n178
Iteration 24 at 06-Aug-2014 13:20:29 on n178
Iteration 116 at 06-Aug-2014 13:20:28 on n178
Iteration 57 at 06-Aug-2014 13:20:34 on n178
Iteration 90 at 06-Aug-2014 13:20:34 on n178
Iteration 23 at 06-Aug-2014 13:20:34 on n178
Iteration 115 at 06-Aug-2014 13:20:34 on n178
Iteration 56 at 06-Aug-2014 13:20:39 on n178
Iteration 89 at 06-Aug-2014 13:20:39 on n178
Iteration 22 at 06-Aug-2014 13:20:39 on n178
Iteration 114 at 06-Aug-2014 13:20:39 on n178
Iteration 55 at 06-Aug-2014 13:20:44 on n178
Iteration 88 at 06-Aug-2014 13:20:44 on n178
Iteration 21 at 06-Aug-2014 13:20:44 on n178
Iteration 113 at 06-Aug-2014 13:20:44 on n178
Iteration 54 at 06-Aug-2014 13:20:49 on n178
Iteration 87 at 06-Aug-2014 13:20:49 on n178
Iteration 20 at 06-Aug-2014 13:20:49 on n178
Iteration 112 at 06-Aug-2014 13:20:49 on n178
Iteration 53 at 06-Aug-2014 13:20:54 on n178
Iteration 86 at 06-Aug-2014 13:20:54 on n178
Iteration 19 at 06-Aug-2014 13:20:54 on n178
Iteration 111 at 06-Aug-2014 13:20:54 on n178
Iteration 85 at 06-Aug-2014 13:20:59 on n178
Iteration 110 at 06-Aug-2014 13:20:59 on n178
Iteration 52 at 06-Aug-2014 13:20:59 on n178
Iteration 18 at 06-Aug-2014 13:20:59 on n178
Iteration 109 at 06-Aug-2014 13:21:04 on n178
Iteration 51 at 06-Aug-2014 13:21:04 on n178
Iteration 84 at 06-Aug-2014 13:21:04 on n178
Iteration 17 at 06-Aug-2014 13:21:04 on n178
Iteration 50 at 06-Aug-2014 13:21:09 on n178
Iteration 83 at 06-Aug-2014 13:21:09 on n178
Iteration 16 at 06-Aug-2014 13:21:09 on n178
Iteration 108 at 06-Aug-2014 13:21:09 on n178
Iteration 49 at 06-Aug-2014 13:21:14 on n178
Iteration 82 at 06-Aug-2014 13:21:14 on n178
Iteration 15 at 06-Aug-2014 13:21:14 on n178
Iteration 107 at 06-Aug-2014 13:21:14 on n178
Iteration 48 at 06-Aug-2014 13:21:20 on n178
Iteration 81 at 06-Aug-2014 13:21:20 on n178
Iteration 14 at 06-Aug-2014 13:21:20 on n178
Iteration 106 at 06-Aug-2014 13:21:19 on n178
Iteration 47 at 06-Aug-2014 13:21:25 on n178
Iteration 80 at 06-Aug-2014 13:21:25 on n178
Iteration 13 at 06-Aug-2014 13:21:25 on n178
Iteration 105 at 06-Aug-2014 13:21:24 on n178
Iteration 46 at 06-Aug-2014 13:21:30 on n178
Iteration 79 at 06-Aug-2014 13:21:30 on n178
Iteration 12 at 06-Aug-2014 13:21:30 on n178
Iteration 104 at 06-Aug-2014 13:21:30 on n178
Iteration 45 at 06-Aug-2014 13:21:35 on n178
Iteration 78 at 06-Aug-2014 13:21:35 on n178
Iteration 11 at 06-Aug-2014 13:21:35 on n178
Iteration 103 at 06-Aug-2014 13:21:35 on n178
Iteration 44 at 06-Aug-2014 13:21:40 on n178
Iteration 77 at 06-Aug-2014 13:21:40 on n178
Iteration 10 at 06-Aug-2014 13:21:40 on n178
Iteration 102 at 06-Aug-2014 13:21:40 on n178
Iteration 43 at 06-Aug-2014 13:21:45 on n178
Iteration 76 at 06-Aug-2014 13:21:45 on n178
Iteration 9 at 06-Aug-2014 13:21:45 on n178
Iteration 145 at 06-Aug-2014 13:21:45 on n178
Iteration 42 at 06-Aug-2014 13:21:50 on n178
Iteration 75 at 06-Aug-2014 13:21:50 on n178
Iteration 8 at 06-Aug-2014 13:21:50 on n178
Iteration 144 at 06-Aug-2014 13:21:50 on n178
Iteration 41 at 06-Aug-2014 13:21:55 on n178
Iteration 74 at 06-Aug-2014 13:21:55 on n178
Iteration 7 at 06-Aug-2014 13:21:55 on n178
Iteration 143 at 06-Aug-2014 13:21:55 on n178
Iteration 40 at 06-Aug-2014 13:22:00 on n178
Iteration 73 at 06-Aug-2014 13:22:00 on n178
Iteration 6 at 06-Aug-2014 13:22:00 on n178
Iteration 142 at 06-Aug-2014 13:22:00 on n178
Iteration 39 at 06-Aug-2014 13:22:06 on n178
Iteration 72 at 06-Aug-2014 13:22:05 on n178
Iteration 5 at 06-Aug-2014 13:22:05 on n178
Iteration 141 at 06-Aug-2014 13:22:05 on n178
Iteration 38 at 06-Aug-2014 13:22:11 on n178
Iteration 71 at 06-Aug-2014 13:22:11 on n178
Iteration 4 at 06-Aug-2014 13:22:11 on n178
Iteration 140 at 06-Aug-2014 13:22:10 on n178
Iteration 37 at 06-Aug-2014 13:22:16 on n178
Iteration 70 at 06-Aug-2014 13:22:16 on n178
Iteration 3 at 06-Aug-2014 13:22:16 on n178
Iteration 139 at 06-Aug-2014 13:22:16 on n178
Iteration 36 at 06-Aug-2014 13:22:21 on n178
Iteration 69 at 06-Aug-2014 13:22:21 on n178
Iteration 2 at 06-Aug-2014 13:22:21 on n178
Iteration 138 at 06-Aug-2014 13:22:21 on n178
Iteration 137 at 06-Aug-2014 13:22:26 on n178
Iteration 35 at 06-Aug-2014 13:22:26 on n178
Iteration 159 at 06-Aug-2014 13:22:26 on n178
Iteration 1 at 06-Aug-2014 13:22:26 on n178
Iteration 158 at 06-Aug-2014 13:22:31 on n178
Iteration 136 at 06-Aug-2014 13:22:31 on n178
Iteration 178 at 06-Aug-2014 13:22:31 on n178
Iteration 170 at 06-Aug-2014 13:22:31 on n178
Iteration 135 at 06-Aug-2014 13:22:36 on n178
Iteration 177 at 06-Aug-2014 13:22:36 on n178
Iteration 157 at 06-Aug-2014 13:22:36 on n178
Iteration 169 at 06-Aug-2014 13:22:36 on n178
Iteration 176 at 06-Aug-2014 13:22:41 on n178
Iteration 156 at 06-Aug-2014 13:22:41 on n178
Iteration 168 at 06-Aug-2014 13:22:41 on n178
Iteration 134 at 06-Aug-2014 13:22:41 on n178
Iteration 175 at 06-Aug-2014 13:22:46 on n178
Iteration 155 at 06-Aug-2014 13:22:46 on n178
Iteration 167 at 06-Aug-2014 13:22:46 on n178
Iteration 133 at 06-Aug-2014 13:22:46 on n178
Iteration 174 at 06-Aug-2014 13:22:51 on n178
Iteration 154 at 06-Aug-2014 13:22:51 on n178
Iteration 166 at 06-Aug-2014 13:22:51 on n178
Iteration 132 at 06-Aug-2014 13:22:51 on n178
Iteration 173 at 06-Aug-2014 13:22:57 on n178
Iteration 153 at 06-Aug-2014 13:22:56 on n178
Iteration 165 at 06-Aug-2014 13:22:56 on n178
Iteration 131 at 06-Aug-2014 13:22:56 on n178
Iteration 172 at 06-Aug-2014 13:23:02 on n178
Iteration 152 at 06-Aug-2014 13:23:01 on n178
Iteration 164 at 06-Aug-2014 13:23:02 on n178
Iteration 130 at 06-Aug-2014 13:23:01 on n178
Iteration 171 at 06-Aug-2014 13:23:07 on n178
Iteration 151 at 06-Aug-2014 13:23:07 on n178
Iteration 163 at 06-Aug-2014 13:23:07 on n178
Iteration 129 at 06-Aug-2014 13:23:06 on n178
Iteration 150 at 06-Aug-2014 13:23:12 on n178
Iteration 128 at 06-Aug-2014 13:23:12 on n178
Iteration 184 at 06-Aug-2014 13:23:12 on n178
Iteration 162 at 06-Aug-2014 13:23:12 on n178
Iteration 149 at 06-Aug-2014 13:23:17 on n178
Iteration 127 at 06-Aug-2014 13:23:17 on n178
Iteration 183 at 06-Aug-2014 13:23:17 on n178
Iteration 161 at 06-Aug-2014 13:23:17 on n178
Iteration 182 at 06-Aug-2014 13:23:22 on n178
Iteration 148 at 06-Aug-2014 13:23:22 on n178
Iteration 160 at 06-Aug-2014 13:23:22 on n178
Iteration 189 at 06-Aug-2014 13:23:22 on n178
Iteration 147 at 06-Aug-2014 13:23:27 on n178
Iteration 188 at 06-Aug-2014 13:23:27 on n178
Iteration 181 at 06-Aug-2014 13:23:27 on n178
Iteration 194 at 06-Aug-2014 13:23:27 on n178
Iteration 187 at 06-Aug-2014 13:23:32 on n178
Iteration 180 at 06-Aug-2014 13:23:32 on n178
Iteration 146 at 06-Aug-2014 13:23:32 on n178
Iteration 193 at 06-Aug-2014 13:23:32 on n178
Iteration 186 at 06-Aug-2014 13:23:37 on n178
Iteration 179 at 06-Aug-2014 13:23:37 on n178
Iteration 199 at 06-Aug-2014 13:23:37 on n178
Iteration 192 at 06-Aug-2014 13:23:37 on n178
Iteration 198 at 06-Aug-2014 13:23:42 on n178
Iteration 185 at 06-Aug-2014 13:23:42 on n178
Iteration 200 at 06-Aug-2014 13:23:42 on n178
Iteration 191 at 06-Aug-2014 13:23:42 on n178
Iteration 197 at 06-Aug-2014 13:23:47 on n178
Iteration 190 at 06-Aug-2014 13:23:47 on n178
Iteration 196 at 06-Aug-2014 13:23:52 on n178
Iteration 195 at 06-Aug-2014 13:23:58 on n178
>> >> Sending a stop signal to all the workers ... stopped.
>> >>
So MATLAB starts all 4 workers on n178, even though LSF assigned the job to n178, n096, n025, and n210! The log shows why. matlabpool opens the pool with the 'local' profile, and that profile starts every worker as a process on the machine running the MATLAB client. The client has no knowledge of the other hosts LSF reserved. This is bad. LSF believes your job uses only one CPU on n178 and may schedule other jobs on the remaining cores there. In reality your job uses four, so performance will suffer at the very least. Worse, suppose your job needs a lot of memory and you reserved only enough for one worker per node, because you expected each worker to run on a different node. The extra workers can then push n178 into swapping or even crash the compute node.
The lesson learned here is to make sure your DCT jobs request ALL CPUs on the same node (for example, span[ptile=N] together with #BSUB -n N, as in the first job script), AND to make sure you have reserved enough memory for however many workers you request.
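As a safety net for this lesson, the job script can refuse to start MATLAB when LSF has spread the allocation over more than one host. This is a minimal sketch, assuming LSF's LSB_MCPU_HOSTS "host1 n1 host2 n2 ..." format. The function name require_single_host is hypothetical.

```shell
#!/bin/bash
# Sketch (assumption: LSB_MCPU_HOSTS looks like "host1 n1 host2 n2 ...").
# matlabpool's 'local' profile starts every worker on the host running the
# MATLAB client, so only a single-host allocation is safe for this setup.
# require_single_host is a hypothetical helper name.
# Returns 0 if exactly one host was allocated, 1 otherwise.
require_single_host() {
    local spec="${1:-${LSB_MCPU_HOSTS:-}}"
    set -- $spec
    # Each host contributes two words: its name and its slot count.
    local nhosts=$(( $# / 2 ))
    if [ "$nhosts" -ne 1 ]; then
        echo "DCT job needs all slots on one host; LSF allocated $nhosts host(s): $spec" >&2
        return 1
    fi
    return 0
}
```

Placing require_single_host || exit 1 just before the matlab line makes a mis-spanned job fail immediately, with a message in %J.err, instead of silently oversubscribing one node. LSF's span[hosts=1] resource requirement is another way to ask for all slots on a single host without repeating N in the ptile value.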