- Fri Jul 18, 2014 4:16 pm
#382042
I had to stop a render task with 400 frames. Upon resubmitting the job, my render nodes all said the same thing:
"[18/July/2014 09:17:07] New job order received
[18/July/2014 09:17:07] ##### Current frame changed. Frame: 124 #####
[18/July/2014 09:17:07] ANIMATION: fileFinalPath: GFPS-200_Rev02_0124.mxs
[18/July/2014 09:17:07] Entering in allDependenciesReceived..
[18/July/2014 09:17:07] This node will not receive dependencies
[18/July/2014 09:17:07] Scene:Y:\CS Products\EJC\All_Products-Max2013\export\Floors\GFPS-200\Animations\GFPS-200_Rev02_0124.mxs not found yet. Waiting for more dependencies..."
In the monitor, I don't have the "Send Dependencies" box checked, so each node should fetch the resources from the network itself. I verified that the resources exist and are accessible to all nodes, and I even RDP'd into each node and manually browsed to the assets. They are on mapped network drives (drive "Y:\" as shown in the pasted log messages) as opposed to direct UNC paths (\\<Server>\<Network Share>\CS Products\EJC\All_Products-Max2013\export\Floors\GFPS-200\Animations\GFPS-200_Rev02_0124.mxs). I restarted all the render nodes (mxnetwork.exe, not the actual machines themselves) and even restarted the running Manager.
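For anyone who wants to rule out the mapped drive itself, one option is to point the job at UNC paths instead. Below is a minimal sketch of rewriting a drive-letter path as a UNC path. The `to_unc` helper and the `DRIVE_MAP` entry are hypothetical, and the server and share names are placeholders rather than the real ones from my setup. Nothing here is part of Maxwell.

```python
import ntpath  # Windows path rules, usable on any OS


def to_unc(path, drive_map):
    """Rewrite a drive-letter path as a UNC path using a caller-supplied map.

    drive_map maps an upper-case drive such as "Y:" to a UNC root such as
    r"\\server\share". Paths on unknown drives are returned unchanged.
    """
    drive, rest = ntpath.splitdrive(path)
    root = drive_map.get(drive.upper())
    if root is None:
        return path  # not a drive we know how to translate
    return root.rstrip("\\") + rest


# Hypothetical mapping: replace with the actual server and share names.
DRIVE_MAP = {"Y:": r"\\fileserver\projects"}

if __name__ == "__main__":
    print(to_unc(r"Y:\CS Products\EJC\scene_0124.mxs", DRIVE_MAP))
```

Whether mxnetwork.exe behaves any differently with UNC paths after a network interruption is an open question for me. This only makes it easy to test both forms.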
None of the above seemed to work. I had to physically reboot all the machines in my farm before it returned to normal.
One non-Maxwell issue happened in the midst of all this, and it may be the culprit. Our IT team is migrating 20-30 users to a different part of the building. To do so, they are doing some major work in the IDF where my main ethernet switch is. They "accidentally" unplugged the switch for a second but quickly plugged it back in. Fortunately, this happened between any major read/write requests by the nodes, so the render job never crashed. Surprisingly, the monitor didn't even crash or fail to report the status of each node.
The switch going down threw some kind of flag within Windows networking, but it appeared that all normal traffic was restored. When the switch rebooted, each node flagged the network drives with a big red "X", but once the switch was back, any basic request to those mapped drives was successful. I assumed that reloading all the Maxwell network utilities would get them running as normal. That wasn't the case. Does mxnetwork.exe store a non-persistent (dies upon reboot) cache of available file/system resources? If so, it would be nice if that cache were completely flushed every time mxnetwork.exe was launched. The ethernet switch might not have anything to do with it either; I honestly don't know.
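Browsing to the files in Explorer over RDP only proves that my interactive session can see them. A rough way to check what a plain process sees is a small probe script like the sketch below. The `probe` function and the example paths are my own illustration (the UNC server and share names are placeholders), and it says nothing about what mxnetwork.exe caches internally. To be meaningful, it would need to run under the same user account as the render node.

```python
import os


def probe(paths):
    """Return a dict mapping each path to True if this process can see it."""
    return {p: os.path.exists(p) for p in paths}


if __name__ == "__main__":
    # Example paths only: one via the mapped drive, one via a placeholder UNC root.
    candidates = [
        r"Y:\CS Products\EJC\All_Products-Max2013\export\Floors\GFPS-200\Animations\GFPS-200_Rev02_0124.mxs",
        r"\\fileserver\projects\CS Products\EJC\All_Products-Max2013\export\Floors\GFPS-200\Animations\GFPS-200_Rev02_0124.mxs",
    ]
    for path, visible in probe(candidates).items():
        print("OK     " if visible else "MISSING", path)
```

If the mapped-drive path shows as missing while the UNC path is found, that would point at the drive mapping rather than at Maxwell.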
Thanks Maxwell Team! Happy Friday!
Regards,
Zack Parrish
-
Maxwell - 4.2.0.3
Maxwell 4 | 3ds Max - 4.2.4
336 capable Maxwell threads!
-
Workstation:
Dual E5-2680v3, 64GB, Quadro K5200
48 threads (HT) @ 139.2GHz
-
Render Farm:
288 threads (HT) @ 835.2GHz