- Sun Aug 17, 2014 2:37 am
#382328
Just before the end of 5-hour renders, one of the render nodes fairly often crashes: never in the first hour, always just a few minutes before the render is finished :(
From the verbose log:
[17/August/2014 00:55:51] Message from render process: time_update 17363
[17/August/2014 00:55:51] Message from render process: new_sampling_level_reached 16.157475
[17/August/2014 00:55:51] [17/August/2014 00:55:51] [INFO]: Message to render node: time_update 17363
[17/August/2014 00:55:51] [17/August/2014 00:55:51] [INFO]: Message to render node: new_sampling_level_reached 16.157475
[17/August/2014 00:56:00] The remote host closed the connection . Code: 1
[17/August/2014 00:56:00] ERROR: Error in rendering process. The process crashed some time after starting successfully.
[17/August/2014 00:56:00] ERROR: Render process crashed!
[17/August/2014 00:56:00] Connecting to render process: Binding to port: 45463
[17/August/2014 00:56:00] TCP message from manager received.
[17/August/2014 00:56:00] Message from manager: cpuid
[17/August/2014 00:56:00] 7501
[17/August/2014 00:56:00] TCP message from manager received.
[17/August/2014 00:56:00] New job order received
The crashed node restarts automatically, but at SL 0 rather than at the last SL it had reached (the last MXI written to disk had a higher SL), so the restarted render ends up at a lower SL than the render had already reached just before the crash.
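To pin down exactly which SL had been reached before each crash, the verbose log can be scanned with a small script. This is only a minimal sketch, assuming the line format shown in the excerpt above; the function name `last_sl_before_crash` is made up for illustration, and the month name is assumed to be in English.

```python
import re
from datetime import datetime

# Line shapes assumed from the log excerpt above (not from any official spec).
TS = r"\[(\d{1,2}/[A-Za-z]+/\d{4} \d{2}:\d{2}:\d{2})\]"
SL_RE = re.compile(
    TS + r" Message from render process: new_sampling_level_reached ([\d.]+)"
)
CRASH_RE = re.compile(TS + r" ERROR: Render process crashed!")


def parse_ts(text):
    # Assumes English month names such as "August".
    return datetime.strptime(text, "%d/%B/%Y %H:%M:%S")


def last_sl_before_crash(lines):
    """Return one (sl, sl_time, crash_time) tuple per crash found in the log."""
    results = []
    last_sl = None
    last_time = None
    for line in lines:
        m = SL_RE.match(line)
        if m:
            # Remember the most recent SL reported by the render process.
            last_time = parse_ts(m.group(1))
            last_sl = float(m.group(2))
            continue
        m = CRASH_RE.match(line)
        if m:
            results.append((last_sl, last_time, parse_ts(m.group(1))))
            # The restarted render begins again, so forget the old SL.
            last_sl = None
            last_time = None
    return results
```

Feeding it the lines of a node's log file gives the last reported SL and the time of each crash, which makes it easier to compare against the SL of the MXI that was last written to disk.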
Questions:
1. How does one analyse the cause of the crash? There is nothing in the node's Windows Event Viewer, and the machine is running cool and fine.
2. Is there some way to resume the render from a higher SL instead of SL 0, using the MXI found in the monitor's temp folder?
3. Is there any other file or trick to keep the crashed node from picking up automatically at SL 0, and instead have the entire job stopped so one can do a regular resume, which would at least yield the highest SL reached just before the crash?
Thank you very much!


