- Tue Jun 15, 2010 10:34 am
#325094
Ciao,
we have a problem with network rendering.
We have 4 nodes, but after the first render completes correctly, the second job in the queue freezes.
This seems to happen when the horizontal resolution is greater than 3500 pixels.
Below are the logs from the manager and the nodes.
PCs:
1) name: PA025 runs manager + monitor + node, and it is the machine where the project is created and saved. (os: Vista x64)
2) name: PA023 runs node (os: XP x64)
3) name: RENDER02 runs node (os: Win server 2008 R2 ent. x64)
4) name: RENDER01 runs node (os: Win server 2008 R2 ent. x64)
Network paths use a mapped network drive, so every machine reaches PA025 as X:\ without problems.
Network policies and permissions have been tested and are OK.
The computers were recently formatted, so there are no "ghosts" of previous installations and no dirty Windows registry.
The computers have 16 GB of RAM installed.
Manager (on PA025):
Render process finished in node: PA025 Job ID: 0
Starting MXI transference..
Render node: RENDER02 Job ID: 0 sl: 3.00001
Render process finished in node: RENDER02 Job ID: 0
Starting MXI transference..
Render node: RENDER01 Job ID: 0 sl: 3.13286
Render process finished in node: RENDER01 Job ID: 0
Starting MXI transference..
Render node: pa023 Job ID: 0 sl: 2.73297
Render process finished in node: pa023 Job ID: 0
Starting MXI transference..
Node RENDER01 has sent MXI file.
[15/June/2010 09:12:58] ##### Job finished. id: 0 #####
[15/June/2010 09:12:58] ##### Getting next pending job #####
[15/June/2010 09:12:58] ##### Processing next pending job #####
Processing cooperative job: 1
Sending job order to node: RENDER01
Sending job order to node: pa023
Sending job order to node: PA025
Sending job order to node: RENDER02
Start merging process...
Render node: RENDER01: file_dependency_sent: da cancellare1.mxs
Merging process finished successfully!
** The merged MXI of "Job 0" is saved as PNG and everything looks OK, but after this "Job 1" does not start.
The strange thing is that the manager sends the MXS file of "Job 1" (da cancellare1.mxs) to only one node, BEFORE the merging process finishes.
Node 1 (on PA025):
...
[INFO] Stopping render
[INFO]
[INFO] Render finished successfully
[INFO] Ending Session...
[15/June/2010 09:10:30] Render process finished successfully!
Start MXI transfer process
MXI sent to the manager successfully!
TCP message from manager received.
[15/June/2010 09:12:58] New job order received
Connection for scene file reception.
Node 2 (on PA023):
...
[15/June/2010 09:10:29] Render stopped!!
[INFO] Stopping render
Message from render process: start_writing_final_mxi
[INFO] Benchmark of 153.349. Time: 5m17s. SL of 2.73
Message from render process: end_writing_final_mxi
[INFO] Time left:
[INFO]
[INFO] Render finished successfully
[INFO] Ending Session...
[15/June/2010 09:11:19] Render process finished successfully!
Start MXI transfer process
MXI sent to the manager successfully!
TCP message from manager received.
[15/June/2010 09:12:58] New job order received
Connection for scene file reception.
Node 3 (on RENDER01):
...
[INFO] Stopping render
Message from render process: start_writing_final_mxi
[INFO] Benchmark of 188.888. Time: 5m25s. SL of 3.13
[INFO] Time left:
[INFO]
[INFO] Render finished successfully
[INFO] Ending Session...
Message from render process: end_writing_final_mxi
[15/June/2010 09:11:14] Render process finished successfully!
Start MXI transfer process
[15/June/2010 09:11:46] Error in MXI sending socket. Code: Network operation timed out
MXI sent to the manager successfully!
TCP message from manager received.
[15/June/2010 09:12:58] New job order received
Connection for scene file reception.
TCP message from manager received.
Start scene data transfer
Scene data sent succesfully.
Node 4 (on RENDER02):
....
[INFO] Stopping render
[INFO] Benchmark of 200.352. Time: 4m44s. SL of 3.00
[INFO] Time left: 4m11s
[INFO]
[INFO] Render finished successfully
[INFO] Ending Session...
[15/June/2010 09:10:42] Render process finished successfully!
Start MXI transfer process
[15/June/2010 09:11:14] Error in MXI sending socket. Code: Network operation timed out
MXI sent to the manager successfully!
TCP message from manager received.
[15/June/2010 09:12:58] New job order received
** On RENDER01 and RENDER02 there is the error: "Error in MXI sending socket. Code: Network operation timed out"
But the file is then sent correctly, as the following "MXI sent to the manager successfully!" shows, so everything seems OK.
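In both node logs above, the timeout is logged 32 seconds after the node reports that the render finished. That might point to a fixed socket timeout rather than a random network fault, though this is only a guess. Below is a minimal Python sketch for checking that gap in a node log. The helper names (parse_events, seconds_to_timeout) are hypothetical and not part of the render software, and the timestamp format is assumed from the log lines quoted in this post.

```python
import re
from datetime import datetime

# Matches lines such as "[15/June/2010 09:11:46] Error in MXI sending socket. ..."
STAMP = re.compile(r"^\[(\d{1,2}/\w+/\d{4} \d{2}:\d{2}:\d{2})\]\s*(.*)$")


def parse_events(log_text):
    """Return (timestamp, message) pairs for every timestamped log line."""
    events = []
    for line in log_text.splitlines():
        match = STAMP.match(line.strip())
        if match:
            # %B expects English month names under the default "C" locale.
            when = datetime.strptime(match.group(1), "%d/%B/%Y %H:%M:%S")
            events.append((when, match.group(2)))
    return events


def seconds_to_timeout(log_text):
    """Seconds between 'Render process finished' and the first MXI socket error."""
    finished = None
    for when, message in parse_events(log_text):
        if message.startswith("Render process finished successfully"):
            finished = when
        elif "Error in MXI sending socket" in message and finished is not None:
            return (when - finished).total_seconds()
    return None  # no timeout found after a finished render


# Excerpts copied from the RENDER01 and RENDER02 node logs in this post.
RENDER01_LOG = """\
[15/June/2010 09:11:14] Render process finished successfully!
Start MXI transfer process
[15/June/2010 09:11:46] Error in MXI sending socket. Code: Network operation timed out
MXI sent to the manager successfully!
"""

RENDER02_LOG = """\
[15/June/2010 09:10:42] Render process finished successfully!
Start MXI transfer process
[15/June/2010 09:11:14] Error in MXI sending socket. Code: Network operation timed out
MXI sent to the manager successfully!
"""

if __name__ == "__main__":
    print("RENDER01:", seconds_to_timeout(RENDER01_LOG))  # 32.0
    print("RENDER02:", seconds_to_timeout(RENDER02_LOG))  # 32.0
```

Running the same function over the PA025 and PA023 logs returns None, since those nodes show no socket error.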
So, after the first render all nodes freeze in the "New job order received" status.
In the Network monitor "Job 1" is tagged as running and the nodes appear to be rendering, but the S.L. remains at 0, even after hours.
We verified that this does not depend on scene complexity.
Over the weekend we tested 10 queued cooperative renders (the scenes were plane+sphere, plane+cube, plane+sphere, and so on):
6 jobs with "x" size = 2200
2 jobs with "x" size = 3500
2 jobs with "x" size = 1000
"y" size was proportional, ratio 1.33.
7 jobs finished successfully.
Render no. 8 did not start, because it was the "second big one".
On Monday morning all nodes were in the "New job order received" status, but the S.L. was still 0.
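To put the three test sizes in perspective, here is a small Python sketch that estimates the pixel count of each frame. It assumes that "ratio 1,33" means y = x / 1.33 (landscape frames); the function name frame_pixels is hypothetical. Under that assumption a 3500-wide frame has roughly 2.5 times the pixels of a 2200-wide one, which would fit a problem tied to the size of the data being transferred, but that is only a hypothesis.

```python
def frame_pixels(width, ratio=1.33):
    """Estimate total pixels for a frame, assuming height = width / ratio."""
    height = round(width / ratio)
    return width * height


if __name__ == "__main__":
    # The three "x" sizes used in the weekend test.
    for width in (1000, 2200, 3500):
        print(f'x = {width}: about {frame_pixels(width):,} pixels')
```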
Any hints?
thank you
ciao
Luca
Ghost: "Ready..."

- By Gaspare Buonsante 20200309160206