
linux render nodes crash

Posted: Mon Apr 18, 2016 4:55 pm
by limbus
Dear Maxwell Team,
I have found a nasty bug where certain scenes crash when rendered on Linux nodes, although they render fine on Windows.

My tests show that the crash occurs in complex scenes when:
1. Object ID, Material ID and Custom Alpha Render Channels are active
2. some emitters are "Hidden From Camera"

Once the Render Channels are deactivated or the emitters are not hidden from camera, the scene renders without a crash.

I have prepared an MXS file that crashes on our linux nodes 100% of the time. Please let me know how I can share it with you along with the full render log.

System Info:
2x Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
64 GB RAM
CentOS 6.7
Maxwell_3.2.1.2

Render log excerpt:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]: Geometry:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Num Objects: 5536
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Num Meshes: 1899
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Num Triangles: 40050820
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Num Vertexes: 20893466
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Num Normals: 81686755
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Num Materials: 92
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  Camera: Shot_Cam
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]: Render settings:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Engine version : 3.2.1.2
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Using RS1 engine
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Desired rendering time : 14400
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Desired sampling level : 6,490000
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Using 48 threads
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Multilight type: Intensity
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Save  lights in separate files:No
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Motion blur: disabled
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Displacement: enabled
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Dispersion: disabled
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  - Illumination layers:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  .  . direct layer: true
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  .  . indirect layer: true
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  .  . direct caustic reflection layer: true
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  .  . direct caustic refraction layer: true
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  .  . indirect caustic reflection layer: true
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:  .  . indirect caustic refraction layer: true
2016-04-18 17:07:11:  0: STDOUT: maxwell:[INFO]:
2016-04-18 17:07:11:  0: STDOUT: maxwell:[18/April/2016 17:07:11] [INFO]: Start Voxelization
2016-04-18 17:09:16:  0: STDOUT: maxwell:[18/April/2016 17:09:16] [INFO]: End Voxelization
2016-04-18 17:09:16:  0: STDOUT: maxwell:[18/April/2016 17:09:16] [INFO]: Voxelization done.
2016-04-18 17:09:17:  0: STDOUT: maxwell:[18/April/2016 17:09:17] [INFO]: Start Rendering
2016-04-18 17:09:17:  0: STDOUT: maxwell:[INFO]:
2016-04-18 17:09:17:  0: STDOUT: Signalhandler. Code: 11
2016-04-18 17:10:53:  0: INFO: Process exit code: 139
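
A note on the last two log lines: signal 11 on Linux is SIGSEGV, and an exit code of 139 is what a shell-style status reports for it (128 + 11), so the two lines describe the same segmentation fault. The short Python sketch below decodes such a status. It assumes the render manager reports the exit code using the usual 128 + N shell convention; the helper name `describe_exit_code` is made up for illustration and is not part of Maxwell.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status (assumes the 128 + N convention)."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with status {code}"


# The value from "Process exit code: 139" in the log above.
print(describe_exit_code(139))
```
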
Cheers, Florian

Re: linux render nodes crash

Posted: Tue Apr 19, 2016 10:57 am
by F. Tella
Hi Florian,

Thanks a lot for the thorough report.

You can send the scene to me via WeTransfer or whichever sharing method you prefer.

Regards.
Fernando

Re: linux render nodes crash

Posted: Fri Apr 22, 2016 12:16 pm
by Miguel
Hi Florian,

I've run some tests on several machines and Linux distros here:

CentOS 5.8, kernel 2.6.18-308, 32 GB RAM: runs OK
CentOS 5.9, kernel 2.6.18-348, 32 GB RAM: runs OK
CentOS 5.11, kernel 2.6.18-398, 32 GB RAM: runs OK
CentOS 6.6, kernel 2.6.32-504, 128 GB RAM: crashes just after the "Start Rendering" message
Fedora 20, kernel 3.12.10-300, 64 GB RAM: crashes just before the "Start Voxelization" message

At the time of the crash, memory usage is around 16 GB on the Fedora machine and 26 GB on the CentOS 6.6 machine, so running out of memory doesn't seem to be the cause.

Here:
http://stackoverflow.com/questions/2146 ... dbad-alloc
points to a kernel 3.12 bug, but CentOS 6.7 ships kernel 2.6.32-573, very close to the CentOS 6.6 kernel version (2.6.32-504), so that bug would not explain the crashes on CentOS 6.x.

I'll keep you posted.
Best regards and sorry for this annoyance.

Re: linux render nodes crash

Posted: Fri Apr 22, 2016 1:06 pm
by limbus
Hi Miguel,
thanks for the feedback.
In my experience it is not caused by running out of memory either. Our render nodes have 64 GB and those scenes need less than that to render.

Happy bug hunting.

Florian

Re: linux render nodes crash

Posted: Thu May 26, 2016 3:33 pm
by limbus
Any news on this bug? Going back to CentOS 5.x isn't really a solution.

Re: linux render nodes crash

Posted: Mon Oct 17, 2016 9:58 am
by limbus
Is this fixed in Maxwell 4? Will there be a fix for Maxwell 3?