Everything related to Maxwell network rendering systems.
By limbus
#382002
Hi there,
we are running into some bugs with the Maxwell 3 network rendering.

1. Nodes that are released from one job will not start any other job they are assigned to. This is new in Maxwell 3, and maybe it is a feature. Anyway, it would be great to be able to choose whether to release the node from this job only or from all jobs.
2. If one render job is pending, all jobs behind it in the queue will be pending as well, even if they have free nodes assigned to them. Workaround: assign at least one free rendernode to every render job. This is a huge PITA, especially with Batch and Animation jobs, because you can't assign one node to every job.

We have 15 workstations and 12 rendernodes, all running Windows 7 in a Windows Server domain.

Cheers, Florian
By limbus
#382122
Any response from the Maxwell Team would be great, especially since the first bug is a real show stopper.

Some other networking bugs we encounter almost daily:

1. Rendernodes stop when finished but won't transmit the rendered MXI to the manager. No error message is given.
2. MXIs end up in the wrong job directory. Merging then will fail and the job has to be merged by hand.
3. All MXIs of a renderjob end up in the directory of another job even with the filename(s) of that job.

Network rendering in Maxwell 2 was not perfect but much more reliable.

Florian
By dariolanza
#382123
Hello Florian,

About not getting the rendered MXI files in the given output location: This should work fine.
If the given output location is accessible for the Manager, it collects all the MXI renders and stores them there. Since you are not getting them in the desired place, this suggests that the Manager is having problems accessing the output location you set.

Run a test with the output folder defined on the Manager computer (e.g. on its Desktop) and let me know if it manages to save everything there as expected.
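A quick way to check whether a folder is writable from a given machine is to try creating a temporary file in it. The following is only a minimal sketch of that idea, not something Maxwell provides: the function name `can_write_to` and the example path are hypothetical, and it has to be run on the Manager machine under the same user account that runs the Manager for the result to mean anything.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created and removed in `directory`."""
    try:
        # NamedTemporaryFile removes the file again when the handle is closed.
        with tempfile.NamedTemporaryFile(dir=directory, prefix="mxi_write_test_") as handle:
            handle.write(b"test")
        return True
    except OSError:
        # Covers a missing folder, missing permissions and an unreachable share.
        return False


if __name__ == "__main__":
    # Hypothetical output location; replace it with the folder set for the job.
    output_folder = os.path.join(os.path.expanduser("~"), "Desktop")
    print(output_folder, "writable:", can_write_to(output_folder))
```

If this reports that the folder is writable and the renders still do not arrive, the output location itself is probably not the cause.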

In fact, try to follow the "golden rule" on network rendering, described in this page from our online Documentation resource:

http://support.nextlimit.com/display/mx ... s+and+tips

It describes the safest place to set the input project folder and the output folder to ease communication among the computers.

Let me know how it works in that situation.

Keep me informed.

Greetings

Dario Lanza
By limbus
#382126
Hi Dario,
dariolanza wrote:Hello Florian,

About not getting the rendered MXI files in the given output location: This should work fine.
If the given output location is accessible for the Manager, it collects all the MXI renders and stores them there.
The Manager collects the MXI renders locally, merges them and then saves them out to the given output location. This usually works here as it should, but from time to time one or more rendernodes do not transfer the MXI to the Manager. No error is given; they just end the render and do nothing more.

In case the network is down, and the MXI can not be transmitted, we do get an error message and the cause of the error is clear.
dariolanza wrote: Since you are not getting them in the desired place, this suggests that the Manager is having problems accessing the output location you set.
The actual error happens before the Manager even tries to write the final MXI. The Manager never starts the merging process because the rendernodes do not transfer the MXI. So the Manager is waiting for them forever.

dariolanza wrote: Run a test with the output folder defined on the Manager computer (e.g. on its Desktop) and let me know if it manages to save everything there as expected.
dariolanza wrote: In fact, try to follow the "golden rule" on network rendering, described in this page from our online Documentation resource:

http://support.nextlimit.com/display/mx ... s+and+tips
These "Golden Rules" are not really practical in our situation (and, I imagine, in many others). All files are stored on a fileserver running Windows Server. We cannot run the Monitor there because we cannot give all artists direct access to the fileserver, for obvious reasons. So everyone who submits jobs to the renderfarm runs the Monitor locally. Since we always render without "send dependencies" (much faster and thankfully the default in Maxwell 3), this was never a problem.

The Manager runs on a separate machine as well.

I just had an idea: maybe the Manager runs into the "20 simultaneous connections at any given time" limit since it stores the MXI locally before merging. Would it help to move the TMP folder of the Manager (and maybe the rendernodes as well?) to the fileserver or would we need to run the manager on a Windows Server machine?

We do have more than 20 rendernodes but we never had this error with Maxwell 2.x

Florian
By dariolanza
#382131
Hello Florian,

I understand that in some situations following those golden rules may be impractical for the general usage of the farm, but it is the only way to remove variables and test what exactly is causing the failure there.
The way your network is set up now, there are many variables in play that make it more difficult to find which component is causing the problem, so it is harder to debug.

About the maximum of 20: yes, moving your Manager to the Windows Server machine will definitely make it possible for the Manager to handle more than 20 simultaneous connections, which is what you have there. Even more importantly, when a job is submitted, the Monitor itself sends the files to all the nodes involved, so if the Monitor is running on a non-server computer, it cannot handle more than 20 connections, which causes problems. The same happens when you request a preview in the Monitor: all the nodes send the current render to the Monitor, which runs into problems if it doesn't allow more than 20 connections.

So for a farm with more than 20 nodes, both the Manager and the Monitor should be running on a server operating system. As you have it now, every designer runs a Monitor locally, which can cause communication problems because it won't be able to connect to all the nodes.
So our suggestion would be to try running both the Manager and the Monitor on the server computer under a dedicated user account, and perhaps accessing the Monitor running on the server via remote desktop from the designer's seat.
Perform a temporary test like this just to clarify whether the 20-connection limit is behind these issues.

Keep me informed about the evolution of this test.

Greetings

Dario Lanza
By limbus
#382136
Hi Dario,
dariolanza wrote: Even more importantly, when a job is submitted, the Monitor itself sends the files to all the nodes involved, so if the Monitor is running on a non-server computer, it cannot handle more than 20 connections, which causes problems. The same happens when you request a preview in the Monitor: all the nodes send the current render to the Monitor, which runs into problems if it doesn't allow more than 20 connections.
As far as I know this only applies when you turn on "Send Dependencies." We never do that. If "Send Dependencies" is off, the Nodes access all files directly on the fileserver.

This is also easy to test:

1. Open the Monitor
2. Add a job
3. Make sure "Send Dependencies" is off (default in Maxwell 3)
4. Start the job and close the Monitor immediately.

The rendering will start just fine.
We also never use more than 8 nodes per job so the monitor would not run into the 20 connections limit anyway.
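To make the connection arithmetic explicit, here is a rough worst-case sketch. It rests on two assumptions that the thread does not confirm: that a Monitor only talks to the nodes of the job it is watching, and that every node of every running job could contact the Manager at the same moment. The limit of 20 is the figure quoted in this thread; the job names and node counts are hypothetical.

```python
CONNECTION_LIMIT = 20  # figure quoted in this thread for a non-server Windows machine


def peak_connections(nodes_per_job):
    """Return (manager_peak, monitor_peak) for a mapping of job name to node count.

    Assumption: the Manager may hear from every node of every job at once,
    while a Monitor only hears from the nodes of a single job.
    """
    manager_peak = sum(nodes_per_job.values())
    monitor_peak = max(nodes_per_job.values(), default=0)
    return manager_peak, monitor_peak


if __name__ == "__main__":
    # Hypothetical load: three jobs with 8 nodes each.
    jobs = {"job_a": 8, "job_b": 8, "job_c": 8}
    manager_peak, monitor_peak = peak_connections(jobs)
    print("Manager over limit:", manager_peak > CONNECTION_LIMIT)
    print("Monitor over limit:", monitor_peak > CONNECTION_LIMIT)
```

Under these assumptions, 8 nodes per job keeps a Monitor well below the limit, while the Manager could still exceed it if several jobs hand in their MXI files at the same time.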
dariolanza wrote: Keep me informed about the evolution of this test.
I will run a test with more than 20 nodes tomorrow to see if we run into the limit and if changing the location of the TMP folder helps. I won't be able to temporarily switch to a Windows Server machine just for testing.

Anyway, it would be great if the rendernodes issued a warning when they cannot write to the Manager due to the connection limit.

And one strange fact remains: this never happened with Maxwell 2.x, which we used with the exact same setup until very recently.
By dariolanza
#382140
Hello Florian,

Well, when any problem arises on the node, it outputs as much information as we can detect, so most of the problems happening on the network usually generate an error message.
But there are many intricate network configurations and potential lost-communication situations that are completely impossible to track (e.g. if the node somehow stops sending info messages). In those cases it is very hard to determine at first what is causing the error, and no message can be generated. These are usually the hardest issues to track, and they need to be diagnosed individually, as there is no other way to do so.

The network system has not changed substantially from v2 to v3, but we are currently rewriting the whole network system from scratch, so future versions will have a completely new network system that is better, more stable and handier.

In the meantime, let me know the evolution of your tests.

Greetings

Dario Lanza
By limbus
#382165
dariolanza wrote: In the meantime, let me know the evolution of your tests.
I tested the network with 21 nodes all rendering the same job. No problem starting, ending or merging the files.

But in the meantime the Maxwell Network Manager keeps crashing very often, leaving us with half-finished renderings that need to be collected and merged by hand, if we can recover the MXI files at all.

:(

It looks like we can either move back to Maxwell 2.x or invest a large amount of money and time into a 3rd party render manager. But as of now, the Maxwell 3 network rendering is buggy and unstable.