Everything related to Maxwell network rendering systems.
By Asmithey
#336761
I am using two machines on my network. My network monitor says one of my nodes has been disconnected, but it is still rendering on that node. It is just not updating in the monitor. How do I reconnect my node so as not to lose my work?

I wish there were a button that said "reconnect" to solve this issue. What about reset node? It seems that may fix it, but the manual does not say what happens to the current rendering file. Is it trashed, or is it resumable?

Maxwell 2.5.0.0

Thanks,

Aaron
Last edited by Asmithey on Thu Feb 17, 2011 10:03 pm, edited 1 time in total.
By fuso
#336782
Hi Asmithey,

Since I updated to the latest version (including all plugins), I have had the same problem. You can check out my recent post,
which addresses much the same issue: http://www.maxwellrender.com/forum/view ... 72&t=35971

For now, I suggest you try to merge your temporary MXI files manually; they are stored in your local profile. You should
be able to find them here (Windows 7): C:\Users\#Your Profile Name#\AppData\Local\Temp\mxnetwork\rendernode\
Simply open the Maxwell Render core app and go to file > merge mxi.
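If you want a safe copy of those temporary MXIs before trying any reset or kill, a small script can copy them out of the temp folder first. This is a minimal Python sketch, not a Maxwell tool: it assumes the rendernode folder location given above and that %TEMP% has not been redirected elsewhere; the function `backup_mxi_files` and the `mxi_backups` folder are hypothetical names.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_mxi_files(source_dir: Path, backup_root: Path) -> list:
    """Copy every .mxi file under source_dir into a timestamped folder
    inside backup_root, keeping the sub-folder layout.
    Returns the list of copied destination paths."""
    if not source_dir.is_dir():
        return []  # nothing to back up; no backup folder is created
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest_root = backup_root / f"mxi-backup-{stamp}"
    copied = []
    for src in sorted(source_dir.rglob("*.mxi")):
        dest = dest_root / src.relative_to(source_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps the modification time
        copied.append(dest)
    return copied


if __name__ == "__main__":
    # Assumption: on Windows 7, gettempdir() resolves to
    # C:\Users\<profile>\AppData\Local\Temp
    node_temp = Path(tempfile.gettempdir()) / "mxnetwork" / "rendernode"
    backups = Path.home() / "mxi_backups"  # hypothetical backup location
    for path in backup_mxi_files(node_temp, backups):
        print("copied", path)
```

The copies can then be merged with file > merge mxi as described, while the originals stay untouched in the temp folder.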

I tried resetting nodes without any success whatsoever, and resetting the entire network doesn't do the job either. But if you
hit 'kill the node' or 'remove finished jobs', you might lose your temporary files, so hold off on that until you have tried
to recover your files.

As for resuming jobs, I have successfully resumed a rather complex cooperative job shared between 8 nodes, with
heavy geometry and loads of maps. But to make this work you need a successfully merged MXI file (no matter whether
you stopped the job manually or it reached the desired SL by itself)!

I'm not sure why one of your nodes dropped out. Did you stop the render, and had it progressed
reasonably far at the time? Well, I hope you can recover your work. I know how frustrating it is... Good luck, mate.

J
By Asmithey
#336786
Hi there,

Before I fooled around trying to fix it, I did copy my MXIs from my temp folder and was able to merge them manually, so I am OK. But then I tried to get the node reconnected using the reset node button. It did not work.

So I am not sure. I have had this problem with all versions of Maxwell. Things will be going fine for many hours, then the node drops and shows as disconnected.

I don't know.
Last edited by Asmithey on Thu Feb 17, 2011 10:03 pm, edited 1 time in total.
By Peder
#336806
I'm also having problems using the reset buttons. The only thing I have found to work is the "kill" command. But that means I have to go around our big office asking people to give me a moment to restart the nodes on their machines. Tedious.

I have read, and verified, that the best way to stop a job is to edit it and set an SL value below the current one. This stops the render. To resume it, you have to remove the job and then add it again; when you do this, you are asked whether you want to resume or restart the job.

Also I have found that the release node command seems to work if I need to add another job and get it started.

It seems the network software is fairly difficult to use and buggy, but extremely powerful when all machines are running, so it might be worth the extra hassle.

I would welcome some experience sharing on the networking features...

Some opinions:
The network Monitor app has to be running for renders with dependencies checked to start. This is clearly stated in the manual but unintuitive nonetheless. The name Monitor is misleading. Shouldn't this be the Manager's job?

PCs and Macs on mixed networks: it would be great to have some automatic translation of network path names, like the default search paths in the render software. So if I am on a Mac and want to send a job to some PCs, it could automagically translate the file paths into something the PCs can understand.
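The kind of translation described here could work as a prefix lookup table, much like search paths. This is a minimal Python sketch of the idea, not an existing Maxwell feature; the mapping entries (`/Volumes/Projects`, `\\fileserver\Projects`, `T:\`) and the function `mac_to_pc` are hypothetical examples.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical mapping table: Mac volume prefix -> equivalent PC location.
PATH_MAP = {
    "/Volumes/Projects": "\\\\fileserver\\Projects",
    "/Volumes/Textures": "T:\\",
}


def mac_to_pc(mac_path, path_map=PATH_MAP):
    """Translate a Mac path to a Windows path using a matching prefix.

    Prefixes are tried longest first, so a more specific mapping wins.
    Matching is done per path component, so /Volumes/ProjectsOld does
    not match /Volumes/Projects."""
    posix = PurePosixPath(mac_path)
    for mac_prefix in sorted(path_map, key=len, reverse=True):
        try:
            rest = posix.relative_to(mac_prefix)
        except ValueError:
            continue  # mac_path is not under this prefix
        return str(PureWindowsPath(path_map[mac_prefix], *rest.parts))
    raise ValueError(f"no mapping for {mac_path!r}")


if __name__ == "__main__":
    print(mac_to_pc("/Volumes/Projects/house/scene.mxs"))
```

Matching whole path components instead of plain string prefixes avoids translating a similarly named volume by accident.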

I think there should be a warning when attempting to start more than one Manager on a network, and perhaps even a wizard to set this up, if it is really necessary, so that nodes and Monitors can be connected to the correct Manager graphically.

It would be great to have more contextual help text on the buttons. For instance, Kill is not clear, and neither is Reset, as the posts above show. Any action that deletes the current work of a node should ask for confirmation, like "This operation will quit the node program running on machine nn and delete the current render. Do you want to continue?"

It would be great to give the network render app a lower priority than the Maxwell app, so that a render started locally takes precedence. As we try to move to network rendering, a lot of my coworkers quit the node when they want control of their machine and then never restart it. Perhaps the Maxwell Render app could temporarily pause the network render while a local render is running and resume it once the local render has finished?

I generally find the network handling powerful but very unintuitive. It is like poking at a strange machine and trying to figure out what is going on inside by trial and error.
By dariolanza
#336830
Hello Asmithey,

I guess that those disconnection problems happen because the network connection for that node is failing somehow (a firewall, a network card failure, etc.).

Actually, when a node gets somehow disconnected, as soon as the connection is established again, it is automatically detected by the Monitor, and re-included in the render queue.

The job that was running when the node got disconnected may not be submitted to the Manager when it finishes, but you can take it from the node's temporary folder. In any case, the node keeps rendering even during the network disconnection.

Anyway, we'll check the code to see whether there are any additional cases that are not covered yet.

Greetings

Dario Lanza
By Asmithey
#336857
What about a KVM switch? Could switching back and forth from one computer to another cause a hiccup in the connection? I use a KVM switch to go back and forth between my two render nodes. I will do a few tests to see if I can make it disconnect.
By fuso
#336859
I'm using KVM switches between 3 PCs (nodes) and have never had a problem with the render nodes or the general render progress.
I did end up with frozen PCs in the past, but that was due to problems with the graphics card. I'd rather not
remember that...
By Asmithey
#336860
OK, I didn't think a KVM switch would be an issue. I sent another render over my tiny network. At about SL 4, one of the nodes disconnected from the manager. It is always the node that is not running the manager and monitor. Here is the log. I am not sure how to troubleshoot the network to see where, or whether, it is failing or having a hiccup of some sort.

[28/January/2011 12:32:24] [INFO] sendMessageToRenderNode: time_update 514
[28/January/2011 12:32:24] [INFO] sendMessageToRenderNode: new_sampling_level_reached 4.00003
[28/January/2011 12:32:27] Manager disconnected!
[28/January/2011 12:32:27] Searching manager...
[28/January/2011 12:32:27] Broadcasting using port: 45456
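One way to start troubleshooting is to look for a pattern in when the drops happen (always around the same SL, or at regular intervals). This Python sketch scans a node log for the 'Manager disconnected!' entries; it assumes every entry starts with the `[day/Month/year hh:mm:ss]` prefix shown above, and since the log file's location isn't given here, it simply takes the path as a command-line argument. `find_disconnects` is a hypothetical helper, not part of Maxwell.

```python
import re
import sys
from datetime import datetime

# Entries look like: [28/January/2011 12:32:27] Manager disconnected!
LINE_RE = re.compile(r"^\[(\d{1,2}/[A-Za-z]+/\d{4} \d{2}:\d{2}:\d{2})\]\s*(.*)$")
SL_RE = re.compile(r"new_sampling_level_reached\s+([\d.]+)")


def find_disconnects(lines):
    """Return (timestamp, last_sampling_level) for each 'Manager disconnected!' entry."""
    events = []
    last_sl = None  # most recent SL reported before the drop, if any
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank lines and anything without a timestamp
        # %B expects English month names (Python's default C locale).
        stamp = datetime.strptime(m.group(1), "%d/%B/%Y %H:%M:%S")
        message = m.group(2)
        sl = SL_RE.search(message)
        if sl:
            last_sl = float(sl.group(1))
        elif message.startswith("Manager disconnected"):
            events.append((stamp, last_sl))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        drops = find_disconnects(log)
    for i, (stamp, sl) in enumerate(drops):
        gap = f", {stamp - drops[i - 1][0]} after the previous drop" if i else ""
        print(f"{stamp}: disconnected at SL {sl}{gap}")
```

If the gaps between drops turn out to be regular, that points more towards something on the network side than towards the render itself.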
By Asmithey
#336864
OK, I re-pathed the drive and got the node reconnected to the manager, but the monitor still shows the node as disconnected. I closed and reopened the monitor, but it still shows the node as disconnected. I would think that if the manager now shows the node as reconnected, the monitor would too?

It still keeps disconnecting.
By dariolanza
#336936
Hello Asmithey,

This is not the normal behaviour.

I keep thinking that it is that particular network connection that is dropping.

Is it always happening with the same computer?

Could you try the opposite setup (that computer being Manager and Monitor, and the current manager as Rendernode)?

Greetings

Dario Lanza
By fuso
#337923
Hi all,

There's one more really annoying issue related to this topic and it has happened quite a few times now with different files.
After I had sent the first of 5 cooperative jobs to a network of 9 nodes, one of the nodes stopped rendering towards
the end of the first job. But that particular node was still in the list, its status was still 'rendering', and there was no failure
notice I could find in the manager log. The whole job was stuck in 'stopping...' mode the next morning.

Anyway, that's not the main problem, as the temporary files in the monitor's temp folder seem to have progressed far enough
to reach the desired SL after merging them. The really nasty bit is that this first job never finished, so the queue never moved
on to the remaining 4 jobs, which were still pending. Why doesn't the queue progress to the next job, even when one has failed??
Shouldn't there be some sort of check or timeout in order to ensure getting the following jobs done? At least I would expect all
the nodes which have successfully finished to move on to the next job! Btw, I have sent all those jobs with 'dependencies on'
and to 'all nodes available'.
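The check or timeout suggested here could look roughly like this: a node that hasn't finished and hasn't reported progress for a while is treated as stalled, and the job counts as done once at least one node has finished and every node has either finished or stalled, so the queue can move on. This is a hypothetical Python sketch of the idea, not how the Maxwell Manager actually works; `NodeReport`, `job_status`, `advance` and the 30-minute default timeout are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Hypothetical per-node status as a manager might track it."""
    name: str
    finished: bool
    last_update: float  # time of the node's last progress message, in seconds


def job_status(reports, now, stall_timeout):
    """Return (is_done, stalled_names) for one cooperative job.

    A node is stalled if it has not finished and has been silent for
    longer than stall_timeout seconds. The job is done once at least one
    node has finished and every node has either finished or stalled."""
    stalled = [r.name for r in reports
               if not r.finished and now - r.last_update > stall_timeout]
    done = any(r.finished for r in reports) and all(
        r.finished or r.name in stalled for r in reports)
    return done, stalled


def advance(queue, reports, now, stall_timeout=1800.0):
    """Drop the head job if it is done; return (job_to_run_now, stalled)."""
    done, stalled = job_status(reports, now, stall_timeout)
    if done and queue:
        queue.pop(0)  # stalled nodes' partial MXIs can still be merged by hand
    return (queue[0] if queue else None), stalled
```

Requiring at least one finished node keeps a job whose nodes all went silent from being silently marked as complete.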

It's not the first time that I have missed an important deadline because of that and my office is now thinking about switching
to V-Ray or going back to MentalRay. I think this is a really serious issue and to be honest I can't really blame them. In my
opinion the whole network rendering is in desperate need of a massive update and bugfix. It's a shame that the usage of the
network/cooperative mode has become so essential due to those incredibly long rendertimes. As great as Maxwell looks, it
makes me (and my employer) feel sick when looking at the amount of hardware we have to throw at still renderings at the
moment! They're even 'scared' to ask me about little animations now...

Well I hope this will be fixed very soon as I'd like to stick to Maxwell and convince my employer to be patient once more. So
what should I tell them? I'm sorry to be so bitter about this again but it makes my work hell at the moment.

Jost Ewert
By Asmithey
#337960
I will flip-flop the manager and monitor to the machine that keeps getting dropped to see what happens.

Thanks,

Aaron