Everything related to Maxwell network rendering systems.
#385488
I thought it might be helpful to outline the overall functionality of the tp_network components and their parameters beyond the standard use cases covered in the official documentation, which is available at:

http://support.nextlimit.com/display/mx ... gy+Preview

Please note that the below info IS NOT OFFICIAL DOCUMENTATION FROM NEXT LIMIT and is merely my initial interpretation of how this all works, plus some questions where I just have no idea what I'm looking at. What I'm really trying to accomplish in this topic is to have Next Limit expand, critique, and correct the info below. Once refined, this info could be aggregated and condensed into one or more flow charts and may even help expand the public docs.

-------------------------------------------------------------

Basic Components of tp_network.exe (similar to old network tools):
Manager:
  • Description - Handles the schedule of render tasks by delegating them to the nodes and reports status to monitors.
  • tp_network.exe variant - tp_network.exe -manager
Node:
  • Description - Server application that receives requests from the manager and does the actual render computation by launching Maxwell.
  • tp_network.exe variant - tp_network.exe -node
Monitor:
  • Description - User interface for scheduling tasks and also displays status information that it receives from the manager. As far as I know, the monitor never really communicates with any of the actual nodes. It only talks to the manager.
  • tp_network.exe variant - tp_network.exe -monitor
New Components / Sub Components:
The interconnect between "tp_network.exe" on one host and another is no longer limited to basic TCP/IP networking with plain-text transactions; it now uses more common models such as web servers, standard protocols (HTTP), and even other languages (JavaScript and HTML). As a result, there are apparently sub-components within the tp_network.exe variants. To get a glimpse of some of these and to see the additional parameters, we can look at the help output of tp_network.exe:
Code: Select all
C:\Program Files\Next Limit\Maxwell 3\tp_network>tp_network.exe -h
usage: mxcnetwork [-h] [-M MANAGER] [-v VERBOSITY] [-l LISTEN] [-j JOBMAN]
                  [-g LOGHUB] [-a JOBFILE] [-r] [-n] [-s SHARED] [-t TREE]
                  [-T TEMPDIR] [-p PARAMS] [-P] [-H] [-w] [-np]
                  [--address-webserver ADDRESS_WEBSERVER]
                  [--port-jobman-listen PORT_JOBMAN_LISTEN]
                  [--port-jobman-publish PORT_JOBMAN_PUBLISH]
                  [--port-params-listen PORT_PARAMS_LISTEN]
                  [--port-webserver PORT_WEBSERVER] [--port-tree PORT_TREE]
                  [--port-log-push PORT_LOG_PUSH]
                  [--port-log-web PORT_LOG_WEB]
                  mode

positional arguments:
  mode                  Start jobman|render|web|client|params|tree|weblog|buff
                        ertofile|localshow 'version'or dump 'state'

optional arguments:
  -h, --help            show this help message and exit
  -M MANAGER, --manager MANAGER
                        Address to listen on when manager mode is used, or to
                        connect to when render mode is used. If not specified,
                        auto-discovery is tried
  -v VERBOSITY, --verbosity VERBOSITY
                        Verbosity (7 -> debug, 6 -> info, 5-> message,
                        4->warning)
  -l LISTEN, --listen LISTEN
                        Address to listen on (server)/Address to listen to
  -j JOBMAN, --jobman JOBMAN
                        Address of the jobmanager/dispatcherto connect to. If
                        not specified, auto-discovery is tried
  -g LOGHUB, --loghub LOGHUB
                        Address of the log concentrator. If not present, log
                        only locally
  -a JOBFILE, --add-job JOBFILE
                        Add job. Input specified as a file in protobuftext fmt
  -r, --request-status  Print current jobman status
  -n, --dont-publish-address
                        Do not broadcast the job manager address in the
                        network (you will have to enter -j <ip> on render
                        nodes
  -s SHARED, --shared-dir SHARED
                        Shared directory mount point
  -t TREE, --tree TREE  Address of the tree/params server (default, web server
                        IPaddress)
  -T TEMPDIR, --tempdir TEMPDIR
                        Temporary dir(default
                        c:\users\zparri~1.c-s\appdata\local\temp )
  -p PARAMS, --params PARAMS
                        ip,mxs,output
  -P, --popen           Use popen instead of subprocesses (not recommended)
  -H, --threads         Use threads instead of subprocesses (exp)
  -w, --gui             Run in gui (windowed) mode.
  -np, --nopreview      Do not send previews when in render mode
  --address-webserver ADDRESS_WEBSERVER
                        Webserver IP address for js client (if different from
                        detectable IP address). This is for js only, web
                        server will still listen on all interfaces
  --port-jobman-listen PORT_JOBMAN_LISTEN
                        Port for the REP jobman socket
  --port-jobman-publish PORT_JOBMAN_PUBLISH
                        Port for the PUB jobman socket
  --port-params-listen PORT_PARAMS_LISTEN
                        Port for the REP params socket
  --port-webserver PORT_WEBSERVER
                        Port for the webserver to listen on (HTTP+Websocket)
  --port-tree PORT_TREE
                        Port for the tree service to listen on (HTTP)
  --port-log-push PORT_LOG_PUSH
                        Port for the tree service to listen on (HTTP)
  --port-log-web PORT_LOG_WEB
                        Port for the tree service to listen on (HTTP)
---------------------------------------------------------------------------------

Questions and Comments

---------------------------------------------------------------------------------

LOGHUB
Code: Select all
-g LOGHUB, --loghub LOGHUB
                        Address of the log concentrator. If not present, log
                        only locally
I know what log aggregation is, but I didn't realize that in Maxwell it would be a separate, detachable component. How does this feature work and what is it used for? Does it somehow tie into the "File -> Save Logs" feature in the monitor?

---------------------------------------------------------------------------------

TREE
Code: Select all
-t TREE, --tree TREE  Address of the tree/params server (default, web server
                        IPaddress)
I'm really not sure what this component is or how to use it deliberately.

---------------------------------------------------------------------------------

WEB SERVER
Code: Select all
--address-webserver ADDRESS_WEBSERVER
                        Webserver IP address for js client (if different from
                        detectable IP address). This is for js only, web
                        server will still listen on all interfaces
This appears to be part of the manager. It's a miniature web server that temporarily listens on a custom port (other than port 80) and serves HTTP responses that are really just simple HTML wrappers around rather dynamic, compressed JavaScript files. I did take a look at the JavaScript that gets returned to the browser. Did you guys write all that completely from scratch, or were you working from an existing platform like Node.js or Tornado?

---------------------------------------------------------------------------------

Somewhat vague parameters
Code: Select all
  -P, --popen           Use popen instead of subprocesses (not recommended)
  -H, --threads         Use threads instead of subprocesses (exp)
Are these parameters specific to the Maxwell Render engine, or are they parameters for the web server? The reason I ask is that one of the selling points of Node.js is that it runs in a single thread. I thought that maybe these parameters were for the web server, to accommodate render farm layouts where you have several thousand nodes and the default web server modes couldn't handle all of the status and update requests.

---------------------------------------------------------------------------------

REP jobman socket?
Code: Select all
--port-jobman-listen PORT_JOBMAN_LISTEN
                        Port for the REP jobman socket
My gut feeling is that there's a typo in the description. Maybe not, but I really don't know what "REP" stands for. I'm assuming this setting is actually the port that the manager uses to listen for status responses from the nodes.

---------------------------------------------------------------------------------

DUPLICATE DESCRIPTIONS
Code: Select all
--port-tree PORT_TREE
                        Port for the tree service to listen on (HTTP)
  --port-log-push PORT_LOG_PUSH
                        Port for the tree service to listen on (HTTP)
  --port-log-web PORT_LOG_WEB
                        Port for the tree service to listen on (HTTP)
I grouped these lines together because their descriptions are all identical. I get the feeling they were copied and pasted into the help text and actually need to be updated; the duplicates look like copy-paste errors.

---------------------------------------------------------------------------------

This all seems like a good place to start. I know I'll have more questions once the above is clarified. Thanks Next Limit!
#385489
Well, this is one heck of a forensic analysis! xD

At some point in the future I'll try to describe some internals of the tp_network architecture; for the moment I can give you a brief introduction.

Actually, there are six different processes in the manager. They are launched together by -manager, but they are designed so that they could run on different machines if needed. Most of the command-line options that you patiently described only make sense in these unusual (future?) setups where that split is done. In the usual use case of the manager they are just internal wiring. The six processes in the manager are:

- The job manager itself. Controls jobs, rendernodes, and their statuses.
- The web server. Based on Tornado (a library for building your own web servers, http://www.tornadoweb.org/ ). From the job manager's point of view, the web server is actually a client (that is, it uses the client library for the job manager).
- The web client (in the web browser). Makes heavy use of WebSockets, jQuery, Bootstrap, and other nifty frontend technologies.
- The parameter retrieval service. This is used to extract the parameters from the MXS. It is the component that needs access to the shared folder, which makes it possible to run the manager/web server on a machine that has no actual access to the share.
- The tree service. This is a somewhat more conventional web server (using the wsgiref.simple_server included with the standard Python library) used to serve the file tree for browsing the shared folder and selecting the MXS to render.
- The weblog concentrator. This is another Tornado/WebSocket listener used to gather the logs from the rendernodes and the job manager. The web console connects to this service to show the log window.
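To make the tree service concrete, here is a hedged sketch of what a minimal file-tree service built on the stdlib's wsgiref.simple_server might look like: a tiny WSGI app that lists a directory as JSON so a web client could browse for MXS files. The paths, JSON shape, and endpoint behavior are my own illustrative assumptions, not the actual Maxwell implementation.

```python
# Sketch of a "tree service": a WSGI app that lists a shared directory as
# JSON, served by wsgiref.simple_server (assumed shape, not Maxwell's code).
import json
import os
import threading
import urllib.request
from wsgiref.simple_server import make_server

SHARED_DIR = os.getcwd()  # stand-in for the --shared-dir mount point

def tree_app(environ, start_response):
    """List the entries under SHARED_DIR for the requested sub-path."""
    rel = environ.get("PATH_INFO", "/").lstrip("/")
    target = os.path.normpath(os.path.join(SHARED_DIR, rel))
    if not target.startswith(SHARED_DIR):  # refuse path traversal
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden"]
    body = json.dumps(
        {"path": rel, "entries": sorted(os.listdir(target))}
    ).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json")])
    return [body]

# Minimal demonstration: serve a single request and fetch it back.
httpd = make_server("127.0.0.1", 0, tree_app)  # port 0 -> pick a free port
port = httpd.server_address[1]
srv = threading.Thread(target=httpd.handle_request)
srv.start()
reply = json.load(urllib.request.urlopen(f"http://127.0.0.1:{port}/"))
srv.join()
httpd.server_close()
print(reply["path"], type(reply["entries"]).__name__)  # -> " list"-style listing
```

In a real deployment the server would presumably bind the port given by --port-tree and run serve_forever() instead of handling a single request.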

The rendernode has two processes: one controlling the output of Maxwell, and the other listening for and sending information to the job manager.

The communication between the job manager and the clients (rendernodes, web server, params) is done over 0MQ sockets (see http://www.zeromq.org), serialized using Google's Protocol Buffers (https://developers.google.com/protocol-buffers/). The 'REP' socket you were wondering about relates to the REQ/REP (request/reply) pattern used for synchronous communication with the job manager; it is a ZeroMQ idiom. There is also a port-jobman-publish, which could be called a PUB socket in ZeroMQ terminology (from the Publish/Subscribe pattern), used to asynchronously notify any subscribed client of job manager changes.
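The two ZeroMQ idioms mentioned above can be illustrated with a short pyzmq sketch: a REQ/REP pair standing in for the synchronous jobman socket, and a PUB/SUB pair standing in for the publish socket. The port selection, plain-string payloads, and message contents here are my own assumptions for illustration; the real system serializes with Protocol Buffers and uses its own message schema.

```python
# Illustrative REQ/REP and PUB/SUB sketch with pyzmq (assumed payloads,
# not Maxwell's actual wire protocol).
import threading
import zmq

ctx = zmq.Context.instance()

# --- REQ/REP: the pattern behind --port-jobman-listen -------------------
rep = ctx.socket(zmq.REP)
rep_port = rep.bind_to_random_port("tcp://127.0.0.1")

def jobman_reply_once():
    node = rep.recv_string()        # a client sends a request...
    rep.send_string(f"ack:{node}")  # ...and REP must answer before the next recv

worker = threading.Thread(target=jobman_reply_once)
worker.start()

req = ctx.socket(zmq.REQ)
req.connect(f"tcp://127.0.0.1:{rep_port}")
req.send_string("node-1")
answer = req.recv_string()
worker.join()
print(answer)  # -> ack:node-1

# --- PUB/SUB: the pattern behind --port-jobman-publish ------------------
pub = ctx.socket(zmq.PUB)
pub_port = pub.bind_to_random_port("tcp://127.0.0.1")
sub = ctx.socket(zmq.SUB)
sub.connect(f"tcp://127.0.0.1:{pub_port}")
sub.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to all topics

# PUB silently drops messages sent before the subscription propagates
# (the ZeroMQ "slow joiner" effect), so publish-and-poll until it lands.
update = None
while update is None:
    pub.send_string("job-42:rendering")
    if sub.poll(100):  # wait up to 100 ms for a broadcast
        update = sub.recv_string()
print(update)  # -> job-42:rendering

for s in (req, rep, pub, sub):
    s.close(linger=0)
```

The key asymmetry matches the description: REQ/REP is a strict request-then-reply lockstep, while PUB/SUB is fire-and-forget fan-out to whoever happens to be subscribed.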

The --help output is outdated or wrong in some places, as you noticed. The -P, --popen and -H, --threads options are experimental switches; if I remember correctly they work on some OSes but not all of them. They just relate to the way the subprocesses are launched (or whether they are subprocesses at all, or threads).
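As a hedged guess at what those switches toggle between, the same worker payload can be launched either as a separate process (isolated, costlier to spawn) or as a thread (shared memory, cheaper). The toy "render" payload below is a stand-in of my own, not Maxwell's actual node code.

```python
# Two launch strategies for the same worker payload: subprocess vs. thread
# (a sketch of the distinction, not the real -P/-H implementation).
import subprocess
import sys
import threading

PAYLOAD = "print('frame done')"

# Subprocess launch: a child interpreter runs the payload; its stdout
# comes back over a pipe.
proc_out = subprocess.run(
    [sys.executable, "-c", PAYLOAD],
    capture_output=True, text=True, check=True,
).stdout.strip()

# Thread launch: the payload runs in-process and writes into shared state
# instead of a pipe.
results = []
worker = threading.Thread(target=lambda: results.append("frame done"))
worker.start()
worker.join()

print(proc_out, results[0])  # -> frame done frame done
```

The trade-off is the usual one: processes survive a crash in the payload and sidestep the GIL, while threads are lighter but share a single address space, which is presumably why one mode is marked "not recommended" and the other "experimental".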

As mentioned, I will explain the internals of the network system in more detail (maybe even with some drawings) in the future.
#385494
Thank you so much for this clarification Pablo! I really appreciate it! This is precisely the level of detail I was looking for :D

What really initiated this was my first few experiences launching the new network and some of the odd things I encountered right off the bat. I figured it would be easier for you guys and myself if I were able to post more informed reports about the activity I was seeing, based on a greater understanding of what's going on "under the hood". In an effort to post as much relevant info for you guys as possible, I try to troubleshoot issues any way I possibly can (obviously without violating EULAs, breaking copyright and IP laws, or just generally "stepping on people's toes"). For example, that has included using tools like Firebug on the monitor to evaluate the source code received by my browser, and even looking at JavaScript errors that get returned to the console.

I did decompress the "all.js" contents at one point (using Komodo IDE and JS Beautify) to see if I could figure out exactly where an error was coming from, but the code compression is too aggressive to actually debug the code. All of the variable and function names have been reduced to their smallest possible equivalents by scope. It's kind of funny when you look at it; I think I saw something like a dozen "function e" declarations. I knew that jQuery did the same thing, I had just never looked at it uncompressed. About the best I can do for items like this is forward the error messages to you guys.

Having these "extra pieces of the puzzle" definitely helps out a lot. Thanks again Pablo for your clarification! Given that you already plan on elaborating on these items in the future, there should be enough here to point me in the right direction of any other issues I find. If I do get stuck and have more granular questions about how all this works or specific components, I'll just add those questions to this topic.

You've also stimulated a naturally inquiring mind :) Some of the items you mentioned I have yet to "dabble" in. I have a general, technical understanding of things like Tornado, but I have yet to actually use them or develop something for/in them. ZeroMQ is completely new territory for me also. It looks like I may have some quality research time ahead of me :wink:

Take care Pablo!