All posts related to V2
By dmeyer
#330009
max3d wrote:
For rendering performance, depending on the efficiency of the implementation etc., I would estimate the maximum performance gain of HT marketing cores at 15%.
Unless you are using Maxwell. On our farm Hyperthreading is showing a 90-96% increase in speed depending on platform and frequency.
By brodie_geers
#330038
dmeyer wrote:
max3d wrote:
For rendering performance, depending on the efficiency of the implementation etc., I would estimate the maximum performance gain of HT marketing cores at 15%.
Unless you are using Maxwell. On our farm Hyperthreading is showing a 90-96% increase in speed depending on platform and frequency.
That's a pretty amazing stat, which I'd love to take to my I.T. guy. How did you test this?

-Brodie
By max3d
#330062
dmeyer wrote:
max3d wrote:
For rendering performance, depending on the efficiency of the implementation etc., I would estimate the maximum performance gain of HT marketing cores at 15%.
Unless you are using Maxwell. On our farm Hyperthreading is showing a 90-96% increase in speed depending on platform and frequency.
This would be very strange, as Maxwell is a compute-intensive application and normally shouldn't gain so much from HT. You usually find this kind of figure only in I/O-bound tasks where you have lots of stalls (simplified to the extreme). I assume, of course, that you mean render slave efficiency rather than the UI workflow, since you manage a render farm.

Before I suggest reasons, I would like to be sure. I'm certain you measured this, but could you express it in seconds per task? Preferably (if you have the numbers available) without HT, on a DP system, and with HT enabled. These figures currently suggest a hyperthread effectiveness of about 1.4 or more, which is quite rare for an optimized compute-intensive task.

Did you test with typeperf, and if so, do you still have the CSV available?
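For anyone wanting to answer this with numbers: typeperf, the Windows performance-counter logger, can record CPU load during a render with something like `typeperf "\Processor(_Total)\% Processor Time" -si 1 -o render.csv`. The Python sketch below averages each counter column of such a log. It assumes typeperf's default PDH-CSV layout (timestamp in the first column, one column per counter); `average_counters` and the sample rows are made up for illustration.

```python
import csv
import io
from statistics import mean


def average_counters(csv_text: str) -> dict:
    """Average every counter column in a typeperf-style CSV log.

    Assumed layout: the header row starts with the PDH-CSV timestamp label,
    followed by one column per counter path; each later row is one sample.
    Blank or non-numeric cells (e.g. a first sample logged as " ") are skipped.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    averages = {}
    for col, counter in enumerate(header[1:], start=1):
        values = []
        for row in samples:
            if col >= len(row):
                continue  # short row: no value logged for this counter
            try:
                values.append(float(row[col]))
            except ValueError:
                pass  # blank or non-numeric sample
        if values:
            averages[counter] = mean(values)
    return averages


# Hypothetical log from one render node (values invented for the example).
SAMPLE_LOG = r'''"(PDH-CSV 4.0)","\\RENDER01\Processor(_Total)\% Processor Time"
"05/10/2010 10:00:01.000"," "
"05/10/2010 10:00:02.000","98.5"
"05/10/2010 10:00:03.000","99.5"
'''

if __name__ == "__main__":
    for counter, avg in average_counters(SAMPLE_LOG).items():
        print(f"{counter}: {avg:.1f}")
```

Averaging a log taken with HT on and one taken with HT off, next to the seconds per task, would show whether the cores were actually saturated in both runs.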

If you don't, I could try to test it myself, but as you know it takes quite some time, so if you have the figures already available I would be very grateful.

Max.
By micheloupatrick
#330065
OK, I did some testing with the Benchwell scene on my system (see signature), and here are the results for SL15:

HT ON :
5m13s - Benchmark 2048

HT OFF :
6m54s - Benchmark 1550

So the difference is less than I thought, but significant anyway.
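A quick way to read these two runs, as a Python sketch: convert the times to seconds and express the difference both as extra throughput with HT (what the Benchwell score tracks) and as render time saved. The helper names are illustrative; the inputs are the figures posted above.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert an 'XmYs' render time to seconds."""
    return minutes * 60 + seconds


def ht_comparison(t_on: float, t_off: float) -> tuple:
    """Return (throughput gain, time saved) from enabling HT, as fractions.

    throughput gain = t_off / t_on - 1  -> extra work per second with HT
    time saved      = 1 - t_on / t_off  -> how much shorter the render got
    """
    return t_off / t_on - 1, 1 - t_on / t_off


t_on = to_seconds(5, 13)   # HT on:  313 s
t_off = to_seconds(6, 54)  # HT off: 414 s
gain, saved = ht_comparison(t_on, t_off)
score_gain = 2048 / 1550 - 1  # Benchwell score, HT on vs off

print(f"throughput gain {gain:.1%}, time saved {saved:.1%}, score gain {score_gain:.1%}")
# prints: throughput gain 32.3%, time saved 24.4%, score gain 32.1%
```

Both percentages describe the same run, so it is worth stating which one is meant when quoting an HT gain.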
By Mihnea Balta
#330067
Reusing execution units while a thread is waiting for data to arrive from memory pays off. HT didn't do much back when it was first introduced in Pentium 4 chips because the memory was fast enough, but latency has improved very little since then, while execution units have become significantly faster. That's why HT works today.
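This point can be illustrated with a deliberately crude model (an assumption for illustration, not how any real CPU schedules): each cycle, every hardware thread is independently stalled on memory with probability s, and the core does useful work whenever at least one thread is ready. One thread keeps the core busy 1 - s of the time; two threads keep it busy 1 - s² of the time, so in this model the HT gain comes out as exactly s. A compute-bound renderer that rarely stalls gains little, while a stall-heavy workload approaches 2x. The function name is hypothetical.

```python
def smt_speedup(stall_fraction: float, threads: int = 2) -> float:
    """Throughput of `threads` hardware threads relative to one, in a toy model.

    Assumptions: each cycle every thread is independently stalled with
    probability `stall_fraction`, and one ready thread is enough to keep the
    core's execution units busy. For two threads this simplifies to 1 + s.
    """
    if not 0 <= stall_fraction < 1:
        raise ValueError("stall_fraction must be in [0, 1)")
    busy_one = 1 - stall_fraction
    busy_many = 1 - stall_fraction ** threads
    return busy_many / busy_one


if __name__ == "__main__":
    for s in (0.15, 0.5, 0.9):
        print(f"stall {s:.0%}: 2-way speedup {smt_speedup(s):.2f}x")
```

Real SMT cores also share caches, execution ports and memory bandwidth between threads, which this model ignores.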

The same principle is used in lots of current architectures. GPUs do it too, but in a slightly different way (they run many threads in lockstep, so that hundreds of cycles pass before a thread needs to advance to the next instruction after a fetch, giving the memory time to transfer the data). The Larrabee 1 design has 4-way hyperthreading in its cores to hide L1 cache misses, and you're supposed to also run up to 10 soft fibers on each HT context to make good use of it. The Xbox 360 has 3 cores with HT. The PPC chip in the Cell is hyperthreaded. HT "cores" are not marketing cores.
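The same latency-hiding idea explains why designs like Larrabee or GPUs carry several contexts per core. A rough sketch under an assumed round-robin model: if each context computes for `work` cycles and then waits `latency` cycles for memory, the pipeline stays busy once the other contexts can cover the wait, i.e. with about ceil((work + latency) / work) contexts. The function names are hypothetical, and the cycle counts in the example are made up, not Larrabee or GPU figures.

```python
import math


def contexts_needed(work_cycles: int, latency_cycles: int) -> int:
    """Hardware contexts needed so one pipeline never idles (toy model).

    Each context alternates `work_cycles` of compute with a `latency_cycles`
    wait for memory; contexts take turns on the pipeline round-robin.
    """
    if work_cycles <= 0 or latency_cycles < 0:
        raise ValueError("work_cycles must be > 0 and latency_cycles >= 0")
    return math.ceil((work_cycles + latency_cycles) / work_cycles)


def utilization(contexts: int, work_cycles: int, latency_cycles: int) -> float:
    """Fraction of cycles the pipeline is busy with `contexts` contexts."""
    return min(1.0, contexts * work_cycles / (work_cycles + latency_cycles))


if __name__ == "__main__":
    work, latency = 20, 60  # hypothetical cycle counts
    print(contexts_needed(work, latency))  # prints 4
    print(utilization(2, work, latency))   # prints 0.5
```

The longer the memory latency relative to the work between fetches, the more contexts are needed, which is why GPUs keep so many threads in flight.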
By brodie_geers
#330072
micheloupatrick wrote: OK, I did some testing with the Benchwell scene on my system (see signature), and here are the results for SL15:

HT ON :
5m13s - Benchmark 2048

HT OFF :
6m54s - Benchmark 1550

So the difference is less than I thought, but significant anyway.
That sounds more reasonable. I'd love to see some more tests on this. I have a single-threaded quad core (no HT) but have wondered if it would be beneficial to move to an i7 of similar GHz. If HT helps even 30%, that's quite an improvement.

-Brodie
By max3d
#330077
Mihnea Balta wrote: Reusing execution units while a thread is waiting for data to arrive from memory pays off. HT didn't do much back when it was first introduced in Pentium 4 chips because the memory was fast enough, but latency has improved very little since then, while execution units have become significantly faster. That's why HT works today.

The same principle is used in lots of current architectures. GPUs do it too, but in a slightly different way (they run many threads in lockstep, so that hundreds of cycles pass before a thread needs to advance to the next instruction after a fetch, giving the memory time to transfer the data). The Larrabee 1 design has 4-way hyperthreading in its cores to hide L1 cache misses, and you're supposed to also run up to 10 soft fibers on each HT context to make good use of it. The Xbox 360 has 3 cores with HT. The PPC chip in the Cell is hyperthreaded. HT "cores" are not marketing cores.
Hi Mihnea,

I know, but I didn't expect Maxwell to be limited by memory transfers. DDR solved a lot, by the way, and most apps are not even close to using the theoretical DDR3 bandwidth. I have always made sure to go for minimal latency instead of transfer speed, but the net effect on rendering engines has not been spectacular.

I call HT 'cores' marketing cores because they are now deliberately used to cloud people's perception. I know why they are there and why Intel introduced them, and that made good sense; I think I explained it in an earlier post. For well-optimized compute-intensive tasks, the advantage of modern HT is nowhere near that of real cores. You understand, I understand, but lots of people (as you can see earlier in this thread) mistake HT 'cores' for physical cores with full computing units.
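The distinction can be put as simple arithmetic: the operating system reports two logical processors per HT core, but the throughput is that of the physical cores times (1 + measured HT gain). A small Python sketch; the function names are illustrative, and the gains plugged in are figures mentioned in this thread (15%, 30%, 96%), not new measurements.

```python
def core_equivalents(physical_cores: int, ht_gain: float) -> float:
    """Throughput of an HT-enabled CPU in units of one non-HT core.

    `ht_gain` is the measured fractional speedup from enabling HT
    (0.15 means 15% faster).
    """
    return physical_cores * (1 + ht_gain)


def logical_processors(physical_cores: int) -> int:
    """What the OS reports with 2-way HT enabled."""
    return 2 * physical_cores


if __name__ == "__main__":
    cores = 4
    for gain in (0.15, 0.30, 0.96):
        print(f"{logical_processors(cores)} logical CPUs ~ "
              f"{core_equivalents(cores, gain):.2f} cores at +{gain:.0%}")
```

Only a gain close to 100% would make the logical count a fair description of the chip's throughput.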

No renderer I have been involved with comes anywhere near a 90-95% speedup, so that's still a mystery to me. L1 misses can't cause this.
By Half Life
#330088
brodie_geers wrote: That sounds more reasonable. I'd love to see some more tests on this. I have a single-threaded quad core (no HT) but have wondered if it would be beneficial to move to an i7 of similar GHz. If HT helps even 30%, that's quite an improvement.
-Brodie
All the stuff I've seen says that not only do the additional cores give a big speedup in render times, but QPI is a nice boost over previous Intel "Core" models as well... I make no pretense of being an "insider", but the anecdotal evidence of actually using these machines has convinced me the money was well spent.

Best,
Jason.
By max3d
#330152
micheloupatrick wrote: OK, I did some testing with the Benchwell scene on my system (see signature), and here are the results for SL15:

HT ON :
5m13s - Benchmark 2048

HT OFF :
6m54s - Benchmark 1550

So the difference is less than I thought, but significant anyway.
Thanks for that. I overlooked your post when I posted an earlier comment. This would actually be a performance increase of 20% instead of the 30% others state. That's within range of my original statement of 15% on average for modern renderers. For software I'm involved with I can't quote figures, but I went through my notes and compared them with publicly available benchmarks, and I would still say it's around 15%, with mental ray at the same level as Maxwell (20%) and most Max-bundled renderers other than mental ray on the lower side. It's of course very scene and hardware dependent; bottlenecks occur depending on the combination. A silly example: if you run out of memory due to scene and texture size, you will see more effect from HT. In any normal benchmark this doesn't happen, but anecdotal results from people with subpar or badly configured systems could also show a higher effective HT speedup.

We'll have to wait for dmeyer for a more detailed report about his render farm tests.

Max

All these side discussions about CUDA and HT should be in a thread about hardware and Maxwell rather than in the real-time preview topic. However, it developed this way. Maybe it's best to just let it go and open a new topic about the preview the moment it's released. Just a suggestion to the mods.