Any news on Kepler rendering yet?

07-21-2012, 07:42 PM
Has Nvidia posted anything new on a Kepler iray release?

07-22-2012, 04:20 AM
We got a build of iray that supports Kepler late last week. It is integrated and 2012.5 will support Kepler. So far, we haven't had enough seat time to do any real benchmarking, so that will come next.

07-24-2012, 03:10 PM
Hi David,
In the absence of hard information on the benchmarks, then, can you suggest which might be an excellent Kepler card or cards to pair with a 12-core HP Z800?
Can the high-end gamer cards suffice, or is this a "wait for Quadro" moment?

07-25-2012, 11:54 AM
I think that always comes down to your average data, and how much money you have. If you have enough money for a few Tesla cards, that might be the way to go. If you only have 1-2k, then go with gamer cards. The x80 is probably best. I have 2 570's, but occasionally a single 580 is faster, which would lead me to believe that it's running out of memory somewhere and is caching the data too many times. I'm not sure of the details, but I have one machine with a single 580 that occasionally renders faster than mine with 2 570's. The 580 has a 3GB version, and the 680 has a 4GB version. Make sure you spend the extra 50-100 bucks on the better version, and avoid any of the overclocked cards.

If you are in the market just now, I'd probably wait to see some sort of benchmark for the 680 using iRay since it sounds pretty close. I'm guessing once that's figured out, Kepler Quadro and Tesla cards are probably just around the corner. Of course, by just around the corner I mean early next year. nVidia isn't the fastest at releasing hardware (probably not a bad thing) and since you also have to wait for a manufacturer to make their own version of the card, that slows things down. You'll get PNY's version, ASUS, EVGA, etc. They all take time to develop and test their own card with their own subtle tweaks.

07-25-2012, 05:29 PM
Waller - If you can get a GTX 680 with 4GB of VRAM, that would be the way to go right now. As it stands, the 680 is about 30% faster than the 580 and if you can get one more GB of VRAM, that's a win-win. There is also the 680 out but although it is advertised as having 4GB of VRAM, it really has 2GB per GPU.

07-25-2012, 11:10 PM
Thanks David and Andy--
I guess I should have parsed my question a tad differently: is there any compelling reason to go Quadro rather than high-end gamer, such as a GTX 680 with a boatload of RAM? I have always had Quadros, though never the cutting-edge versions, and they were always the "safe" choice...My current Quadro, the FX1800, is not the quickest and I have to rely on CPU-only renders, which even on my 12-core box are not impressive.

David, you seem to be coming down on the "fast-and-loose" side (LOL) and Andy seems to be playing it safe. I will have to re-pose this question with my SW vendor as well...so I'm just trying to get a leg-up on Kepler!
Thanks, and any other thoughts would be welcome.

07-26-2012, 02:15 AM
LOL...I was assuming you had to make a decision quickly based on your question. If time is on your side...then wait it out and see what happens with the Quadros. I hear the replacement for the Quadro 5000, which is based on the same chip as the GTX 680, should be arriving fairly soon.

07-26-2012, 12:41 PM
Hey David, the 680's 4GB version is split into 2? I thought that was the 690. I haven't dug too much into it, but if it compares to the 580, then there would be 2 versions, one with only more memory. Am I wrong?
A quick check on Newegg looks like you just pay a premium for more memory. Everything appears to be the same other than 4GB of memory.

I was just wondering if that was a typo and you meant the 690, or if they did something different with the 600 series cards. When I upgrade, I'll probably want a couple 680's with 4GB of memory, but it's not worth the extra money if that actually ends up splitting the memory between its cores.

I say go Quadro only if your workplace requires it, or if you want to put more than a couple of cards in, and then why not Tesla? Gaming cards require much better cooling and use more power. I think that's because gamer geeks like to brag, so they push the cards to the limit. I kind of wonder if you'd get similar results using the software monitor to take the speed down to what it is set to on the Quadro cards instead of using it for overclocking, though. I don't like to touch any of the settings. Too much at risk. Even if you want 2 gaming cards, you'll have to upgrade your PSU. In fact, be careful. If you purchased a lower-end workstation, you might even need to upgrade your PSU with a single 680. I don't think it would hurt much, but we've had a few 470's here and they didn't upgrade the PSU at first and noticed those machines were slower at rendering. We ran the monitor and noticed they were running at only 50%. I think that's a fail-safe built in or something if it's not getting enough power?

One nice thing is you can get one that's a bit more efficient, and quieter as well. Both are good reasons to upgrade. Dell, HP, Lenovo...they only need to make sure you have enough power from day 1, and I still argue with them on that. The HP I have at home (workstation) only has a 300-watt power supply in it, but after doing the math, the minimum recommended should be 330. I don't know if that's why, but it gets the BSOD all the time when I'm using the video card heavily.

07-26-2012, 10:55 PM
David/ Andy--
What gain over my Quadro FX1800 would a Kepler Quadro provide...approximately...like in multiples?

07-27-2012, 01:25 PM
Can an FX1800 even render? I thought the minimum requirements were 1GB of memory, no?

Well, just past 4 times the memory means you can render a decent sized scene on your GPU.
If you are rendering currently with your CPU, here's some easy math that is pretty close to accurate. I'll use my machine as an example:

If I have to render on my 3470 MHz 12-thread processor, I can multiply those two numbers to get 41,640. My 700 MHz 480-thread (CUDA cores) video card comes out to 336,000. Rendering with two of those video cards is actually 16x faster. It isn't quite linear, but you can show the math that way to get a really close idea of how much faster you can expect things to be.
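
The back-of-the-envelope estimate above (clock speed times the number of threads or CUDA cores) can be sketched in a few lines of Python. This is only a rough sketch of the rule of thumb from the post, not a real benchmark: the function name `throughput_score` is made up for illustration, and the numbers are the ones quoted above. A later post in this thread points out that this kind of estimate only holds within one GPU generation.

```python
def throughput_score(clock_mhz, units, devices=1):
    """Rule-of-thumb score: clock (MHz) x threads or CUDA cores x number of devices."""
    return clock_mhz * units * devices

# Figures quoted in the post above.
cpu_score = throughput_score(3470, 12)                 # 12-thread CPU at 3470 MHz
one_gpu_score = throughput_score(700, 480)             # one 480-core card at 700 MHz
two_gpu_score = throughput_score(700, 480, devices=2)  # two of those cards

# Ratio of the two-card score to the CPU score (the "16x" in the post).
speedup = two_gpu_score / cpu_score

print(f"CPU score:       {cpu_score:,}")
print(f"One GPU score:   {one_gpu_score:,}")
print(f"Two GPUs vs CPU: about {speedup:.1f}x")
```

Swapping in your own clock and core counts gives the same kind of ballpark figure, with the caveat from the post that real scaling isn't quite linear.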

For now though, we can't say quite the same thing about Kepler. We'll have to wait to see if it's the same. But Dave said the Kepler cards are roughly 3x faster? If that's the case, and I had the same model video cards, then it would be 48x faster than my single high-end processor at roughly a $1000 upgrade.

OK, so assuming your 1800 can render in GPU mode, you should see nearly no increase in speed over your CPU if your clock speed is somewhere around 700 MHz. For some reason that's not in the specs on nVidia's site. GPU: 700 x 48 = 33,600. CPU: 3400 x 12 = 40,800.
Depending on your processor, they might be pretty darn close. But of course, your processor can use all of your system memory, so right now with the 1800 you should probably render with CPU.
With a $250 upgrade on a current card you would see a night and day difference. If you swap for 2 new GTX 680's (and probably a new PSU) for around 1400 total, you will be blown away.

Remind your boss also that not only is rendering that much faster, but since you are working in real time, rendering quality will be higher as well, because you can see the results faster. You can't put a number on that.

07-27-2012, 02:16 PM
LOL (I am the boss, which makes justifications all but useless)...but thanks!
Yes I render in CPU only mode (12 Core x 2.44GHz + 16GB Ram) so I tend to time-limit my renders to around 15 minutes.
It would certainly not be a stretch to conservatively (I hate that word) imagine that 1 minute's time could deliver the same or better render quality.
And your point about real-time is stunningly important.
I do believe that my CAD vendor is skewed towards the "Pro" (Quadro) versions, but I will look deeper into that issue, as I would dearly like to be blown away with 2 GTX680s!

07-27-2012, 04:51 PM
Lucky! Being the boss is the best. And looking at those numbers, you will be asking yourself, "Why didn't I have this before?"

If you are using proE with some sort of Mech module or something, then it requires the Quadro cards.

You could probably just pick up a $200 card over at a big box store (so you can return it for only wasting $20) and see how your CAD software runs. At my last job I secretly threw in a gaming card and it ran everything just fine other than one little overlay issue on occasion. The CAD vendors will always tell you to get the Quadro cards. I've never talked to anyone at Autodesk that didn't run a Quadro or Fire workstation. That doesn't mean gaming cards don't work just as well for what you use them for. There's usually some obscure thing that the pro cards work better on. It used to be a much larger gap. Maya, for example, ran an overlay feature that only the Quadros had. So brushes and pointers that worked on the 3D object would flicker if you didn't have a Quadro. To Alias, that meant that it didn't work. For us students, that meant that it flickered. lol

I haven't seen problems like that in quite some time with the entertainment stuff, and I have to take our engineers' word for it when they say they need the Quadro cards.

07-27-2012, 09:34 PM
David, the GTX680 is only 30% faster than a 580?
That disappoints me a little... In raw figures, although the procedure in the 680 chip is different and slower than in the 580 chip, the number of CUDA cores is 3x higher, their frequency is nearly 1.5x higher, and finally the pure computing speed (TFLOPS) is doubled!

How can we explain such a difference?

I think I will keep my triple GTX 580 3GB for a little while... Or wait for a GTX 690 8GB, or a GTX 780 (???) with the GK110 ;)

07-28-2012, 05:20 AM
Yes, this can easily be explained....the 680 uses the "small" Kepler chip, where the 580 used the "large" Fermi chip. So, technically, comparing a 680 to a 580 isn't fair as they aren't apples to apples. Also, you cannot compare a Fermi CUDA core linearly to a Kepler CUDA core. Frequency and counts are higher with Fermi, but power consumption is waaaaay less with Kepler so the end performance isn't the same. Performance per watt is waaaaay up though. Again, to compare apples to apples, we will have to wait until the "large" Kepler chip is released.

It's really quite amazing actually that a Kepler small chip is already faster than the Fermi big chip though. That is a good indication for when the big chip comes out...

07-30-2012, 02:09 PM
I wonder if you can get away with putting 4 Kepler cards in 1 computer with less power and less concern for heat then?

And Dave, I still didn't see anything about the 680's 4GB being split up. Were you referring to the 690? Because it looks like there is a 4GB version of the 680 for an extra bill, which would definitely be worth the Franklin.

07-30-2012, 09:09 PM
Andy - There are rumors that a 680 with 4GB of dedicated memory is available or will be very soon. The 690 has always shipped with 4GB but it's not dedicated. It is 2GB per GPU for a total of 4.

Make sense?

07-31-2012, 12:18 PM
NewEgg compare (http://www.newegg.com/Product/Productcompare.aspx?Submit=ENE&N=100007709%20600030348%20600007787&IsNodeId=1&bop=And&ShowDeactivatedMark=False&CompareItemList=48%7C14-130-785%5E14-130-785-TS%2C14-130-794%5E14-130-794-TS%2C14-130-799%5E14-130-799-TS%2C14-130-798%5E14-130-798-TS%2C14-130-801%5E14-130-801-TS) (hopefully that link doesn't expire)

Other than that last one being out of stock, I'd say that takes it well past rumor. Those have actually been on their site for a little while now. Of course, they might only appear in stock, but Newegg is usually pretty good about that.
But it also looks like there is a 4GB 670 if you are looking to save a bill, or if you need a slightly shorter card. I left off the Galaxy cards. I use EVGA cards or PNY cards. I've had the best luck with the EVGA ones. Actually, come to think of it, the only cards I've ever had fail are the PNY Quadro cards, which are supposed to be more durable. Support was good though, and they took care of it without questions of warranty.

Those are definitely the ones I'd aim for. My choice would probably be the slowest 680. I know it seems odd to choose the slowest, but you won't really get that much extra performance from only a few MHz, and you might get noticeably more heat output. All of these cards hover right around 1GHz though, and that's significantly faster than my 600MHz.

I'm pretty excited to see some actual benchmarks with these in comparison to the current Quadro/Tesla cards, and with the GTX 580. It's pretty easy to show everyone why you want to spend something like $1300 or so if I can put your render machine's cards into upgrading 1 or 2 other computers, then upgrade that machine to make the renders significantly faster. But it's nice to know how significant that might be. 30% is pretty good. 10% would be getting a bit weak.

08-02-2012, 09:49 AM
Thank you for the clarification David. Indeed we have to compare what is comparable.
I thought that by comparing the raw numbers we could get an idea of performance, but this is not the case if the technology, as you said, is not the same.
There are currently GTX 680s with 4GB, such as those from Zotac, EVGA, Gainward or Galaxy.

On temperature and consumption, Andy, I'm not sure it is a very big problem with a slightly overclocked Kepler, because the process is finer and more efficient. And the "calculation" of the onboard memory is very simple: just divide the total memory by the number of graphics chips: 1 or 2 :)

Currently the Asus Mars III would be the most powerful, with 2x GTX 680 overclocked (or a GTX 690 overclocked) and 8GB of memory (2x4GB actually), but it is a limited edition and expensive. Although compared with a Tesla...

In any case, the mere fact that Bunkspeed 2012.5 can accept Kepler technology is good news! ;)

08-02-2012, 12:02 PM
Just to pitch in my 2cents,

I have a GTX 580 card in a custom built computer and I am able to run Solidworks just fine. The occlusion in SW "only" works with workstation cards but a small registry tweak took care of that.


08-03-2012, 11:11 AM
My thoughts on temperature still apply. These new cards don't run as hot, and are more power efficient, but they also set the clock speed almost twice as fast. Running a clock faster than it was designed for without the proper upgrades in cooling is not a good idea, especially if you run more than one card. Overclocking usually runs a processor to its max. Adding another card adds more environmental heat, so now you might be running past the "max" tested temperature.
That may not be the case, but I wouldn't recommend the risk for such a small increase in speed and an added cost.

08-06-2012, 06:19 PM
I understand you, Andy, but I think Nvidia could afford a higher clock frequency because the process node is not the same (40nm in Fermi versus 28nm in Kepler). The resulting operating temperature of Kepler is lower than Fermi's, so the clock speed (frequency) can be raised without damaging the graphics chip.
Here my "render tower" has three GTX 580 3GB cards and it is true that the outlet temperature is very high. Much higher than a CPU's, in fact. But my tower's cooling is good and it removes a lot of heat. I think with Kepler there is still this concern, but to a somewhat smaller degree. Or at least no more than with Fermi.

08-07-2012, 11:46 AM
True, I may actually try and build a tower with 4 680's in it. But since I'll be packing 4 in there, I will want to be cautious of the cooling, so won't be using overclocked cards. That's all.
Once I see enough info on the cards for cooling then we'll learn a bit more, but for now it's just a bunch of gamers bragging (and some lying through their teeth) about how much they overclocked it and how many frames per second they can get on the newest war game with the settings all turned up.
With 4 sets of fans going also, the noise might be a concern so I want those running as slow as possible without having to upgrade the cooling method. Sure, ideally I'd put in a liquid cooling system, but those are expensive, and risky since they aren't usually built for more than 2 cards at once.
I'll have to see what motherboards will accept as well. Right now I have a PCIe SSD taking up a valuable spot, and a USB3 card taking up another. So I'm stuck with only 2 cards in my system for now. And I only have 2 570's.
Once iRay is upgraded in Bunkspeed and Max to accept Kepler, I'll be requesting an upgrade.

08-07-2012, 03:44 PM
Every little bit of fps gain you can get on BF3 is worth the risk. ;)

08-07-2012, 04:30 PM
Andy if you use a giant tower like the HAF932 with 4 fans in it, it works VERY well to keep your equipment cool.

08-07-2012, 06:40 PM
Yes, sure.
For example, mine is a Xigmatek Elysium (very big and heavy...) which has an MSI Big Bang Marshal motherboard (8 GPU slots), an i7 2700K, 32GB RAM and 3x Gainward GTX 580 Phantom with 3GB VRAM, plus a double PSU (2x 850W) to run all of this without problems.
The total number of fans is...21! It's a little bit noisy and hot but it works very well!
In fact I first wanted 4x GTX 580, but the Gainward cards are too big (big cooler with 3 fans) and each takes 2.5 slots (so 3 slots...) ;)

08-07-2012, 07:00 PM
Artem, Can you put 4 dual slot cards in there though? I'm only counting space for 3 unless there's something I'm not aware of.

08-07-2012, 08:04 PM
You know Andy, you might be right. You might need something even bigger than that monster.

08-08-2012, 02:08 PM
Has anyone tried version 2012.5 with a GTX 6xx Kepler card?

08-21-2012, 09:58 AM
Bunkspeed Shot 2012.5.1.2 "Trial" GPU only (NO background) - 500 pass - res. 960*540
Bunkspeed Graffiti Benchmark

ZOTAC GTX 560TI - 2GB - 384 Cuda

ASUS GTX 670 - 2GB -1344 Cuda

08-21-2012, 11:55 AM
My old gtx470 with 448 Cuda cores did it in 129 secs. Guess I won't be buying a 4GB 670 after all.

08-21-2012, 02:42 PM
David Randle said GTX680 is about 30% faster than GTX580.
Thus, perf. GTX680 = 130% perf. GTX580.

GTX680 has 1536 cores, GTX670 has 1344.
Thus, #cores GTX670 = 87,5% #cores GTX680. Clock speeds and other specs of both cards are fairly similar. So let's assume perf. GTX670 = 87,5% perf. GTX680. (This figure is consistent with the gaming benchmarks out there.)

Final conclusion:
perf. GTX670 = 87,5% * 130% perf. GTX580 = 113,75% perf. GTX580

Am I right on this one?
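
For anyone who wants to redo the arithmetic above with other cards, here is a minimal Python sketch of the same estimate. It only reproduces the reasoning in the post: the 30% figure is the one quoted from David, the core counts are the ones listed above, and the assumption that performance scales with core count within the same generation is the poster's, not a measured result.

```python
# Core counts and the quoted 680-vs-580 figure from the post above.
cores_680 = 1536
cores_670 = 1344
perf_680_vs_580 = 1.30  # "about 30% faster", as quoted

# Assumption from the post: similar clocks, so performance follows core count.
core_ratio = cores_670 / cores_680
perf_670_vs_580 = core_ratio * perf_680_vs_580

print(f"670 has {core_ratio:.1%} of the 680's cores")
print(f"Estimated 670 performance: {perf_670_vs_580:.2%} of a 580")
```

The actual benchmark numbers reported in the thread are the real test of this estimate.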


08-21-2012, 03:35 PM
No. Look at lele's numbers. The 670 did 125sec with 1344 cores.
I only have 960 cores with my 2 570s and mine completed in 61sec.

Cores aside, it appears that the 670 is almost exactly the same speed as the 570, if 2 cards scale linearly. And in my experience they almost do.

If we get a few more inputs on this we could probably make a formula for figuring out a way to numerically compare Kepler and Fermi cards.
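
As a first stab at that kind of comparison, here is a small Python sketch using only the two data points mentioned above. It is a rough illustration, not a validated formula: it assumes two cards scale linearly (as the post suggests), it ignores clock speeds entirely, and the "core-seconds" measure is just a made-up convenience for comparing how much work one core gets done.

```python
# Graffiti benchmark times reported in this thread (960x540, 500 passes).
time_670_s, cores_670 = 125, 1344      # single GTX 670
time_2x570_s, cores_2x570 = 61, 960    # two GTX 570s, cores summed

# Card-level view: if two cards scale linearly, one 570 needs about twice the time.
est_single_570_s = time_2x570_s * 2
card_ratio = time_670_s / est_single_570_s

# Core-level view: "core-seconds" spent on the same job (clock speeds ignored).
core_seconds_kepler = time_670_s * cores_670
core_seconds_fermi = time_2x570_s * cores_2x570
kepler_core_vs_fermi_core = core_seconds_fermi / core_seconds_kepler

print(f"Estimated single 570 time: {est_single_570_s} s vs 670 at {time_670_s} s")
print(f"670 takes {card_ratio:.2f}x the time of one 570")
print(f"One Kepler core does roughly {kepler_core_vs_fermi_core:.2f}x the work of one Fermi core here")
```

With more benchmark entries from different cards, the same ratio could be averaged to get a less noisy conversion factor.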

08-21-2012, 04:45 PM
Or it must be that 2012.5 is still far from being optimized for Kepler.
I would still be happy, though, if the GTX 670 reached GTX 570/580 performance, since it draws A LOT less power and produces less heat, which should be great for multi-GPU systems.


08-21-2012, 04:50 PM
But if you're upgrading a current system and only planning on getting one card, I'd probably just save the money and go with a Fermi card. The 3GB 580 would be a good choice.

09-30-2012, 09:03 PM
Just swapped my 2x570 cards to 2x680 cards and I'm actually seeing a drop in performance (increased render times). The decrease in performance on average seems to be about 10% (10% increase in render times). Is there something I am missing?

I've set power management to performance mode, and as always, SLI is disabled. All CUDA cores are in use, and Afterburner is showing roughly 99% GPU usage on both cards. I am using the 306.23 WHQL Nvidia drivers.

09-30-2012, 09:35 PM
Mmmmh... Is Bunkspeed working in single or double precision? It's 64-bit, in any case :confused:
Because that could be an explanation... See this graphic (in my experience, a GTX 570 is close to a 480, in Bunkspeed anyway):


The GK104 chip of the GTX 670/680 is optimized for single precision, not double (that's the role of the GK110). So, maybe...

09-30-2012, 10:28 PM
From what I've read online from a few sources, iray uses single precision, not double precision. I should see an increase in performance theoretically, but real world I am seeing a decline.

This is very puzzling as Bunkspeed staff have reassured me that I would see some kind of improvement from 2x570's to 2x680's. I'm just not experiencing it.

10-01-2012, 10:45 PM
I'm thinking it has to do with the boost settings of the card. Would be great if I could get some guidance on how to maximize the performance of the 680 cards. Anyone know?

10-02-2012, 01:24 AM
We're going to have to let the community be your best support on this...we are only provided with Quadro and Tesla cards, so those are the only ones we test and benchmark.

10-04-2012, 04:10 PM
Hi Devett, where do I get the graffiti project file from?

EDIT: Found it. Will have results soon.

10-04-2012, 04:22 PM
How do I open the benchmark? I don't see any bif files.

EDIT: For some reason the extension of the file was changed to .zip. Renamed the extension to .bif.

10-04-2012, 04:35 PM
Blitz, can you run the graffiti benchmark at the standard 960x540 @ 500 passes and let us know how long it takes?
I ran this with a 2x GTX580 setup and it took 42 seconds. I have purchased 2 680's but have not opened them yet over concern with your results.


Stock clock time: 63 seconds

Overclocked time (+120% power, +100MHz GPU clock, +300MHz Memory offsets): 58 seconds

Seems like 2x580's are faster even with the 680's mildly overclocked.
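
To put those times side by side, here is a short Python sketch that expresses each 2x680 result as extra render time relative to the 2x580 run. The helper name `extra_time_pct` is made up for illustration; the times are the ones reported above for the same benchmark settings.

```python
def extra_time_pct(baseline_s, candidate_s):
    """How much longer (in percent) the candidate takes than the baseline."""
    return (candidate_s / baseline_s - 1) * 100

# Graffiti benchmark, 960x540 @ 500 passes, times from the posts above.
baseline_2x580_s = 42
stock_2x680_s = 63
overclocked_2x680_s = 58

stock_extra = extra_time_pct(baseline_2x580_s, stock_2x680_s)
oc_extra = extra_time_pct(baseline_2x580_s, overclocked_2x680_s)

print(f"2x680 stock:       {stock_extra:.0f}% longer than 2x580")
print(f"2x680 overclocked: {oc_extra:.0f}% longer than 2x580")
```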

10-04-2012, 04:51 PM
Mmmh, not good! Have you tried it in high resolution (4000x2250, for example)?
Because sometimes the difference is really big.

10-04-2012, 04:55 PM
I can do one in high res. Devett, can you render 4000x2250 @500 samples? We can compare again.

10-05-2012, 07:39 PM
I'll have results at higher resolution next week.

I'm in the middle of a project that is time sensitive.

10-10-2012, 01:56 PM
I have two Gigabyte GTX 680 2GB OC edition cards. Below are Bunkspeed benchmark results at 960x540 @ 500 in GPU-only mode:
1x GTX 680 = 128 sec
2x GTX 680 = 65 sec (SLI disabled)
Hybrid mode with Intel 3770K @ 4.6 GHz = 65 sec (Ivy Bridge is useless here)
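
Those two GPU-only numbers also show how well a second card scales. A quick Python sketch of that check, using only the times listed above (the function name `scaling` is just for illustration):

```python
def scaling(single_s, multi_s, n_gpus):
    """Return (speedup, efficiency) for a multi-GPU run versus a single-GPU run."""
    speedup = single_s / multi_s
    efficiency = speedup / n_gpus  # 1.0 would be perfectly linear scaling
    return speedup, efficiency

# 1x GTX 680 = 128 s, 2x GTX 680 = 65 s (from the results above).
speedup, efficiency = scaling(128, 65, 2)

print(f"Speedup with 2 cards: {speedup:.2f}x")
print(f"Scaling efficiency:   {efficiency:.0%}")
```

That lines up with the earlier comments in the thread that two cards scale almost linearly in this benchmark.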

10-19-2012, 06:30 PM
4000x2250@500 w/ 2xGTX680's at stock clock took 15:57. Not good. Wish I had bought 2x580's instead. :(

10-20-2012, 08:44 AM
Yes indeed, it's not fantastic... But at the moment the renderers are not optimized for the Kepler architecture. Iray, Bunkspeed, Octane have the same problem.
Kepler is accepted, but not fully used as I understood.

Even if it's a shame for now, I think it is an investment for the future...

10-22-2012, 04:47 PM
I ran the benchmark on my single ASUS DCII GTX670 this afternoon, and also went as far as I felt comfortable with regarding overclocking.

System specs:

Bunkspeed 2012.5.4.8
Nvidia drivers 306.97
I7 3820 and 16GB RAM

Graffiti bench as mentioned here in default resolution: 125 seconds (with a stopwatch, so give or take a few sec)

FPS about 4 on stock clocks: 980/6000 stock boost, reported clocks 1138 MHz (158 MHz 'Kepler boost') and 6000 on the memory
FPS about 4.55 on overclocked settings: reported clocks 1300 MHz on the core and 6500 MHz on the memory.

Overclocking the memory gave nil improvement in FPS. FPS scales quasi-linearly with the core clock.
I suppose the core could clock a little bit higher, to let's say 1350, but temperatures were already hard to control at 1300 MHz, hovering just below the 70 degree C mark, which is the first 'thermal-throttle' threshold level (the second thermal throttle kicks in at 80C).
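
The "quasi-linear" claim is easy to sanity-check from the numbers above. A minimal Python sketch, keeping in mind that the FPS readings are approximate ("about 4"), so the match is only indicative:

```python
# Reported core clocks and approximate FPS from the post above.
stock_clock_mhz, stock_fps = 1138, 4.0
oc_clock_mhz, oc_fps = 1300, 4.55

clock_gain = oc_clock_mhz / stock_clock_mhz
fps_gain = oc_fps / stock_fps

print(f"Core clock gain: {clock_gain - 1:.1%}")
print(f"FPS gain:        {fps_gain - 1:.1%}")
```

If FPS tracked the core clock perfectly, the two percentages would be identical; here they differ by well under one percentage point.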

Hope it adds to this thread :).

10-23-2012, 04:44 PM
Hi Quintin...

Not sure if you knew but overclocking the memory too far can actually produce a drop in performance (worse than stock clocks) from articles I read.

I really hope the i-ray engine improves for the 600 series of cards. My dual 570's beat my dual 680's by at least 10-20%.

10-23-2012, 08:42 PM
Hi Blitz,

I know about the memory. Though at 1300 mhz core clock (which is basically maxed out), there is no FPS difference between stock memory clock and overclocked.
Rendering on Kepler will get where it needs to be over time. I can imagine an investment like dual 680's can be hard to bear when it's slower than your previous setup. Though adding a third one is easier in terms of heat management and power management :).

10-24-2012, 12:30 PM
Well, your 680's are slower, but you don't have to worry nearly as much about memory management.

10-25-2012, 09:23 PM
Hi Everyone. I'd like to add the perspective from NVIDIA to this thread.

First, please rest assured that Bunkspeed has implemented Iray properly and it’s currently running with as high of performance as is possible for the rendering it’s being asked to deliver. So the issue of improving that performance falls on NVIDIA, for in the end it’s our rendering software, our hardware, and our drivers.

The first thing I’d like to clarify is how to judge GPU expectations. Unlike the CPU, each of our GPU generations is a considerable architectural change as the GPU is evolving at a much faster pace. Within each generation (Fermi, Kepler, etc.) you can get a good estimate of Iray performance by comparing the number of CUDA Cores and the Core/Graphics Clock. You cannot use this metric when comparing between generations as the impact of a “Core” or “MHz” is different. For example, based upon their specifications it’s easy to estimate the gain of a GTX 580 vs. a GTX 560, or of a GTX 680 versus a GTX 660, but not between a 580 and a 680.

Next, one should understand that the GTX 580 used our largest Fermi GPU. People loved its performance but not necessarily its power needs. We thus designed our first Kepler products for maximum graphics performance and to use the least possible power – and it delivered, with a GTX 680 well outperforming a GTX 580 while using much less power. But while killer gaming cards, these Kepler cards do have less ability to boost the type of compute done in ray tracing (Iray). Your Iray performance per watt is likely better on these cards, but your card vs. card performance might be a bit less (although you may be able to fit more cards in the same machine now, so your performance per rig could easily be higher). But this is not a step backwards….

What surprises many is that our current Kepler GTX products are using a GPU that is smaller than what our top end Fermi products did, and all the comparisons between 600’s and 500’s are akin to having welterweights fight heavyweights. The “big” Kepler products have not yet shipped (or even launched), and you will be finding some very compelling options coming for Iray in the not too distant future. When you hear of us “optimizing for Kepler” it’s primarily for these upcoming products because these will be the ideal options for creative professionals wanting to render the biggest of scenes as fast as possible.

In the meantime, we continue to advance Iray capabilities while keeping a vigilant eye on what it’s doing to performance, and we continue to look for ways of improving performance at every turn.

- Phil

Director, Advanced Rendering Product Management, NVIDIA

10-26-2012, 06:46 PM
Phil, it's good to hear some info from Nvidia.

What are the "big" Kepler products? The currently announced K5000 looks to be lower power than the 680, and has a slightly newer OGL, but everything else seems noticeably slower. Am I missing something?

I have yet to be blown away by a Quadro product. And if I have to pay a 1000% increase for a product, I had better be impressed. If I compare it to a car, it seems like I'm going from a Corolla to a Prius. It's pretty much the same car, but it's more efficient, and a few details look a bit nicer. But the Prius is only a 140% markup. With 1000% shouldn't I see something more like a bicycle to a Ducati?

10-26-2012, 09:14 PM
Hi Andy,

The only "big" Kepler announced so far is the Tesla K20. The Quadro K5000 uses a Kepler that is similar to the GTX 680, but uses less power to accommodate workstation configurations, and less power reduces performance.

Comparisons between Quadro and GeForce products should not be done on performance alone, as Quadro delivers much more than GeForce, especially when it comes to OpenGL graphics, multi-displays, windowed stereo, sync, double precision speed, ECC memory, etc., along with all the professional application certifications.

If you’re interested solely in single precision computing speed (what Iray uses) then GTX is going to be faster than Quadro because game cards push the power envelopes and can have faster clocks. Your consideration (after memory size) should then be on longevity, as gaming cards aren’t built to be punished 24/7. This is what our Tesla line is designed for, as they supply data centers and super computers – which is akin to a render farm. For pure rendering, both GTX and Tesla are good choices - just make sure to purchase some spare GTX's for downtime if you're running them constantly :).

- Phil

10-29-2012, 11:18 AM
Good to know. We use Rhino, Adobe software, 3dsMax, and Bunkspeed. So that's probably why we haven't noticed any difference. I think one guy runs ProE on a GTX as well without complaints. But I don't know much on that end. From what I understand ProE works just fine, but a few features don't.

I keep hearing that the GTX cards don't last as long as Quadros do, but in my many years of using nVidia products I've burned through 3 Quadro cards in 5 years' time, and have never had a GTX fail in 8 years' time. Of course, I make sure we don't have overclocked cards. I've also not had problems with dual displays.

So I guess GTX cards are right for me. But every time I stop by an nVidia booth at a show they swear that I will see better performance from Quadro cards, and they seem surprised I can do anything more than game with any other card. They must just be sales guys and not much more.

10-29-2012, 06:38 PM
I'm glad that GTX is working well for you.

My comment about GeForce longevity was in regards to constant rendering or other data center usage. For this type of abuse the boards are just not as robust (e.g., consumer cards put heat into the box while pro cards exhaust heat out the back).

My comment about multi-monitor was in professional use cases - not merely 2 or 3 monitors, but things like 4K displays, synched displays, bezel overlap removal, etc.

Trade shows are manned by business units, and their information isn't always global. For example, you won't get any Quadro/Tesla information at GDC.

- Phil

10-29-2012, 08:09 PM
Using the Bunkspeed Graffiti Benchmark, "Side View" camera, render to 250 passes @ 960x540 viewport:

2:58 (roughly 1.45 FPS average) CPU mode - Intel Core i7 3960X (Benchmarked using RealTemp at 3.6 GHz.)
0:41 (roughly 6.0 FPS average) GPU mode (Quadro 600, 4000, Tesla = 800 CUDA cores)
0:38 (roughly 6.6 FPS average) Hybrid mode

I run dual SyncMaster 24's and a Wacom Cintiq 21ux tablet monitor. The Quadro 600 shares the load of the Cintiq IIRC.

10-30-2012, 07:02 PM
Thank you Mr Miller for all these details. :)

I'm glad to hear an expert opinion on the subject.
But there are some questions that seem essential to me and probably others:

- First of all, what can we expect from the current Kepler cards in terms of speed in the future (speed is almost all we care about, plus memory size)? Because I believe (I fear) I have realized that there is not much to hope for from the Kepler 6XX cards...

- Then, regarding the stability problems and bugs in Iray: are they tied to certain types of cards? To the CUDA drivers? Or to the internal development of the software?

Thank you for your response and availability.


10-31-2012, 07:40 PM
Hi Nicolas,

I believe I've given out most of the information necessary for estimating single GPU performance for shipping products. We obviously can’t comment on future products until they’re announced. All I can say is that bigger and faster are on their way – but that’s nearly a constant status at NV.

Bugs should be logged with Bunkspeed, who will follow up with NVIDIA as required (although they know most answers on their own!). You mention stability: we take both Iray crashes and driver issues very seriously, and we want to know if you’re experiencing such problems. When reporting driver or Iray stability problems, please take the time to include the steps to reproduce it, and include info on your GPU(s) and driver version. As with most software, if you can reproduce it, we can fix it.

If you’re experiencing unexplained slowdowns with multiple GPUs, then Bunkspeed will likely want to know your motherboard information as well (e.g., “Sandybridge with 2 CPU sockets, using one hex-core CPU”).

- Phil

11-01-2012, 06:54 PM
The more I read the more frustrating this gets. I'm looking to spec two new high end workstations by the end of Nov while there are funds available but what to go with? Current machines are a fleet of 1.5 and 3 gig 580s, a couple of single 680s and one 2x 590 box.
Do we go K5000 and a K10 (not hearing anything conclusive on this card) or wait for a K20?
Has anyone tried one or two 690's in a box?
Sounds like maybe pairing up multiple 680 cards is causing problems? Is that still the case?


11-02-2012, 02:11 PM
It sounds like unless you use the features of the quadro, if you aren't adding more than 2 cards, and you're not rendering 24/7 on the machine, then you're probably best off going with the high end gamer cards. Maybe I'm hearing wrong, but that's what I'm understanding.

From my experience I was always disappointed with the Quadro cards because it takes months for the performance drivers to come out. By that time, a new software version may have been released. When I was using one with 3dsMax I had to stay 1 release behind if I wanted to use the performance drivers. So you get better graphics performance, on software with less performance overall? If you upgrade to every version almost right away then Quadro cards don't seem to make sense to me. I'd rather buy 2 gamer cards and replace them on every release for less than the cost of a Quadro. If you're an environmentalist, then maybe go Quadro just for the power. Then again, I've broken my Quadro cards, and have yet to break a gamer card. That's probably just bad luck though.

11-17-2012, 03:09 PM
I know this is all in the realm of theory, particularly since iray isn't even optimized for Kepler yet, but I was curious if anyone could tell me about predicted differences between the GK110 GPU and the GK104. They are both Kepler, but the K10 has two GK104 GPUs for 3072 cores and the K20 has a GK110 with 2496 cores.

Is it likely that Shot will utilize the additional features of the GK110, or is it more likely that the K10 will be faster than a K20?

11-19-2012, 06:46 AM
@kdrk1 - We have no way to predict that performance yet. Based on my understanding however, measurable performance gains of iray on Kepler are supposed to come with the GK110 chips where performance on the GK104 chips was never meant to surpass the Fermi generation Big Chip.

11-19-2012, 04:31 PM
Well, I do not want to pour oil on the fire again... (I hope that makes sense in English...:confused:)
But in the end, even though I understand the need to compare like with like (Small Kepler vs Small Fermi and Big Kepler vs Big Fermi) rather than CUDA core counts, clock frequencies, etc., because the architecture is completely different,
and the fact that the cards are more efficient at equal power consumption, etc.,

I still do not understand this:
Overall performance in video games and the number of operations per second (TFLOPS) have increased, while at the same time GPU computing performance has decreased (for image rendering, in any case).

Or does the current Kepler architecture start from zero (or nearly so) compared to Fermi, as a new basis for future generations?
Is Nvidia like an automotive firm that wanted to clean up its old engines and so produced a new engine that is less efficient for some applications but much less greedy?
(I say that because it reminds me a bit of Mercedes, which around 2000 downgraded all its big engines (in pure power) to pass new pollution standards, before creating new, more efficient engines.) :p

11-19-2012, 05:10 PM
Well if you compare it to cars then a good analogy is probably towing. Both cars have no problems driving 150mph. Yeah, the time it takes to get there is slightly different, but that's fine in most cases. But rendering is more like towing. It takes brute force to get things going, and the more efficient car is pretty terrible at it. In fact, you'll probably ruin the transmission. What you need is a new transmission with the newer engine.
I think that's what we're waiting on. They've got the top speed going, but it's like it's still running on an old transmission so it can't pull the same load quite as fast.

From what I've been reading it's not just software that will handle it either. I'm still looking at the 680's though just so I don't have overheating problems. I'd probably just get those, then when the new ones come out sell them and get the new ones.

11-19-2012, 05:40 PM
Thanks Andy for the analogy. I was thinking of something similar... I understand a little better now.
It's like talking about power and torque, isn't it?

01-04-2013, 02:17 PM
How do I know how many CUDA cores are running at render time?
And how do I include all cores if some of them are not working at the moment?

01-04-2013, 09:03 PM
If using the GPU, we automatically use all available CUDA cores and video cards. You cannot limit iray to only use some of your GPUs.

01-09-2013, 06:13 PM
Since updating nvidia drivers to 310.90, I have seen a small improvement in render times with my 2x680's... Here they are compared to my last test:

Graffiti Benchmark - 960x540@500 passes - Stock Clocks:

Old Driver: 63 seconds
310.90 Driver: 59 seconds
Change: +6.3% speed improvement

Graffiti Benchmark - 960x540@500 passes - Overclocked (120% power, +100MHz GPU clock, +300MHz MEM clock):

Old Driver: 58 seconds
310.90 Driver: 55 seconds
Change: +5.2% speed improvement
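
For anyone comparing their own numbers, here is a minimal sketch of the arithmetic behind those percentages. The function names are illustrative only. There are two common ways to express the change: time saved relative to the old time, and speed gained relative to the old speed.

```python
def time_reduction_pct(old_s, new_s):
    """Percent of render time saved, relative to the old time."""
    return (old_s - new_s) / old_s * 100


def speedup_pct(old_s, new_s):
    """Percent increase in throughput (work per second)."""
    return (old_s / new_s - 1) * 100


# (old driver seconds, 310.90 driver seconds) from the post above.
runs = {"stock": (63, 59), "overclocked": (58, 55)}

for name, (old_s, new_s) in runs.items():
    print(
        f"{name}: {time_reduction_pct(old_s, new_s):.1f}% less time, "
        f"{speedup_pct(old_s, new_s):.1f}% faster"
    )
```

The posted +6.3% and +5.2% match the time-reduction form; expressed as a speedup, the same runs come out at about 6.8% and 5.5%.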

01-09-2013, 09:34 PM
We just ran benchmarks on the 310.90 drivers on Quadro and Tesla Kepler cards and found improvements all around. We see an average of about 10% better performance across the board over the previous drivers, which had ~40% less performance than previous-generation cards.

Looks like nvidia didn't take a holiday like the rest of us... So, we have officially certified Quadro/Tesla Kepler cards when used with the 310.90 driver, as they are now getting better performance than any similar-spec Fermi series cards.

01-10-2013, 08:12 AM
I'm not seeing any difference with my single 670; it still takes 127 secs for the test scene, even with the new driver.

01-30-2013, 08:49 AM
So I've compiled most bench scores in the hardware section.

Seems the fastest we can do is 2x GTX580 followed by 2x GTX680; the 680 combination still has the advantage of less power usage/heat and more RAM (4GB 680 cards are available, 3GB 580s are becoming rare).

I'm interested to see 590 vs 690 standalone and dual benches, which I'm sure would go at the top.

I'm also interested in Quadro5000+C2075 and K5000+K20.

For the K5000 I saw conflicting reports; I'm not sure the 106s score as reported by 3D World is true.

Bunkspeed Graffiti Benchmark 960x540 @ 500 passes
2x GTX580, 1024 cores = 42s
2x GTX680, 3072 cores = 63s, 59s
2x GTX570, 960 cores = 61s
GTX470 + GTX670, 1792 cores = 70s
4000+C2075, 704 cores = 72s
GTX 580, 512 cores = 80s
K5000 + GTX660, 2880 cores = 81s
K5000, 1536 cores = 185s, 175s, 106s
GTX 680, 1536 cores = 116s, 109s
GTX 670, 1344 cores = 126s, 125s
GTX 570, 480 cores = 120s
GTX 470, 448 cores = 129s, 127s
GTX 660, 1344 cores = 142s
GTX 560, 384 cores = 175s
5010m, 384 cores = 197s

Bunkspeed Graffiti Benchmark 4000x2250@500 passes
2x GTX580, 1024 cores = 11:28
2x GTX680, 3072 cores = 15:57
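
One way to read the core counts in these lists is to multiply cores by render time, which gives a rough "core-seconds" cost for the 960x540 benchmark. This is only a sketch: it ignores clock speed, memory, and drivers, and the helper name is made up. The numbers are taken from the list above, using the best reported time for each card.

```python
def core_seconds(cores, seconds):
    """Rough cost of one benchmark run: CUDA cores times render time."""
    return cores * seconds


# (CUDA cores, best reported seconds) for single cards at 960x540 @ 500.
results = {
    "GTX 580 (Fermi)": (512, 80),
    "GTX 570 (Fermi)": (480, 120),
    "GTX 680 (Kepler)": (1536, 109),
    "GTX 670 (Kepler)": (1344, 125),
}

baseline = core_seconds(*results["GTX 580 (Fermi)"])
for card, (cores, seconds) in results.items():
    cost = core_seconds(cores, seconds)
    print(f"{card}: {cost} core-seconds ({cost / baseline:.2f}x the GTX 580)")
```

By this measure the GTX 680 spends about four times as many core-seconds as the GTX 580 on the same scene, which is why core counts across Fermi and Kepler are not directly comparable here.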

01-30-2013, 11:18 AM
The Octane Render team's current official response about Kepler is that it runs at 60-70% of the performance of Fermi, but migrating to CUDA 5 will result in better speeds.

"Our first priority now is getting instancing done. This will also take a lot of work, so there will be again a period with fewer releases. We are expecting to be able to make the first 2.6 release with instancing support around late summer on the northern hemisphere. If it turns out to be too time-consuming we may do another release before 2.6 with some other features, like support for the new Kepler GPU's. We can't put a time frame on that one though."

"The performance for now:
On Kepler cards the performance is better than the previous Kepler build, but not yet on par with the Fermi cards. We are expecting we can further improve this performance later.
On Fermi cards the performance is still below the performance of our normal CUDA 4.0 build."

"For most scenes the GTX680 with this build seems to render at around 60 to 70% of the speed of the GTX580 with the normal CUDA 4.0 builds."

CUDA 5 allows developers to take advantage of Kepler's dynamic parallelism ability (a feature of the GK110 chips, not the GK104) and should improve compute performance.



I think I'll close my eyes and go ahead with 2x GTX680 4GB :)

01-30-2013, 01:00 PM
Thank you HamdiR for the "Benchspeed" summary. I had been meaning to do it but never found the time...
Here is my contribution, all tested in GPU mode, of course:

Bunkspeed Graffiti Benchmark 960x540 @ 500 passes

3x GTX580, 1536 cores = 30s
2x GTX580, 1024 cores = 42s
2x GTX680, 3072 cores = 63s, 59s
2x GTX570, 960 cores = 61s
GTX470 + GTX670, 1792 cores = 70s
4000+C2075, 704 cores = 72s
GTX 580, 512 cores = 80s
K5000 + GTX660, 2880 cores = 81s
GTX 480, 480 cores = 97s
K5000, 1536 cores = 185s, 175s, 106s (big gap, no?)
GTX 680, 1536 cores = 116s, 109s
GTX 670, 1344 cores = 126s, 125s
GTX 570, 480 cores = 120s, 93s (big gap, no?)
GTX 470, 448 cores = 129s, 127s
GTX 660, 1344 cores = 142s
GTX 560, 384 cores = 175s
5010m, 384 cores = 197s

Bunkspeed Graffiti Benchmark 4000x2250@500 passes

3x GTX580, 1536 cores = 7:36
2x GTX580, 1024 cores = 11:28
2x GTX680, 3072 cores = 15:57
1x GTX570, 480 cores = 22:57
1x GTX480, 480 cores = 23:44

I've flagged a "big gap" in a few places, but maybe it comes from the evolution of Iray, which is faster now (2.0 < 3.0) as I understand it... :o Plus the drivers, PC usage during the render, etc...

01-30-2013, 01:18 PM
yes your 1x 570 score of 93s makes more sense since 1x 580 = 80s

but I based the 120s on the reported 2x 570 = 61s, which might mean that score was on an early version or limited by other factors; I guess it should score better

as for the K5000, it's the most confusing. The best report around here was 175s by lele, which is considerably slower than the GTX680. The 106s came from 3D World's K5000 review; although they didn't mention which exact Bunkspeed test it was, I assumed it's the same since it's similar to the GTX680 scores, but come to think of it, it might also be hybrid mode

We ran the same Bunkspeed CUDA-enhanced rendering test as we did for Boston’s Tesla-powered Venom 2300-7T. The test scene took 154 seconds with the CPU alone – almost the same as the Venom – which fell to 106 seconds with the K5000 helping out. However, the Venom took 72 seconds with the Tesla and Quadro 4000, implying that the K5000’s modelling abilities aren’t quite as stunning for CUDA-powered rendering.


If others with the confusing card can fill us in it would be great; maybe with the latest 310 drivers the K5000 is doing much better?

I would also appreciate it if someone posts 5000+C2075, because I'm trying to decide between these or 2x 680.

Based on these scores it's safe to assume 3x 680 would score in the 39s range.

Finally, I have a question: can we mix a Tesla with a GTX? Say GTX680 (to gain modelling speed) + C2075 for Fermi compute speed? (I would gain 4GB in dual mode for this combination.)
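
The 3x 680 guess can be sanity-checked against how the 580 scales in the scores above. The sketch below is an estimate under stated assumptions, not a measurement: it assumes the 680 would scale across three cards the way the 580 does, and the function names are hypothetical.

```python
def scaling_efficiency(single_s, multi_s, n):
    """Fraction of ideal speedup achieved by n cards (1.0 = perfect)."""
    return single_s / (n * multi_s)


def estimate_time(single_s, n, efficiency=1.0):
    """Estimated render time for n cards at a given scaling efficiency."""
    return single_s / (n * efficiency)


# GTX 580 scores from the list: 1 card = 80s, 2 cards = 42s, 3 cards = 30s.
eff_2x = scaling_efficiency(80, 42, 2)
eff_3x = scaling_efficiency(80, 30, 3)
print(f"580 efficiency: 2x = {eff_2x:.3f}, 3x = {eff_3x:.3f}")

# GTX 680 single-card scores from the list: 116s and 109s.
for single in (116, 109):
    ideal = estimate_time(single, 3)
    like_580 = estimate_time(single, 3, eff_3x)
    print(f"3x 680 from {single}s: ideal {ideal:.1f}s, 580-like {like_580:.1f}s")
```

Perfect scaling from the 116s single-card score gives about 38.7s, in line with the 39s guess; applying the 580's three-card efficiency to the 109s score gives about 40.9s.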

02-01-2013, 07:32 AM
Anyone considering buying a card now should wait.

First Alleged Performance Result of Nvidia GeForce GTX 780 “Titan” Leaks to Web. (http://www.xbitlabs.com/news/graphics/display/20130131204031_First_Alleged_Performance_Result_of_Nvidia_GeForce_GTX_780_Titan_Leaks_to_Web.html)

02-05-2013, 04:57 PM
Hello, all

We have 2 server blade systems that are set up as a render farm for our university. I have done a bit of testing with PowerBoost and two different (pairs of) Nvidia boards: M2090 and K20. These are not graphics workstations -- onboard VGA is used with VNC access for administration, etc. The software was identical: PowerBoost 2012.6, Nvidia drivers 310.90 and Windows Server 2008 R2. The host hardware was not identical and might undermine the times below as a direct comparison -- the K20s (which are on loan from Dell) were installed with a dual Xeon X5650 blade; the M2090s are associated with a newer dual Xeon E5-2650 blade. Hopefully, we will have a chance to try the K20s in the newer chassis for a true comparison. Until then...

Bunkspeed Graffiti Benchmark 960x540 @ 500 passes

2x K20 = 72s
2x M2090 = 53s

Bunkspeed Graffiti Benchmark 4000x2250@500 passes

2x K20 = 19:33
2x M2090 = 12:30


02-13-2013, 11:51 AM
thanks scott, for now this is a very disappointing performance out of the K20...

a new GeForce based on the K20 with faster clocks is about to be released

Nvidia GeForce Titan Launches February 18th (http://www.brightsideofnews.com/news/2013/2/13/nvidia-geforce-titan-launches-february-18th2c-2013-loses-to-gtx-6902c-amd-hd-7990.aspx)

08-10-2013, 09:45 PM
Hi All,

So are the 680 cards still ~70% the speed of the 580? My 580 just died and I need to purchase a new card. Probably don't want to spend for the Titan, but a 680 would do. I just hate to buy a new card that is SLOWER than my old one. Any new benchmarks with current Nvidia drivers that show comparisons with 580?

08-11-2013, 05:02 PM
Hi amoncur. I haven't benchmarked it, but as I understand it, in CUDA apps the 680 is roughly 80-90% the speed of the 580 ;)

08-12-2013, 12:05 AM
Hi Aselert, thanks so much for the response. That's a bummer that the 680 is still slower than the 580. So at this point the only card that is a bump up in performance without being significantly more expensive than the 580 was is the 780, correct? And the 780 won't even work with 2012 products, will it?

08-12-2013, 07:39 AM
Mmmh I think the 780 will work with the 2012 products because it's a "baby" Titan, but David or Brian will have a better answer about this.

My best advice, without breaking the bank, is to buy a 770, because it's a 680 plus a soft overclock. So the speed will be roughly the same as the 580 (maybe 95%), and with 4GB.


08-12-2013, 02:08 PM
Cool - I didn't realize there was a 4GB version of the 770. Thanks, Aselert.

08-13-2013, 03:34 AM
FYI for anyone interested I just received an email from support stating that the 680 is ~10% faster than the 580.

08-13-2013, 07:24 AM
Thanks amoncur! So I think the 770 would be very close to the 580 ;)

08-15-2013, 07:19 PM
Hi guys -

We don't have too much benchmarking done with the new 700 series GTX cards, but I can tell you that they should be way faster and will not work with Bunkspeed 2012.6. The reason is that these new cards have an upgraded internal architecture that won't play nice with the older 2012.6 version.

I will see if I can test more 700 series cards and post the results in this thread.


08-20-2013, 07:40 PM
Hi Brian,

Have you had a chance to test the 700 series cards yet? Curious to know your results.

08-21-2013, 02:05 PM
FYI for anyone interested I just received an email from support stating that the 680 is ~10% faster than the 580.

Is this with the 2012 or 2014 version of Bunkspeed?

08-21-2013, 02:11 PM
I have the 780 and have not had any issues with the newest Bunk release. I've not benchmarked anything, but if you'd like me to, let me know. I have a single 780 atm and will get a dual setup once the power of the Keplers has been unleashed.

08-21-2013, 09:48 PM
We are in the process of receiving more 700 series cards from Nvidia and I will keep you posted with the results.

08-22-2013, 07:15 PM
I've been using a 770. I have had some issues but I am not sure if they're related to the card or just buggy software.

1. My "fast" mode is slower than accurate mode (at least based on the fps count on bottom of screen)
2. Some of my materials did not work (lots of normals issues). Recreating the materials from scratch seemed to work.

I haven't done any actual benchmarking yet, either, but the 770 does seem slower than my old 580.

08-23-2013, 11:14 PM
@amoncur - The FPS for Accurate mode and Fast mode are calculated quite differently. On a vehicle file that is 750,000 polys, 14FPS in Accurate mode is really about 3 FPS in Fast mode. You should go by the visual lack of noise, rather than the FPS, to subjectively determine the speed of the rendering, and likewise when the rendering is complete.

Thus, due to the nature of the different raytracing modes within iray, the FPS values in different rendering modes are not directly comparable.

Try opening the "Bunkspeed Graffiti Benchmark.bif" from the Community to benchmark your 770. I would be interested in seeing your results.

08-24-2013, 12:54 AM
I see, thanks for the explanation on fps. Regarding the graffiti benchmark, here are my results:

1920 x 1080 inline: 5:37
1920 x 1080 background: 5:31 (I did not close Pro, but it was paused)
960 x 540 realtime fps at 1000 passes: 5.98

I have a 6-core i7 running at 3.5GHz and 24 GB RAM. Everything was run in hybrid mode.

How do these compare with 580 or 680 cards?

08-24-2013, 02:30 PM
Render in the background at 4000x2250@500. We can compare with render times posted previously in this thread.

08-25-2013, 05:03 PM
Hi amoncur, to have a good basis for comparison, it would be better to retest (with the Graffiti benchmark, of course) but without the CPU, so in GPU mode only, instead of Hybrid ;)

08-28-2013, 01:27 AM

Rendered in background showing in viewport, jpeg, 4000 x 2250, 500 passes.

Pretty disappointing. Even Aselert's 480 did better than that!

Brian, is this about what you expected?

08-30-2013, 12:57 PM
I'm really sorry for you Amoncur, because in other CUDA rendering apps, such as Bunkspeed, the 770 (like the 680) is very, very close to the 580...

Here is the latest "Benchspeed" update:

Bunkspeed Graffiti Benchmark 960x540 @ 500 passes

3x GTX580 = 30s
2x GTX580 = 42s
2x M2090 = 53s
2x GTX680 = 63s, 59s
2x GTX570 = 61s
GTX470 + GTX670 = 70s
2x K20 = 72s
4000 + C2075 = 72s
GTX 580 = 80s
K5000 + GTX660 = 81s
GTX 570 = 93s
GTX 480 = 97s
GTX 680 = 109s
GTX 670 = 125s
GTX 470 = 127s
GTX 660 = 142s
GTX 560 = 175s
K5000 = 175s
5010M = 197s

Bunkspeed Graffiti Benchmark 4000x2250@500 passes

3x GTX580 = 7:36
2x GTX580 = 11:28
2x M2090 = 12:30
2x GTX680 = 15:57
2x K20 = 19:33
1x GTX570 = 22:57
1x GTX480 = 23:44
1x GTX770 = 26:19
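
To make the 4000x2250 times easier to compare, the sketch below converts the mm:ss values to seconds and expresses each one relative to a chosen baseline. The helper names and the choice of the 1x GTX570 as baseline are assumptions made for illustration; the times are copied from the list above.

```python
def to_seconds(text):
    """Convert an 'mm:ss' string such as '26:19' to whole seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)


def relative_time(time_text, baseline_text):
    """Render time as a multiple of the baseline (above 1.0 means slower)."""
    return to_seconds(time_text) / to_seconds(baseline_text)


times = {
    "3x GTX580": "7:36",
    "2x GTX580": "11:28",
    "2x M2090": "12:30",
    "2x GTX680": "15:57",
    "2x K20": "19:33",
    "1x GTX570": "22:57",
    "1x GTX480": "23:44",
    "1x GTX770": "26:19",
}

baseline = times["1x GTX570"]
for setup, elapsed in times.items():
    print(f"{setup}: {to_seconds(elapsed)}s, {relative_time(elapsed, baseline):.2f}x")
```

On these numbers the GTX770 takes roughly 15% longer than the single GTX570.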

08-30-2013, 03:06 PM
Hi Aselert,

Are you saying that the 770 should be performing better than it did in my benchmark test?

08-30-2013, 03:47 PM
Yes! :rolleyes:

08-30-2013, 03:47 PM
You think the card is bad?

08-30-2013, 03:54 PM
No, your card is showing the same bad results as the 680. I was expecting the latest update to change the situation, but no :( I'm just still very surprised by Iray...
Because Iray is an Nvidia property, and logically it should have the best optimization/performance on their cards, since Nvidia built the software (with Mental Images) AND the hardware.
And in the end, no!