
Number Six is my newest AI server, built about a month ago. I have done a lot of testing and tweaking in that time, but I finally got around to doing a more thorough power efficiency test.
The server is built around two Nvidia RTX 6000 Pro 600W cards. My initial testing showed only around a 4% loss in performance from setting a power limit of 300W per card. That effectively cut the power ceiling of the cards by 50% and yielded around 43% power savings. A trade I was more than happy to make.
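For anyone wanting to replicate the cap: `sudo nvidia-smi -pl 300` does it per card, and a rough Python equivalent via the NVML bindings looks like the sketch below (assuming the `nvidia-ml-py` package; needs root).

```python
# Minimal sketch, assuming the nvidia-ml-py package (pip install nvidia-ml-py).
# Equivalent to running `sudo nvidia-smi -pl 300` for each card.
import pynvml

LIMIT_WATTS = 300  # per-card cap used in this post

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # NVML takes the limit in milliwatts.
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, LIMIT_WATTS * 1000)
        print(f"GPU {i}: power limit set to {LIMIT_WATTS} W")
finally:
    pynvml.nvmlShutdown()
```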
After some discussion on Twitter, I decided to spend some time on more thorough testing, now that I have had time to tweak performance and get my desired model running well.
My daily driver is GLM 4.5 Air FP8, until they get around to releasing the 4.6 they promised months ago. I typically see around 95 tokens/sec when just asking a simple question, and as much as 195 tokens/sec when doing more complex and agentic tasks.
I tested power limits of 250W, 300W, 360W, and 600W (stock).
**250W:**
Input token throughput (tok/s): 1071.01
Output token throughput (tok/s): 525.69
Total token throughput (tok/s): 1596.71

**300W:**
Input token throughput (tok/s): 1216.33
Output token throughput (tok/s): 597.02
Total token throughput (tok/s): 1813.35
Request throughput (req/s): 2.46

**360W:**
Input token throughput (tok/s): 1263.23
Output token throughput (tok/s): 620.04
Total token throughput (tok/s): 1883.27

**600W (stock):**
Input token throughput (tok/s): 1274.46
Output token throughput (tok/s): 625.55
Total token throughput (tok/s): 1900.02
These tokens/sec numbers seem high, but this simulates a multi-user workload, which performs considerably better than a single user making one request at a time.
Peak performance is of course at 600W, with 625.55 output tokens/second; the lowest is at 250W, with 525.69 tokens/second. Taking everything into account, 300W is the clear winner at 597.02 tokens/second.
If you look at the actual power draw, though, this gets really interesting.

250W actually uses more power overall: the tests take longer, and it actually has peak spikes higher than 300W. If you look closely at the graph, you can see the 250W test hits as high as 862W, whereas the 300W test peaked at 821W. The average wattage is fairly similar between these two tests.
| Per-card limit | System power (measured) | Total tok/s | % of max throughput | Output tok/s | Median TTFT | Median ITL | Tokens per Watt | Efficiency vs 600W |
|---|---|---|---|---|---|---|---|---|
| 250 W | 814 W | 1 597 | 84.0 % | 526 | 229.8 s | 20.68 ms | 1.963 | +27 % |
| 300 W | 816 W | 1 813 | 95.4 % | 597 | 201.9 s | 17.79 ms | 2.223 | +44 % |
| 360 W | 990 W | 1 883 | 99.1 % | 620 | 195.5 s | 17.33 ms | 1.902 | +23 % |
| 600 W (max) | 1 229 W | 1 900 | 100 % | 626 | 196.6 s | 17.27 ms | 1.546 | baseline |
| Per-card limit | System power | Power saved vs max | Throughput loss vs max | Efficiency vs 600W |
|---|---|---|---|---|
| 300 W per card | 816 W | −34 % | −4.6 % | +44 % |
| 360 W per card | 990 W | −19 % | −0.9 % | +23 % |
| 250 W per card | 814 W | −34 % | −16 % | +27 % |
| 600 W per card (max) | 1 229 W | 0 % | 0 % | baseline |
In reality, though, the numbers are even more in favor of 300W, since I was cherry-picking the peak wattage specifically. It is interesting that 360W is where you get almost no loss in performance, at 99.1% of max throughput, but with minimal power savings.
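To make the tokens-per-Watt and efficiency columns concrete, here is the arithmetic behind them as a small Python snippet, using the measured values from the tables above:

```python
# Values are the measured numbers from the runs above; tokens per Watt is
# just total throughput divided by measured system power.
results = {  # per-card limit (W) -> (system power W, total tok/s)
    250: (814, 1596.71),
    300: (816, 1813.35),
    360: (990, 1883.27),
    600: (1229, 1900.02),
}

baseline = results[600][1] / results[600][0]  # tok/W at stock 600 W

for limit, (watts, tps) in sorted(results.items()):
    tpw = tps / watts
    print(f"{limit:>3} W: {tpw:.3f} tok/W ({(tpw / baseline - 1):+.0%} vs 600 W)")
```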

https://www.reddit.com/r/LocalLLaMA/comments/1pcerjc/nvidia_rtx_6000_pro_power_efficiency_testing/
This post has been shared on Reddit by @themarkymark through the HivePosh initiative.
Acidyo can you not see how stupid this looks 🤔🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
Interesting stuff. I hope it ends up getting you the information that you are looking for.
You could open a business renting out space for processing lol
Not at my electricity prices.
Is solar power an option?
Can you automate testing of individual watt steps between 300 and 360 to see if there's more of an edge somewhere in that continuum?
I'm happy at 300. I think 310 is likely the sweet spot, but it's so minor I don't care.
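For anyone who does want to run that sweep, a hedged sketch: step the per-card limit from 300 to 360 W and benchmark at each point. `run_benchmark()` here is a hypothetical stand-in for whatever harness produced the numbers in the post.

```python
# Sweep the per-card power limit in 10 W increments and benchmark each point.
import subprocess

def set_power_limit(watts: int) -> None:
    # nvidia-smi applies -pl to all GPUs when no index is given; needs root.
    subprocess.run(["nvidia-smi", "-pl", str(watts)], check=True)

def run_benchmark() -> float:
    """Placeholder: run your serving benchmark and return total tok/s."""
    raise NotImplementedError

for watts in range(300, 361, 10):
    set_power_limit(watts)
    print(f"{watts} W -> {run_benchmark():.1f} tok/s")
```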
I know you had/have a Strix Halo that you were testing out, but now I see you have some RTX Pro 6000 Workstation editions, so it got me thinking...
Have you gotten your hands on a DGX Spark (or two), or something like an AMD Instinct?
I know a Spark is not built for fast TPS; it was built for capacity, mainly for light development on the go before loading those same workloads onto DGX proper servers. So while an RTX Pro will blow it out of the water with something like 8 times the bandwidth, it also has double the TDP (not counting two RTX Pros). So I am curious if you have done a true deep-dive comparison across multiple GPUs, like comparing 2 Sparks to 1 RTX Pro cost-wise, or something like that.
I only get to play with DGX H200, B200, and B300s at work (still waiting on my Spark to arrive eventually), and there is no way to compare apples to apples between a DGX B300 and an RTX Pro or Strix Halo lol
The DGX Spark and Strix are very similar, and the Strix is a lot cheaper, almost half.
I hated the Strix. It was fun to tinker with, but even though I got "good" speeds with gpt-oss-120b (50 tokens a second), it was still painfully slow for anything.
Just chatting was OK: say hi and it responds back. But with anything agentic, or even real work, you can see the problems with slow prompt processing.
For example, I wrote a CLI tool called please, so I can type "please stop process on port 8000" and it will ask the LLM for the command to do it and let you execute it by pressing one key.
That is very small context, 1000 tokens at best. When pointing to a cloud API using the same model, it takes a second or so to get a response. When using the Strix Halo, it would take around 10 seconds.
The amount of context used to answer this simple query should be tiny, yet it was still 10x slower than using the cloud. And this is an incredibly simple task. Image generation, coding, or any other agentic task was so slow it was unusable. I thought about using it just to handle a small model for reasoning or other tasks, and it just becomes a bottleneck.
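For the curious, a tool like that only takes a few lines against any OpenAI-compatible endpoint. This is an illustrative sketch, not the actual please source; the endpoint URL, model name, and prompt are all assumptions.

```python
# Illustrative sketch only, not the actual "please" source. The endpoint,
# model name, and prompt are assumptions; any OpenAI-compatible server works.
import subprocess
import sys

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="unused")

request = " ".join(sys.argv[1:])  # e.g. please stop process on port 8000
reply = client.chat.completions.create(
    model="local-model",
    messages=[
        {"role": "system",
         "content": "Reply with a single shell command only, no prose."},
        {"role": "user", "content": request},
    ],
)
command = reply.choices[0].message.content.strip()
print(command)
if input("Press 1 to run it: ") == "1":
    subprocess.run(command, shell=True)
```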
I was hoping a 3090 via eGPU would help with the prompt processing part, but it was just too slow when they were forced to work together.
Honestly, I don't know shit about running servers. I just have a high-end PC that I built myself.
What is the operating temperature of your cards? Heat is also lost energy.
You can see it in the final screenshot, taken while the cards were running full tilt. Bottom right corner.
Again, I'm going to ask you to leave me alone, stop downvoting my original content, and bring back my rep, and then I will stop exposing you and your downvoting racist farming friends
Please just understand I'm not afraid of you in real life or online
It's clear that blocktrades is your master no matter what you say
Why would blocktrades delegate to Buildawhale to comment farm 🤔 and why would blocktrades delegate to usainvote, which also upvotes the buildawhale comment farm 🤔 Most funds get downvoted and sent to alpha, which is blocktrades' wife's account, so please stop
You might think you are hidden but you are not