The current limitations of GC3 multi-threading

Posted on Monday, July 13, 2015

Hi All,

I've again spent a weekend just investigating GC3 performance on my PC. After overclocking my CPU, analyzing page faults, and overclocking my video card, I really felt that I had hit a brick wall. None of my resources ever seemed to be fully utilized, which raised the question of what "I" could do to make the game go faster. Using Windows Resource Monitor and a free tool from MS called Windows Performance Analyzer (WPA) answered that question for me.

NOTE: The analysis is from a game on an Insane-size map with 100 factions, at around turn 500. The WPA capture was taken during a soak run, with a turn time of about 2 minutes. The save game is linked below.

Below is a screenshot of what many of you have seen, and which makes you a bit frustrated that your fancy multi-core CPU isn't being fully utilized. It's especially frustrating because watching this graph is all you have to do during the few minutes a turn can take in the mid/late game. (Note: I do overclock, but not in the BIOS, only in Windows once I've launched the game, hence Windows doesn't display the clock speed properly.)


So, using WPA, here are the gory details. At first glance, it again looks as if all is well, except that for some unknown reason the CPU seems to be "throttling" when it could be doing more.



However, when we look at per-thread performance, we can make some interesting observations.



One thread is capped at or near 12.5% CPU. On this 8-core machine that is 1/8 of the total, which means the thread is running at 100% utilization of one "core" (this is in fact the graphics-handling thread, discussed below). Two other threads are quite close to the 12.5% cap. None of the other threads seem to be doing much at all. Sure, we see some nice spikes from a few other threads, but they don't occur often enough, and the several seconds between them is an eternity for a CPU.
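The 12.5% figure is just arithmetic: WPA reports each thread's CPU usage as a share of all logical processors, so on an 8-core machine a single thread that is busy 100% of the time can never show more than 100/8 = 12.5%. A small sketch of that conversion follows; the function names are made up for illustration, and the 8-core count is the one from this capture.

```python
def single_thread_cap(logical_cpus: int) -> float:
    """Highest share of *total* CPU one thread can reach: one core's worth."""
    return 100.0 / logical_cpus


def per_core_utilization(total_cpu_percent: float, logical_cpus: int) -> float:
    """Convert a thread's share of total CPU (as WPA shows it) into
    utilization of the single core that thread is running on."""
    return total_cpu_percent * logical_cpus


print(single_thread_cap(8))           # 12.5  -> the ceiling seen in the graph
print(per_core_utilization(12.5, 8))  # 100.0 -> that thread has a core to itself, maxed out
print(single_thread_cap(4))           # 25.0  -> the same thread on a 4-core CPU
```

So a thread sitting at 12.5% on 8 cores, or at 25% on 4 cores, is already going as fast as a single core allows.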


Looking at per-thread statistics over the whole 2-minute capture period, we can see much more clearly that the 3 GC3 threads are doing the vast majority of the processing.


What does this mean for you? Well, I wouldn't rush off to buy a 6+ core machine, as it's not going to help you very much. I've done a lot of testing, and with all other things being equal, I see identical net GC3 performance whether I have all 8 cores enabled or only 4. The game therefore appears to be best suited to 4-core CPUs: 3 cores for the heavy GC3 threads, and 1 core for Windows and the puny GC3 threads (this is a generalization of core usage; setting affinity won't buy you anything, I've tried it). Was it designed this way on purpose, with 4 cores as the sweet spot? I have no idea.

Distributing the work of the 3 major threads more widely (what if they were split into 6, albeit this is far from trivial) would have a slight impact for those running 4 cores, but would be a major improvement for those running 8. It certainly needs to be done soon, though, as Intel has caught up with or surpassed AMD's core counts, and the future is, well, more and more cores. AMD vs. Intel debate aside, processors with more cores go hand in hand with higher clock speeds, so if you are considering a new CPU and currently have 4 cores, only the improved clock speed will really help you as of today.
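A rough way to see why 8 cores perform the same as 4: if a turn's work is locked into a fixed set of threads, and each thread can use at most one core at a time, the turn can't finish before its busiest thread does. The sketch below is a simple lower-bound model, not a measurement. It ignores Windows' own load and scheduling overhead, and the per-thread work figures are hypothetical numbers, not values taken from GC3.

```python
def turn_time(thread_work_s: list[float], cores: int) -> float:
    """Lower bound on wall-clock turn time, in seconds.

    Bound 1: a thread runs on at most one core at a time, so the turn
             can't end before the busiest thread finishes its own work.
    Bound 2: if there are more busy threads than cores, the cores must be
             time-shared, so total work / cores is also a lower bound.
    The real turn time is at least the larger of the two.
    """
    busiest = max(thread_work_s)
    shared = sum(thread_work_s) / cores
    return max(busiest, shared)


# Hypothetical workload: 3 heavy threads plus a little background work.
work = [100.0, 90.0, 85.0, 5.0]
print(turn_time(work, 8))  # 100.0
print(turn_time(work, 4))  # 100.0 -> same as 8 cores: the busiest thread is the limit
print(turn_time(work, 2))  # 140.0 -> only here do too few cores start to hurt
```

Under this model, adding cores only helps once the heavy work is split across more threads than there are today, which is the point of the paragraph above.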

Graphics Efficiency Improvements Needed:

This is by far the most inefficient software thread in GC3. It's constantly using maximum resources whenever you are on the map. I don't know exactly what it is doing, but I would fully expect it not to be doing much when pointed at a blank part of the map. Here are my GPU and graphics-thread observations in various scenarios:

Scenario                                        GPU util.    Graphics-thread core util.
Staring at a blank, uncharted part of the map   60%          100%
Looking at the global map from far away         92%          100%
Space combat battle viewer                      15%          10%
Post-battle results screen                      30%          10%
Intro video                                     8%           n/a

Addressing this, however, will only help people with fewer than 4 cores. More interestingly, I've seen high GPU and graphics-thread utilization on tiny maps as well, which leads me to believe that something VERY inefficient is going on and that this is not a 'matter of scale' on large maps. Even if the graphics thread were optimized, the other two GC3 threads would still be capped at one core each (I've tried this: you can go to a planet or shipyard screen during your turn, which drastically reduces the graphics thread's utilization, but the turn doesn't run any faster). Nevertheless, it's certainly unfair to those with below-average graphics cards that still fall within SD's suggested hardware requirements (it is a TURN-based game, after all). Also, when a 4th high-utilization GC3 thread is eventually developed (a great thing for people with 8 cores), the graphics thread will contend with Windows processes and 4-core users will suffer.
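For a static view like a blank part of the map, a common way to keep a render thread from pegging a core is to redraw only when something has changed, and to sleep off the rest of the frame budget instead of spinning. I don't know how GC3's renderer is actually built; `MapRenderer`, `mark_dirty`, and the frame cap below are hypothetical names for this generic "dirty flag" technique.

```python
import time


class MapRenderer:
    """Hypothetical renderer that only redraws when the view is 'dirty'."""

    def __init__(self, max_fps: float = 60.0):
        self.frame_budget = 1.0 / max_fps  # seconds allowed per frame
        self.dirty = True                  # something changed since the last draw
        self.frames_drawn = 0

    def mark_dirty(self) -> None:
        """Call when the camera moves or anything on screen changes."""
        self.dirty = True

    def draw(self) -> None:
        # Stand-in for the real GPU work.
        self.frames_drawn += 1

    def tick(self) -> None:
        """One iteration of the render loop."""
        start = time.perf_counter()
        if self.dirty:
            self.draw()
            self.dirty = False
        # Sleep instead of busy-waiting, so an unchanged view costs
        # almost no CPU time between frames.
        elapsed = time.perf_counter() - start
        if elapsed < self.frame_budget:
            time.sleep(self.frame_budget - elapsed)
```

Calling `tick()` repeatedly on an unchanged view draws once and then mostly sleeps; only after `mark_dirty()` (for example, when the camera pans) does it draw again.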

I can't say exactly what the AI is doing during the turn, but obviously more needs to be done there. It's hard to believe that between turns the only thing happening is ship combat and movement. If that is true, it must be running at the speed of watching the ships fight/move on the map, which obviously is NOT needed if your current map vantage point doesn't show the enemy ships. The same goes for your own ships: if you can't see them, the game should do this processing much faster than it takes to watch a ship travel 10 hexes. I have no evidence that this is actually happening, but I suspect it because of the long turn times in the late game, when the AI has a large number of ships (one test would be to use the debug console to destroy everyone's ships and see how long the next turn takes).
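The idea that off-screen moves shouldn't wait on animations can be sketched like this. It is only an illustration under my own assumptions: `Ship`, `resolve_moves`, and the `visible` flag are hypothetical and not GC3's code. The game state jumps straight to each ship's destination in one step, and only moves the player can actually see are handed off to be animated.

```python
from dataclasses import dataclass


@dataclass
class Ship:
    name: str
    position: tuple[int, int]
    path: list[tuple[int, int]]  # hexes still to travel this turn
    visible: bool                # inside the player's current view?


def resolve_moves(ships: list[Ship]) -> list[tuple[str, list[tuple[int, int]]]]:
    """Apply every ship's move to the game state immediately.

    Returns the animations to play, but only for visible ships; an
    off-screen fleet costs a state update, not animation time.
    """
    animations = []
    for ship in ships:
        if not ship.path:
            continue
        hexes = ship.path
        ship.position = hexes[-1]  # state goes straight to the destination
        ship.path = []
        if ship.visible:
            animations.append((ship.name, hexes))
    return animations
```

With one visible scout and one hidden AI fleet, both end the call at their destinations, but only the scout's path is returned for animation.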

Again, I am a huge fan and major believer in the potential of this game, and by no means discount SD's efforts thus far. I simply wanted to share my observations and thoughts on current limitations and future growth.

Game save that was used for WPA analysis

To make similar observations yourselves, go to Control Panel > Administrative Tools > Performance Monitor and add counters for Thread > % Processor Time. Much more detailed analysis can be done with the Windows Performance Toolkit, available for free from MS's website. It has a recorder and an analyzer. Watch out: the recorder creates HUGE files. My 2-minute capture was 7GB in size (binary), which is why it is NOT on Dropbox!
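If you'd rather log the counters than eyeball the graph, Windows' `typeperf` tool can write them to a CSV file, e.g. `typeperf "\Thread(*)\% Processor Time" -si 1 -sc 120 -f CSV -o threads.csv` for 120 one-second samples. The short script below is a sketch that averages each column of such a log and lists the busiest threads. It assumes the usual layout of one timestamp column followed by one column per counter; `busiest_threads` is a made-up helper name, and blank samples (e.g. from threads that have exited) are simply skipped.

```python
import csv
import io


def busiest_threads(csv_text: str, top_n: int = 5) -> list[tuple[str, float]]:
    """Average each counter column of a Performance Monitor / typeperf CSV
    log and return the top_n columns by mean value, highest first."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    averages = []
    # Column 0 is the timestamp; every other column is one thread's counter.
    for col, name in enumerate(header[1:], start=1):
        values = []
        for row in samples:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # blank or missing sample
        if values:
            averages.append((name, sum(values) / len(values)))
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:top_n]


if __name__ == "__main__":
    with open("threads.csv", newline="") as f:
        for name, avg in busiest_threads(f.read()):
            print(f"{avg:6.1f}  {name}")
```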