As the component with the greatest influence on how well your games run, a graphics card is arguably the most important purchasing decision a gamer can make. With aspects such as core count, clock speed, CUDA cores and memory bandwidth all having an impact on which graphics card you may choose to buy, it’s important to understand what each of these things means and whether or not it will have a major bearing on the level of performance you’ll receive.
[Skill Level: I’m Too Young To Die]
The GPU (graphics processing unit), a term commonly used interchangeably with “graphics card”, is in fact the processor on the graphics card. The graphics card refers to the entire construction – that being the GPU, PCB and cooler.
GPU Architecture
The architecture of a GPU describes the platform/technology it’s built upon. As each GPU generation’s architecture is either a refinement of the prior one or an entirely new platform altogether, it is hugely important to the performance and capabilities of the GPU. Advancements in GPU architecture allow for greater power efficiency, lower thermal output, and higher core counts and clock speeds. It’s because of advancements in GPU architecture that graphics cards are able to increase in performance while decreasing in power requirements and operating temperatures. This brings us to the topic of Shader Core differences, and why graphics cards are not directly comparable despite similar or identical specifications.
Shader Cores
Known as CUDA Cores on Nvidia GPUs and Stream Processors on AMD GPUs, shader cores are the processing cores available on a GPU. As with traditional CPUs, the more cores available, the more data can be processed at once. Depending on the overall design, efficiency and architecture of the graphics card in question, having more cores can also affect the clock speed and its stability, as well as thermal output and power requirements.
As different generations of graphics cards can have drastically different GPU architectures, core counts cannot be compared directly to gauge each graphics card’s potential performance. Where the Nvidia GTX 780 contains 2304 cores and the GTX 980 features 2048, one might expect the GTX 780 to deliver superior results due to its core advantage. However, this is not the case. Thanks to its architectural advantages as well as the higher processing speed of each core, the GTX 980 performs significantly better, despite its overall specifications being that much lower than the GTX 780’s.
These core count differences also ring true when comparing Nvidia and AMD graphics cards to one another directly, as the most recent approaches AMD has taken to its GPU designs favour more cores at a lower frequency. This can be seen with the AMD RX 580 and the Nvidia GTX 1060 (6GB). Where the RX 580 features more cores – 2304 of them, operating at an average speed of 1393MHz – the GTX 1060 rivals it and in most scenarios pulls ahead, with 1280 cores running at an average of 1600MHz. Even though the overall specification of the GTX 1060 is significantly lower than the RX 580’s, its efficiency, design and architectural advancements provide far more benefit than its competition’s raw numbers, allowing it to punch well above its weight.
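To see why paper specs mislead here, consider a naive “cores times clock” comparison, sketched below in Python using the figures quoted above. This metric is purely illustrative – it is not how real GPU performance is measured, precisely because it ignores architecture and per-core efficiency.

```python
# Naive "paper spec" metric: shader cores x average clock speed.
# Figures are the ones quoted above for each card.

def naive_throughput(cores: int, clock_mhz: int) -> float:
    """Cores x clock, in billions of shader cycles per second."""
    return cores * clock_mhz * 1e6 / 1e9

rx_580 = naive_throughput(2304, 1393)    # AMD RX 580
gtx_1060 = naive_throughput(1280, 1600)  # Nvidia GTX 1060 (6GB)

print(f"RX 580:   {rx_580:.0f} billion shader cycles/s")   # ~3209
print(f"GTX 1060: {gtx_1060:.0f} billion shader cycles/s")  # ~2048
# On this naive metric the RX 580 "wins" comfortably, yet the GTX 1060
# matches or beats it in most games -- the metric captures nothing about
# architecture or how much work each core does per cycle.
```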
Clock Speed
Also known as the core speed or frequency, this is the speed at which the GPU core processes data. Typically, the higher the frequency, the faster the GPU can process data. But as the architectural design of each graphics card generation can vary, this will not always be an equal comparison. As stated previously in the Shader Cores section, cores operating at faster speeds on one GPU have the potential to outweigh a higher-spec’d GPU regardless of its additional core count, memory speed, TFLOP rating and so forth – but this will depend on the game in question. As with traditional CPUs, graphics cards also feature a boost clock frequency, which allows the GPU cores to run at a higher speed so long as the thermal threshold hasn’t been reached and there’s enough power to supply the required increase in frequency.
Video Memory (VRAM)
VRAM is the memory that’s available solely to the GPU. Unlike traditional system memory, which operates at slower speeds and is designed for system processes as well as storing temporary data for specific assets and game code, video memory functions in a very different manner. As VRAM is exclusive to the GPU, it stores game assets such as textures along with anti-aliasing and resolution data. As these aspects of image quality scale up and down according to each user’s personal preferences as well as the game or application in question, so too will the amount of VRAM being used. For instance, enabling high-resolution textures in your latest triple-A game may only consume 2GB of VRAM on a 1080p display.
Making the same decision on a 4K display, which is roughly four times the resolution, may only increase this consumption to 3GB or 4GB. This will be game-specific, depending on how smartly each title manages the amount of VRAM that’s available to it. As games continue to increase in fidelity and detail, most graphics cards will be fitted with an amount of VRAM that’s practical for their performance capabilities at the display resolutions and in-game image settings they target. Mid-range graphics cards such as the GTX 1060 (6GB) and the RX 570 feature 6GB and 4GB of VRAM respectively. This proves adequate for high-resolution textures and anti-aliasing when gaming at maximum image quality at 1080p as well as 1440p.
Since other aspects of game data may require VRAM to function, dedicating this memory to superior image quality settings or higher resolutions the card was never truly intended for may result in performance decreases. This is where the issue of overall specifications comes into play: a graphics card with 4GB or 8GB of VRAM may be deemed suitable for 4K gaming, but if the surrounding specifications such as core count, processing speed and memory bandwidth fail to keep up, the GPU will run out of “horsepower” before it even gets the chance to utilize the full amount of VRAM. Those with multi-monitor set-ups, or gaming at extremely demanding resolutions such as 4K, will find more satisfaction with a mid-range to high-end graphics card equipped with 4GB of VRAM or more.
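A quick bit of arithmetic shows why quadrupling the resolution doesn’t quadruple VRAM usage: the framebuffer itself is tiny compared to textures. The sketch below assumes 4 bytes per pixel (standard 32-bit colour) and ignores extra render targets a real engine would also allocate.

```python
# Rough framebuffer size: width x height x bytes per pixel.
# Assumes 4 bytes/pixel (32-bit colour); real engines allocate more
# buffers than this, but the scale of the result is the point.

def framebuffer_mb(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    return width * height * bytes_per_pixel / (1024 ** 2)

print(f"1080p framebuffer: {framebuffer_mb(1920, 1080):.1f} MB")  # ~7.9 MB
print(f"4K framebuffer:    {framebuffer_mb(3840, 2160):.1f} MB")  # ~31.6 MB
# Even a 4K framebuffer is only ~32 MB -- a rounding error next to the
# gigabytes consumed by high-resolution textures, which is why moving
# from 1080p to 4K might only push usage from 2GB to 3-4GB.
```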
Memory Clock & Memory Design
The speed of the graphics card’s memory is commonly referred to as the memory clock or memory speed. How fast this memory operates depends on the type of memory the graphics card is fitted with, and how efficiently the game in question uses it to its advantage. Where the vast majority of video cards feature GDDR5 memory, lower-tier variants built for general everyday computing, or for users simply requiring additional displays, may feature slower DDR3 memory. Where GDDR5 has an average effective speed ranging from 5000MHz to 8000MHz, GDDR5X takes things up to 11000MHz, while the slower DDR3 reduces this to a mere 2000MHz. As games are highly dependent upon streaming assets such as textures and geometry in and out of VRAM at an incredibly fast rate, memory speed is very important to avoid slow-downs, stuttering or visual pop-in of characters and objects within the game’s world.
While GDDR5 is considered the standard for GPU memory, its successor will be determined by the mass implementation of GDDR5X and HBM (High Bandwidth Memory). GDDR5X functions much in the same way as GDDR5 – memory chips fitted around the PCB of the graphics card, communicating with the GPU core by means of traces and circuitry. HBM takes a drastically different approach, stacking the physical memory dies on top of one another and placing them alongside the GPU on the same package, communicating by means of an interposer connection. In short, this allows for a physically smaller PCB, lower heat output and far lower power requirements. The biggest benefit of HBM technology comes from its extremely wide data throughput, which greatly exceeds GDDR5’s. For more information on HBM technology, check out our guide to Radeon graphics cards.
Memory Bandwidth & Memory Bus
Commonly thought of as the number of available lanes for transporting data, memory bandwidth works in tandem with the memory speed and the memory bus to move data as efficiently as possible. The memory bus is traditionally measured as a width, such as 192-bit, 256-bit or 384-bit. While a 384-bit memory bus is generally considered superior to a 256-bit one, since it allows for a wider throughput of data, the memory speed and memory bandwidth must scale accordingly in order to maximize the potential of that additional width. The best way to think of the memory bus and memory bandwidth is to picture a motorway (highway for the Americans): a 4-lane motorway (memory bus) allows more vehicles (memory bandwidth) to move through it.
This is also where memory speed comes into play, as a 2-lane motorway (memory bus) with fewer vehicles (memory bandwidth) can move traffic at a faster rate if its speed limit (memory speed) is higher than that of the 4-lane motorway. Each variation of GPU memory implementation comes down to the manufacturer as well as the capabilities of the GPU itself. For instance, where one GPU with a 256-bit memory bus may have bandwidth rated at 224GB/s, another GPU featuring the same 256-bit bus may have a much wider memory bandwidth of 320GB/s. The same holds for variants with a 384-bit memory bus, which scale in bandwidth from 336GB/s to 547GB/s.
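These bandwidth figures fall straight out of a simple formula: peak bandwidth is the bus width in bytes multiplied by the effective memory transfer rate (the quoted GDDR5 “MHz” figures are effective rates). The sketch below reproduces the numbers mentioned above; the specific rate pairings are illustrative.

```python
# Peak memory bandwidth:
#   bandwidth (GB/s) = (bus width in bits / 8) * effective rate in GT/s
# The GB/s figures quoted in the text correspond to common pairings.

def bandwidth_gbs(bus_bits: int, effective_rate_gts: float) -> float:
    return (bus_bits / 8) * effective_rate_gts

print(bandwidth_gbs(256, 7.0))   # 224.0 GB/s -- 256-bit bus at 7 GT/s (GDDR5)
print(bandwidth_gbs(256, 10.0))  # 320.0 GB/s -- 256-bit bus at 10 GT/s (GDDR5X)
print(bandwidth_gbs(384, 7.0))   # 336.0 GB/s -- 384-bit bus at 7 GT/s (GDDR5)
# A wider bus OR a faster effective rate raises bandwidth -- which is
# why two cards with the same 256-bit bus can differ so much.
```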
TFLOPs
As the simplest single measurement of a processor’s capabilities, a teraflop (TFLOP) refers to the processor being able to calculate one trillion floating-point operations per second. Using the GTX 970 as an example, the GPU is rated for a peak of 4 TFLOPs, meaning it is able to perform roughly 4 trillion floating-point operations per second. While this does have its uses in estimating how well a graphics card will perform, it should be noted that a higher TFLOP rating doesn’t always result in greater performance than a graphics card with a lesser TFLOP rating. This, as you will no doubt have guessed, is due to the differences in GPU architectures.
As covered many times throughout this article, the GPU architecture and how efficiently a GPU operates will have the greatest effect on performance, regardless of how the card is rated on paper. And it’s because of architectural differences that we see lower-spec graphics cards outperform their higher-spec competition. This has been the case with a great number of lower-spec Nvidia GPUs rivalling or surpassing AMD graphics cards released in the same competitive time frame. With the likes of the GTX 1060 (6GB) at 3.9 TFLOPs pulling ahead of the higher-spec RX 580 at 6.6 TFLOPs, the importance of GPU architecture immediately mitigates the value of TFLOP comparisons between different cards. This was also the case with AMD’s 8.5 TFLOP R9 Fury X being surpassed by Nvidia’s 5.9 TFLOP GTX 980 Ti. For more information on how TFLOPs can affect performance in real-world gaming scenarios, check out our Combo-Breaker build feature.
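The TFLOP ratings above aren’t measured, they’re derived: the standard peak-FP32 calculation assumes each core performs two floating-point operations per clock (a fused multiply-add). The sketch below uses widely published core counts and clock speeds to recover the quoted figures.

```python
# Peak FP32 throughput:
#   TFLOPs = cores * clock (Hz) * 2 ops per clock / 1e12
# Assumes 2 FLOPs per core per cycle (fused multiply-add), the
# convention behind published TFLOP ratings. Clocks are published specs.

def peak_tflops(cores: int, clock_mhz: float) -> float:
    return cores * clock_mhz * 1e6 * 2 / 1e12

print(f"GTX 970:  {peak_tflops(1664, 1178):.1f} TFLOPs")  # ~3.9 at 1178MHz boost
print(f"GTX 1060: {peak_tflops(1280, 1506):.1f} TFLOPs")  # ~3.9 at 1506MHz base
print(f"RX 580:   {peak_tflops(2304, 1411):.1f} TFLOPs")  # ~6.5 at 1411MHz boost
# Despite its far higher peak rating, the RX 580 trails the GTX 1060 in
# most games -- peak arithmetic throughput says nothing about how much
# of it the architecture can actually sustain.
```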
This guide has been written so that those who are new to PC gaming know exactly what to look for in terms of performance, while also gaining an understanding of PC hardware.
For a full range of graphics cards by both AMD and Nvidia, check out the Dino PC graphics card section.