Today’s high computational throughput probably would not be attainable without modern processors applying the SIMD paradigm in increasingly clever ways. It’s no coincidence that GPUs also owe most of their performance, die area, and efficiency benefits to this instruction issue scheme. In this article we will explore a couple of examples of how GPUs may take advantage of SIMD and what those imply for the programming model.
Multisampling is a well-understood technique used in computer graphics that enables applications to efficiently reduce geometry aliasing, yet not everybody is familiar with the entire toolset offered by modern GPU hardware to control multisampling behavior. In this article we present the behavior of basic multisampling and explore a set of controls that enable us to tune performance/quality trade-offs and open doors for more advanced rendering techniques.
The behavior of the graphics pipeline is practically standard across platforms and APIs, yet GPU vendors come up with unique solutions to accelerate it, the two major architecture types being tile-based and immediate-mode rendering GPUs. In this article we explore how they work, present their strengths/weaknesses, and discuss some of the implications the underlying GPU architecture may have on the efficiency of certain rendering algorithms.
Previously we explored the different types of memory available to the GPU, but only barely touched on the topic of caches. In this article we will make up for that by taking a look at the different caches available on modern GPUs to appreciate their role in the system. A thorough understanding of GPU cache behavior enables developers to make better use of these caches and thus improve the performance of their graphics or compute applications.
With the recent announcement of AMD Smart Access Memory, it seemed like the right time to write about the different types of memory available to applications targeting dedicated GPUs. This article aims to provide an introduction to the different memory pools within such a system, their access characteristics, and why enabling access to the entire VRAM through the PCI-Express bus could be a game changer.