EGGH04: SIGGRAPH/Eurographics Workshop on Graphics Hardware 2004
Subject: Graphics processors
Hardware-based Simulation and Collision Detection for Large Particle Systems (The Eurographics Association, 2004)
Kolb, A.; Latta, L.; Rezk-Salama, C. Editors: Tomas Akenine-Moeller and Michael McCool
Particle systems have long been recognized as an essential building block for detail-rich and lively visual environments. Current implementations can handle up to 10,000 particles in real-time simulations and are mostly limited by the transfer of particle data from the main processor to the graphics hardware (GPU) for rendering. This paper introduces a full GPU implementation, using fragment shaders, of both the simulation and the rendering of a dynamically growing particle system. Such an implementation can render up to 1 million particles in real time on recent hardware. The massively parallel simulation handles collision detection and collision reaction of particles with objects of arbitrary shape. The collision detection is based on depth maps that represent the outer shape of an object. The depth maps store distance values and normal vectors for collision reaction. Using a special texture-based indexing technique to represent normal vectors, standard 8-bit textures can be used to describe the complete depth map data. Alternatively, several depth maps can be stored in one floating-point texture. In addition, a GPU-based parallel sorting algorithm is introduced that can be used to depth-sort the particles for correct alpha blending.
Realtime Ray Tracing of Dynamic Scenes on an FPGA Chip (The Eurographics Association, 2004)
Schmittler, Jörg; Woop, Sven; Wagner, Daniel; Paul, Wolfgang J.; Slusallek, Philipp. Editors: Tomas Akenine-Moeller and Michael McCool
Realtime ray tracing has recently established itself as a possible alternative to the current rasterization approach for interactive 3D graphics. However, the performance of existing software implementations is still severely limited by today's CPUs, and achieving realtime performance requires many CPUs.
In this paper we present a prototype implementation of the full ray tracing pipeline on a single FPGA chip. Running at only 90 MHz, it achieves realtime frame rates of 20 to 60 frames per second over a wide range of 3D scenes and includes support for texturing, multiple light sources, and multiple levels of reflection or transparency. A particularly interesting feature of the design is that the transformation unit needed to support dynamic scenes is re-used for other tasks, including efficient ray-triangle intersection and shading computations. Despite the additional support for dynamic scenes, this approach reduces the overall hardware cost by 68%. We evaluate the design and its implementation across a wide set of example scenes and demonstrate the benefits of dedicated realtime ray tracing hardware.
Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication (The Eurographics Association, 2004)
Fatahalian, K.; Sugerman, J.; Hanrahan, P. Editors: Tomas Akenine-Moeller and Michael McCool
Utilizing graphics hardware for general-purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of the input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate for efficient evaluation on GPUs, but, surprisingly, we find that even near-optimal GPU implementations are pronouncedly less efficient than current cache-aware CPU approaches. We find the key cause of this inefficiency is that the GPU can fetch less data and yet execute more arithmetic operations per clock than the CPU when both are operating out of their closest caches.
The lack of high-bandwidth access to cached data will impair the performance of GPU implementations of any computation featuring significant input reuse.
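The depth-map collision test described in the first abstract (Kolb, Latta and Rezk-Salama) can be illustrated on the CPU. The sketch below is a minimal Python approximation, assuming a single orthographic depth map whose image plane is z = 0 and whose view direction is +z, with particle positions already expressed in that map's coordinate frame. The `DepthMap` type and `collide` function are hypothetical names. The restitution-based velocity reflection is a common textbook response, not necessarily the paper's formula, and the 8-bit normal indexing scheme from the abstract is not modeled (normals are stored directly as float triples).

```python
from dataclasses import dataclass


@dataclass
class DepthMap:
    """Hypothetical orthographic depth map for one object.

    Image plane is z = 0, view direction is +z. depth[j][i] is the distance
    from the plane to the object's outer surface through texel (i, j), and
    normals[j][i] is the outward unit surface normal at that point.
    """
    cell: float      # texel size in world units
    depth: list      # depth[j][i] -> float
    normals: list    # normals[j][i] -> (nx, ny, nz)


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def collide(dm, pos, vel, restitution=0.5):
    """Test one particle against the depth map; return updated (pos, vel)."""
    i, j = int(pos[0] // dm.cell), int(pos[1] // dm.cell)
    if not (0 <= j < len(dm.depth) and 0 <= i < len(dm.depth[0])):
        return pos, vel                    # outside the map: no information
    surface = dm.depth[j][i]
    if pos[2] <= surface:
        return pos, vel                    # in front of the surface: no hit
    # Behind the stored surface, treated as inside the object: project back.
    pos = (pos[0], pos[1], surface)
    n = dm.normals[j][i]
    vn = dot(vel, n)
    if vn < 0.0:                           # only reflect motion into the surface
        k = (1.0 + restitution) * vn
        vel = tuple(v - k * c for v, c in zip(vel, n))
    return pos, vel
```

For example, a flat surface at depth 1.0 with outward normal (0, 0, -1), hit by a particle moving at (0, 0, 2), sends it back at (0, 0, -1) with restitution 0.5. On a GPU the same per-particle test would presumably run in a fragment shader over a texture of particle positions, with the depth and normal lookups becoming texture fetches.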
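The first abstract also mentions a GPU-based parallel sort used to order particles by depth for alpha blending, but does not name the algorithm. As an assumption, the sketch below uses a bitonic sorting network, a standard data-parallel sort, and should not be read as the paper's method. Each inner pass computes every output slot independently from the previous array, which is the property that lets a sorting network map onto one fragment-shader pass per stage. The names `bitonic_sort` and `back_to_front` are hypothetical.

```python
import math


def bitonic_sort(keys):
    """Sort keys ascending with a bitonic network (length must be a power of 2)."""
    n = len(keys)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = list(keys)
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            # One data-parallel pass: each slot i is computed only from the
            # previous array, like a fragment shader writing a new texture.
            prev = a
            a = [None] * n
            for i in range(n):
                partner = i ^ j
                ascending = (i & k) == 0   # direction of this bitonic block
                is_lower = (i & j) == 0    # i is the lower index of its pair
                pick = min if ascending == is_lower else max
                a[i] = pick(prev[i], prev[partner])
            j //= 2
        k *= 2
    return a


def back_to_front(distances):
    """Return particle indices ordered farthest-first for alpha blending."""
    n = len(distances)
    size = 1
    while size < n:
        size *= 2
    # Key (-distance, index): ascending order puts the farthest particle first.
    keys = [(-d, i) for i, d in enumerate(distances)]
    keys += [(math.inf, -1)] * (size - n)  # padding sorts to the end
    return [i for _, i in bitonic_sort(keys)[:n]]
```

For n elements the network uses log2(n)(log2(n) + 1)/2 passes, independent of the data, so every pass has the same fixed access pattern.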
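The second abstract (Schmittler et al.) reports that the transformation unit used for dynamic scenes is re-used for ray-triangle intersection. One way such reuse can work, sketched here under that assumption, is to precompute for each triangle an affine transform that maps it to the unit triangle with corners (0,0,0), (1,0,0), (0,1,0); transforming the ray into that space reduces the intersection test to one division and a few comparisons. The functions `triangle_transform` and `intersect` are hypothetical Python illustrations, not the FPGA design.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triangle_transform(a, b, c):
    """Precompute the world -> unit-triangle affine map for triangle ABC.

    With columns e1 = B - A, e2 = C - A, n = e1 x e2, the rows of the inverse
    3x3 matrix are (e2 x n, n x e1, e1 x e2) / det.
    """
    e1, e2 = sub(b, a), sub(c, a)
    n = cross(e1, e2)
    det = dot(e1, cross(e2, n))
    if det == 0:
        raise ValueError("degenerate triangle")
    rows = [tuple(x / det for x in cross(e2, n)),
            tuple(x / det for x in cross(n, e1)),
            tuple(x / det for x in n)]
    return a, rows


def apply(rows, v):
    return tuple(dot(r, v) for r in rows)


def intersect(xform, origin, direction, t_min=1e-6):
    """Return (t, u, v) for a hit in front of the origin, else None."""
    a, rows = xform
    o = apply(rows, sub(origin, a))   # ray origin in triangle space
    d = apply(rows, direction)        # direction: linear part only
    if d[2] == 0.0:
        return None                   # ray parallel to the triangle plane
    t = -o[2] / d[2]                  # where the ray crosses the plane w = 0
    if t < t_min:
        return None
    u, v = o[0] + t * d[0], o[1] + t * d[1]
    if u < 0.0 or v < 0.0 or u + v > 1.0:
        return None
    return t, u, v
```

Because the mapping is affine, the returned t is the same ray parameter as in world space, and u, v are the barycentric coordinates of the hit along the edges B - A and C - A.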
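The third abstract's argument rests on data reuse: multiplying two n x n matrices takes n^3 multiply-adds but reads only 2n^2 distinct inputs, so each input element is used n times. The sketch below makes this concrete under an idealized two-level memory model (an assumption, not the paper's measurement methodology): a tile x tile block of A and of B is fetched from slow memory once and then reused from a fast local store. The hypothetical function `matmul_counting` returns the product together with the number of slow-memory fetches and floating-point operations.

```python
def matmul_counting(A, B, tile=1):
    """Multiply square matrices A and B (lists of lists).

    Counts element fetches from slow memory, assuming each tile x tile block
    of A and B is loaded once per block step and then reused locally. Each
    multiply-add is counted as 2 flops.
    """
    n = len(A)
    if n % tile:
        raise ValueError("tile must divide n")
    C = [[0.0] * n for _ in range(n)]
    fetches = flops = 0
    for i0 in range(0, n, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, n, tile):
                # Load one block of A and one block of B into the fast store.
                a_blk = [row[k0:k0 + tile] for row in A[i0:i0 + tile]]
                b_blk = [row[j0:j0 + tile] for row in B[k0:k0 + tile]]
                fetches += 2 * tile * tile
                # Every loaded element is reused `tile` times below.
                for i in range(tile):
                    for j in range(tile):
                        s = 0.0
                        for k in range(tile):
                            s += a_blk[i][k] * b_blk[k][j]
                        C[i0 + i][j0 + j] += s
                flops += 2 * tile ** 3
    return C, fetches, flops
```

With tile = 1 the kernel behaves like an independent dot product per output element and issues one fetch per flop (2n^3 of each); a tile of size b cuts fetches to 2n^3/b while flops stay at 2n^3. In this model the achievable flops per fetch grow only with the reuse the fast store can hold, which is why, per the abstract, limited bandwidth to cached data bounds GPU performance for computations with significant input reuse.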