High-Performance Graphics 2013

Real-Time High-Resolution Sparse Voxelization with Application to Image-Based Modeling
(ACM, 2013) Loop, Charles; Zhang, Cha; Zhang, Zhengyou; Kayvon Fatahalian and Christian Theobalt
We present a system for real-time, high-resolution, sparse voxelization of an image-based surface model. Our approach consists of a coarse-to-fine voxel representation and a collection of parallel processing steps. Voxels are stored as a list of unsigned integer triples. An oracle kernel decides, for each voxel in parallel, whether to keep or cull its voxel from the list based on an image consistency criterion of its projection across cameras. After a prefix sum scan, kept voxels are subdivided and the process repeats until projected voxels are pixel size. These voxels are drawn to a render target and shaded as a weighted combination of their projections into a set of calibrated RGB images. We apply this technique to the problem of smooth visual hull reconstruction of human subjects based on a set of live image streams. We demonstrate that human upper body shapes can be reconstructed to giga voxel resolution at greater than 30 fps on modern graphics hardware.

Preface and Table of Contents
(ACM, 2013) Kayvon Fatahalian and Christian Theobalt

An Energy and Bandwidth Efficient Ray Tracing Architecture
(ACM, 2013) Kopta, Daniel; Shkurko, Konstantin; Spjut, Josef; Brunvand, Erik; Davis, Al; Kayvon Fatahalian and Christian Theobalt
We propose two hardware mechanisms to decrease energy consumption on massively parallel graphics processors for ray tracing while keeping performance high. First, we use a streaming data model and configure part of the L2 cache into a ray stream memory to enable efficient data processing through ray reordering. This increases the L1 hit rate and reduces off-chip memory accesses substantially. Second, we employ reconfigurable special-purpose pipelines that are constructed dynamically under program control.
These pipelines use shared execution units (XUs) that can be configured to support the common compute kernels that are the foundation of the ray tracing algorithm, such as acceleration structure traversal and triangle intersection. This reduces the overhead incurred by memory and register accesses. These two synergistic features yield a ray tracing architecture that significantly reduces both power consumption and off-chip memory traffic when compared to a more traditional cache-only approach.

Fast Parallel Construction of High-Quality Bounding Volume Hierarchies
(ACM, 2013) Karras, Tero; Aila, Timo; Kayvon Fatahalian and Christian Theobalt
We propose a new massively parallel algorithm for constructing high-quality bounding volume hierarchies (BVHs) for ray tracing. The algorithm is based on modifying an existing BVH to improve its quality, and executes in linear time at a rate of almost 40M triangles/sec on NVIDIA GTX Titan. We also propose an improved approach for parallel splitting of triangles prior to tree construction. Averaged over 20 test scenes, the resulting trees offer over 90% of the ray tracing performance of the best offline construction method (SBVH), while previous fast GPU algorithms offer only about 50%. Compared to state-of-the-art, our method offers a significant improvement in the majority of practical workloads that need to construct the BVH for each frame. On the average, it gives the best overall performance when tracing between 7 million and 60 billion rays per frame. This covers most interactive applications, product and architectural design, and even movie rendering.

Real-time Local Displacement using Dynamic GPU Memory Management
(ACM, 2013) Schäfer, Henry; Keinert, Benjamin; Stamminger, Marc; Kayvon Fatahalian and Christian Theobalt
We propose a novel method for local displacement events in large scenes, such as scratches, footsteps, or sculpting operations.
Deformations are stored as displacements for vertices generated by hardware tessellation. Adaptive mesh refinement, application of the displacement and all involved memory management happen completely on the GPU. We show various extensions to our approach, such as on-the-fly normal computation and multi-resolution editing. In typical game scenes we perform local deformations at arbitrary positions in far less than one millisecond. This makes the method particularly suited for games and interactive sculpting applications.

Imperfect Voxelized Shadow Volumes
(ACM, 2013) Wyman, Chris; Dai, Zeng; Kayvon Fatahalian and Christian Theobalt
Voxelized shadow volumes [Wyman 2011] provide a discretized view-dependent representation of shadow volumes, but are limited to point or directional lights. We extend them to allow dynamic volumetric visibility from area light sources using imperfect shadow volumes. We show a coarser visibility sampling suffices for area lights. Combining this coarser resolution with a parallel shadow volume construction enables interactive rendering of dynamic volumetric shadows from area lights in homogeneous single-scattering media, at under 4x the cost of hard volumetric shadows.

SGRT: A Mobile GPU Architecture for Real-Time Ray Tracing
(ACM, 2013) Lee, Won-Jong; Shin, Youngsam; Lee, Jaedon; Kim, Jin-Woo; Nah, Jae-Ho; Jung, Seokyoon; Lee, Shihwa; Park, Hyun-Sang; Han, Tack-Don; Kayvon Fatahalian and Christian Theobalt
Recently, with the increasing demand for photorealistic graphics and the rapid advances in desktop CPUs/GPUs, real-time ray tracing has attracted considerable attention. Unfortunately, ray tracing in the current mobile environment is very difficult because of inadequate computing power, memory bandwidth, and flexibility in mobile GPUs.
In this paper, we present a novel mobile GPU architecture called SGRT (Samsung reconfigurable GPU based on Ray Tracing) in which a fast compact hardware accelerator and a flexible programmable shader are combined. SGRT has two key features: 1) an area-efficient parallel pipelined traversal unit; and 2) flexible and high-performance kernels for shading and ray generation. Simulation results show that SGRT is potentially a versatile graphics solution for future application processors as it provides a real-time ray tracing performance at full HD resolution that can compete with that of existing desktop GPU ray tracers. Our system is implemented on an FPGA platform, and mobile ray tracing is successfully demonstrated.

Lazy Incremental Computation for Efficient Scene Graph Rendering
(ACM, 2013) Wörister, Michael; Steinlechner, Harald; Maierhofer, Stefan; Tobler, Robert F.; Kayvon Fatahalian and Christian Theobalt
In order to provide a highly performant rendering system while maintaining a scene graph structure with a high level of abstraction, we introduce improved rendering caches that can be updated incrementally without any scene graph traversal. The basis of this novel system is the use of a dependency graph that can be synthesized from the scene graph and links all sources of changes to the affected parts of rendering caches. By using and extending concepts from incremental computation we minimize the computational overhead for performing the necessary updates due to changes in any inputs. This makes it possible to provide a high-level semantic scene graph, while retaining the opportunity to apply a number of known optimizations to the rendering caches even for dynamic scenes.
Our evaluation shows that the resulting rendering system is highly competitive and provides good rendering performance for scenes ranging from completely static geometry all the way to completely dynamic geometry.

Screen-Space Far-Field Ambient Obscurance
(ACM, 2013) Timonen, Ville; Kayvon Fatahalian and Christian Theobalt
Ambient obscurance (AO) is an effective approximation of global illumination, and its screen-space (SSAO) versions that operate on depth buffers only are widely used in real-time applications. We present an SSAO method that allows the obscurance effect to be determined from the entire depth buffer for each pixel. Our contribution is two-fold: Firstly, we build an obscurance estimator that accurately converges to ray traced reference results on the same screen-space geometry. Secondly, we generate an intermediate representation of the depth field which, when sampled, gives local peaks of the geometry from the point of view of the receiver. Only a small number of such samples are required to capture AO effects without undersampling artefacts that plague previous methods. Our method is unaffected by the radius of the AO effect or by the complexity of the falloff function and produces results within a few percent of a ray traced screen-space reference at constant real-time frame rates.

Out-of-Core Construction of Sparse Voxel Octrees
(ACM, 2013) Baert, Jeroen; Lagae, Ares; Dutré, Philip; Kayvon Fatahalian and Christian Theobalt
Voxel-based rendering has recently received significant attention due to its potential in the context of efficiently rendering massively large and highly detailed scenes. Unfortunately, few or no scenes are available in the form of sparse voxel octrees. In this paper, we present an out-of-core algorithm for constructing a sparse voxel octree from a triangle mesh.
Our algorithm allows the input triangle mesh, the output sparse voxel octree, and, most importantly, the intermediate high-resolution 3D voxel grid, to be larger than available memory. We demonstrate that our out-of-core algorithm can construct sparse voxel octrees from triangle meshes using only a fraction of the memory required by an in-core algorithm in roughly the same time, and that our out-of-core algorithm can also handle extremely large triangle meshes.

Efficient Divide-And-Conquer Ray Tracing using Ray Sampling
(ACM, 2013) Nabata, Kosuke; Iwasaki, Kei; Dobashi, Yoshinori; Nishita, Tomoyuki; Kayvon Fatahalian and Christian Theobalt
Divide-and-conquer ray tracing (DACRT) methods solve intersection problems between large numbers of rays and primitives by recursively subdividing the problem size until it can be easily solved. Previous DACRT methods subdivide the intersection problem based on the distribution of primitives only, and do not exploit the distribution of rays, which results in a decrease of the rendering performance especially for high resolution images with antialiasing. We propose an efficient DACRT method that exploits the distribution of rays by sampling the rays to construct an acceleration data structure. To accelerate ray traversals, we have derived a new cost metric which is used to avoid inefficient subdivision of the intersection problem where the number of rays is not sufficiently reduced. Our method accelerates the tracing of many types of rays (primary rays, less coherent secondary rays, random rays for path tracing) by a factor of up to 2 using ray sampling.

On Quality Metrics of Bounding Volume Hierarchies
(ACM, 2013) Aila, Timo; Karras, Tero; Laine, Samuli; Kayvon Fatahalian and Christian Theobalt
The surface area heuristic (SAH) is widely used as a predictor for ray tracing performance, and as a heuristic to guide the construction of spatial acceleration structures.
We investigate how well SAH actually predicts ray tracing performance of a bounding volume hierarchy (BVH), observe that this relationship is far from perfect, and then propose two new metrics that together with SAH almost completely explain the measured performance. Our observations shed light on the increasingly common situation that a supposedly good tree construction algorithm produces trees that are slower to trace than expected. We also note that the trees constructed using greedy top-down algorithms are consistently faster to trace than SAH indicates and are also more SIMD-friendly than competing approaches.

PixelPie: Maximal Poisson-disk Sampling with Rasterization
(ACM, 2013) Ip, Cheuk Yiu; Yalçın, M. Adil; Luebke, David; Varshney, Amitabh; Kayvon Fatahalian and Christian Theobalt
We present PixelPie, a highly parallel geometric formulation of the Poisson-disk sampling problem on the graphics pipeline. Traditionally, generating a distribution by throwing darts and removing conflicts has been viewed as an inherently sequential process. In this paper, we present an efficient Poisson-disk sampling algorithm that uses rasterization in a highly parallel manner. Our technique is an iterative two-step process. The first step of each iteration involves rasterization of random darts at varying depths. The second step involves culling conflicted darts. Successive iterations identify and fill in the empty regions to obtain maximal distributions. Our approach maps well to the parallel and optimized graphics functions on the GPU and can be easily extended to perform importance sampling.
Our implementation can generate Poisson-disk samples at the rate of nearly 7 million samples per second on a GeForce GTX 580 and is significantly faster than the state-of-the-art maximal Poisson-disk sampling techniques.

Megakernels Considered Harmful: Wavefront Path Tracing on GPUs
(ACM, 2013) Laine, Samuli; Karras, Tero; Aila, Timo; Kayvon Fatahalian and Christian Theobalt
When programming for GPUs, simply porting a large CPU program into an equally large GPU kernel is generally not a good approach. Due to the SIMT execution model on GPUs, divergence in control flow carries substantial performance penalties, as does high register usage that lessens the latency-hiding capability that is essential for the high-latency, high-bandwidth memory system of a GPU. In this paper, we implement a path tracer on a GPU using a wavefront formulation, avoiding these pitfalls that can be especially prominent when using materials that are expensive to evaluate. We compare our performance against the traditional megakernel approach, and demonstrate that the wavefront formulation is much better suited for real-world use cases where multiple complex materials are present in the scene.

Theory and Analysis of Higher-Order Motion Blur Rasterization
(ACM, 2013) Gribel, Carl Johan; Munkberg, Jacob; Hasselgren, Jon; Akenine-Möller, Tomas; Kayvon Fatahalian and Christian Theobalt
A common assumption in motion blur rendering is that the triangle vertices move in straight lines. In this paper, we focus on scenarios where this assumption is no longer valid, such as motion due to fast rotation and other non-linear characteristics. To that end, we present a higher-order representation of vertex motion based on Bézier curves, which allows for more complex motion paths, and we derive the necessary mathematics for these. In addition, we extend previous work to handle higher-order motion by developing a new tile vs. triangle overlap test.
We find that our tile-based rasterizer outperforms all other methods in terms of sample test efficiency, and that our generalization of an interval-based rasterizer is often fastest in terms of wall clock rendering time. In addition, we use our tile test to improve rasterization performance by up to a factor of 5 for semi-analytical motion blur rendering.

Efficient BVH Construction via Approximate Agglomerative Clustering
(ACM, 2013) Gu, Yan; He, Yong; Fatahalian, Kayvon; Blelloch, Guy; Kayvon Fatahalian and Christian Theobalt
We introduce Approximate Agglomerative Clustering (AAC), an efficient, easily parallelizable algorithm for generating high-quality bounding volume hierarchies using agglomerative clustering. The main idea of AAC is to compute an approximation to the true greedy agglomerative clustering solution by restricting the set of candidates inspected when identifying neighboring geometry in the scene. The result is a simple algorithm that often produces higher quality hierarchies (in terms of subsequent ray tracing cost) than a full sweep SAH build yet executes in less time than the widely used top-down, approximate SAH build algorithm based on binning.