EGPGV12: Eurographics Symposium on Parallel Graphics and Visualization


HyperFlow: A Heterogeneous Dataflow Architecture

Vo, Huy T.
Osmari, Daniel K.
Comba, João
Lindstrom, Peter
Silva, Cláudio T.

EAVL: The Extreme-scale Analysis and Visualization Library

Meredith, Jeremy S.
Ahern, Sean
Pugmire, Dave
Sisneros, Robert

PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators

Lo, Li-ta
Sewell, Christopher
Ahrens, James

A Study of Ray Tracing Large-scale Scientific Data in Two Widely Used Parallel Visualization Applications

Brownlee, Carson
Patchett, John
Lo, Li-Ta
DeMarle, David
Mitchell, Christopher
Ahrens, James
Hansen, Charles D.

Explicit Cache Management for Volume Ray-Casting on Parallel Architectures

Jönsson, Daniel
Ganestam, Per
Doggett, Michael
Ynnerman, Anders
Ropinski, Timo

GLuRay: Enhanced Ray Tracing in Existing Scientific Visualization Applications using OpenGL Interception

Brownlee, Carson
Fogal, Thomas
Hansen, Charles D.

Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function

Avril, Quentin
Gouranton, Valérie
Arnaldi, Bruno

Dynamic Scheduling for Large-Scale Distributed-Memory Ray Tracing

Navrátil, Paul A.
Fussell, Donald S.
Lin, Calvin
Childs, Hank

Polygonization of Implicit Surfaces on Multi-Core Architectures with SIMD Instructions

Shirazian, Pourya
Wyvill, Brian
Duprat, Jean-Luc

Light Propagation Maps on Parallel Graphics Architectures

Gruson, Adrien
Patil, Ajit Hakke
Cozot, Remi
Bouatouch, Kadi
Pattanaik, Sumanta

Parallel Rendering on Hybrid Multi-GPU Clusters

Eilemann, Stefan
Bilgili, Ahmet
Abdellah, Marwan
Hernando, Juan
Makhinya, Maxim
Pajarola, Renato
Schürmann, Felix

Multi-GPU Image-based Visual Hull Rendering

Hauswiesner, Stefan
Khlebnikov, Rostislav
Steinberger, Markus
Straka, Matthias
Reitmayr, Gerhard

Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering

Ancel, Alexandre
Dischler, Jean-Michel
Mongenet, Catherine

Auto Splats: Dynamic Point Cloud Visualization on the GPU

Preiner, Reinhold
Jeschke, Stefan
Wimmer, Michael

Shift-Based Parallel Image Compositing on InfiniBand™ Fat-Trees

Cavin, Xavier
Demengeon, Olivier

Time-constrained Animation Rendering on Desktop Grids

Aggarwal, Vibhor
Debattista, Kurt
Bashford-Rogers, Thomas
Chalmers, Alan


BibTeX (EGPGV12: Eurographics Symposium on Parallel Graphics and Visualization)
@inproceedings{:10.2312/EGPGV/EGPGV12/001-010,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{HyperFlow: A Heterogeneous Dataflow Architecture}},
  author = {Vo, Huy T. and Osmari, Daniel K. and Comba, João and Lindstrom, Peter and Silva, Cláudio T.},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/001-010}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/021-030,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{EAVL: The Extreme-scale Analysis and Visualization Library}},
  author = {Meredith, Jeremy S. and Ahern, Sean and Pugmire, Dave and Sisneros, Robert},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/021-030}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/011-020,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators}},
  author = {Lo, Li-ta and Sewell, Christopher and Ahrens, James},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/011-020}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/051-060,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{A Study of Ray Tracing Large-scale Scientific Data in Two Widely Used Parallel Visualization Applications}},
  author = {Brownlee, Carson and Patchett, John and Lo, Li-Ta and DeMarle, David and Mitchell, Christopher and Ahrens, James and Hansen, Charles D.},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/051-060}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/031-040,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Explicit Cache Management for Volume Ray-Casting on Parallel Architectures}},
  author = {Jönsson, Daniel and Ganestam, Per and Doggett, Michael and Ynnerman, Anders and Ropinski, Timo},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/031-040}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/041-050,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{GLuRay: Enhanced Ray Tracing in Existing Scientific Visualization Applications using OpenGL Interception}},
  author = {Brownlee, Carson and Fogal, Thomas and Hansen, Charles D.},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/041-050}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/071-080,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function}},
  author = {Avril, Quentin and Gouranton, Valérie and Arnaldi, Bruno},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/071-080}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/061-070,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Dynamic Scheduling for Large-Scale Distributed-Memory Ray Tracing}},
  author = {Navrátil, Paul A. and Fussell, Donald S. and Lin, Calvin and Childs, Hank},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/061-070}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/089-098,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Polygonization of Implicit Surfaces on Multi-Core Architectures with SIMD Instructions}},
  author = {Shirazian, Pourya and Wyvill, Brian and Duprat, Jean-Luc},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/089-098}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/081-088,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Light Propagation Maps on Parallel Graphics Architectures}},
  author = {Gruson, Adrien and Patil, Ajit Hakke and Cozot, Remi and Bouatouch, Kadi and Pattanaik, Sumanta},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/081-088}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/109-117,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Parallel Rendering on Hybrid Multi-GPU Clusters}},
  author = {Eilemann, Stefan and Bilgili, Ahmet and Abdellah, Marwan and Hernando, Juan and Makhinya, Maxim and Pajarola, Renato and Schürmann, Felix},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/109-117}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/119-128,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Multi-GPU Image-based Visual Hull Rendering}},
  author = {Hauswiesner, Stefan and Khlebnikov, Rostislav and Steinberger, Markus and Straka, Matthias and Reitmayr, Gerhard},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/119-128}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/099-108,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering}},
  author = {Ancel, Alexandre and Dischler, Jean-Michel and Mongenet, Catherine},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/099-108}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/139-148,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Auto Splats: Dynamic Point Cloud Visualization on the GPU}},
  author = {Preiner, Reinhold and Jeschke, Stefan and Wimmer, Michael},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/139-148}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/129-138,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Shift-Based Parallel Image Compositing on InfiniBand™ Fat-Trees}},
  author = {Cavin, Xavier and Demengeon, Olivier},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/129-138}
}
@inproceedings{:10.2312/EGPGV/EGPGV12/149-158,
  booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
  editor = {Hank Childs and Torsten Kuhlen and Fabio Marton},
  title = {{Time-constrained Animation Rendering on Desktop Grids}},
  author = {Aggarwal, Vibhor and Debattista, Kurt and Bashford-Rogers, Thomas and Chalmers, Alan},
  year = {2012},
  publisher = {The Eurographics Association},
  ISSN = {1727-348X},
  ISBN = {978-3-905674-35-4},
  DOI = {10.2312/EGPGV/EGPGV12/149-158}
}

Recent Submissions

  • Item
    HyperFlow: A Heterogeneous Dataflow Architecture
    (The Eurographics Association, 2012) Vo, Huy T.; Osmari, Daniel K.; Comba, João; Lindstrom, Peter; Silva, Cláudio T.; Hank Childs and Torsten Kuhlen and Fabio Marton
    We propose a dataflow architecture, called HyperFlow, that offers a supporting infrastructure that creates an abstraction layer over computation resources and naturally exposes heterogeneous computation to dataflow processing. To demonstrate and test the efficiency of our system, we have included a set of synthetic and real-case applications. First, we designed a general suite of micro-benchmarks that captures the main parallel pipeline structures and allows evaluation of HyperFlow under different stress conditions. Finally, we demonstrate the potential of our system with relevant applications in visualization. Implementations in HyperFlow are shown to have greater performance than hand-tuned codes while still providing high scalability on different platforms.
  • Item
    EAVL: The Extreme-scale Analysis and Visualization Library
    (The Eurographics Association, 2012) Meredith, Jeremy S.; Ahern, Sean; Pugmire, Dave; Sisneros, Robert; Hank Childs and Torsten Kuhlen and Fabio Marton
    Analysis and visualization of the data generated by scientific simulation codes is a key step in enabling science from computation. However, a number of challenges lie along the current hardware and software paths to scientific discovery. First, only advanced parallelism techniques can take full advantage of the unprecedented scale of coming machines. In addition, as computational improvements outpace those of I/O, more data will be discarded and I/O-heavy analysis will suffer. Furthermore, the limited memory environment, particularly in the context of in situ analysis which can sidestep some I/O limitations, will require efficiency of both algorithms and infrastructure. Finally, advanced simulation codes with complex data models require commensurate data models in analysis tools. However, community visualization and analysis tools designed for parallelism and large data fall short in a number of these areas. In this paper, we describe EAVL, a new library with infrastructure and algorithms designed to address these critical needs for current and future generations of scientific software and hardware. We show results from EAVL demonstrating the strengths of its robust data model, advanced parallelism, and efficiency.
  • Item
    PISTON: A Portable Cross-Platform Framework for Data-Parallel Visualization Operators
    (The Eurographics Association, 2012) Lo, Li-ta; Sewell, Christopher; Ahrens, James; Hank Childs and Torsten Kuhlen and Fabio Marton
    Due to the wide variety of current and next-generation supercomputing architectures, the development of high-performance parallel visualization and analysis operators frequently requires re-writing the underlying algorithms for many different platforms. In order to facilitate portability, we have devised a framework for creating such operators that employs the data-parallel programming model. By writing the operators using only data-parallel primitives (such as scans, transforms, stream compactions, etc.), the same code may be compiled to multiple targets using architecture-specific backend implementations of these primitives. Specifically, we make use of and extend NVIDIA's Thrust library, which provides CUDA and OpenMP backends. Using this framework, we have implemented isosurface, cut surface, and threshold operators, and have achieved good parallel performance on two different architectures (multi-core CPUs and NVIDIA GPUs) using the exact same operator code. We have applied these operators to several large, real scientific data sets, and have released a beta version of our code base as open source.
  • Item
    A Study of Ray Tracing Large-scale Scientific Data in Two Widely Used Parallel Visualization Applications
    (The Eurographics Association, 2012) Brownlee, Carson; Patchett, John; Lo, Li-Ta; DeMarle, David; Mitchell, Christopher; Ahrens, James; Hansen, Charles D.; Hank Childs and Torsten Kuhlen and Fabio Marton
    Large-scale analysis and visualization is becoming increasingly important as supercomputers and their simulations produce larger and larger data. These large data sizes are pushing the limits of traditional rendering algorithms and tools, thus motivating a study exploring these limits and their possible resolutions through alternative rendering algorithms. In order to better understand real-world performance with large data, this paper presents a detailed timing study on a large cluster with the widely used visualization tools ParaView and VisIt. The software ray tracer Manta was integrated into these programs in order to show that improved performance could be attained with software ray tracing on a distributed memory, GPU enabled, parallel visualization resource. Using the Texas Advanced Computing Center's Longhorn cluster, which has multi-core CPUs and GPUs, with large-scale polygonal data, we find multi-core CPU ray tracing to be significantly faster than both software rasterization and hardware-accelerated rasterization in existing scientific visualization tools with large data.
  • Item
    Explicit Cache Management for Volume Ray-Casting on Parallel Architectures
    (The Eurographics Association, 2012) Jönsson, Daniel; Ganestam, Per; Doggett, Michael; Ynnerman, Anders; Ropinski, Timo; Hank Childs and Torsten Kuhlen and Fabio Marton
    A major challenge when designing general purpose graphics hardware is to allow efficient access to texture data. Although different rendering paradigms vary with respect to their data access patterns, there is no flexibility when it comes to data caching provided by the graphics architecture. In this paper we focus on volume ray-casting, and show the benefits of algorithm-aware data caching. Our Marching Caches method exploits inter-ray coherence and thus utilizes the memory layout of the highly parallel processors by allowing them to share data through a cache which marches along with the ray front. By exploiting Marching Caches we can apply higher-order reconstruction and enhancement filters to generate more accurate and enriched renderings with an improved rendering performance. We have tested our Marching Caches with seven different filters, e.g., Catmull-Rom, B-spline, and ambient occlusion projection, and could show that a speedup of four times can be achieved compared to using the caching implicitly provided by the graphics hardware, and that the memory bandwidth to global memory can be reduced by orders of magnitude. Throughout the paper, we introduce the Marching Cache concept, provide implementation details and discuss the performance and memory bandwidth impact when using different filters.
  • Item
    GLuRay: Enhanced Ray Tracing in Existing Scientific Visualization Applications using OpenGL Interception
    (The Eurographics Association, 2012) Brownlee, Carson; Fogal, Thomas; Hansen, Charles D.; Hank Childs and Torsten Kuhlen and Fabio Marton
    Ray tracing in scientific visualization allows for substantial gains in performance and rendering quality with large scale polygonal datasets compared to brute-force rasterization; however, implementing new rendering architectures into existing tools is often costly and time consuming. This paper presents a library, GLuRay, which intercepts OpenGL calls from many common visualization applications and renders them with the CPU ray tracer Manta without modification to the underlying visualization tool. Rendering polygonal models such as isosurfaces can be done identically to an OpenGL implementation using provided material and camera properties, or superior rendering can be achieved using enhanced settings such as dielectric materials or pinhole cameras with depth of field effects. Comparative benchmarks were conducted on the Texas Advanced Computing Center's Longhorn cluster using the popular visualization packages ParaView, VisIt, Ensight, and VAPOR. Through the parallel rendering package ParaView, scaling up to 64 nodes is demonstrated. With our tests we show that using OpenGL interception to accelerate and enhance visualization programs provides a viable enhancement to existing tools with little overhead and no code modification while allowing for the creation of publication quality renderings using advanced effects and greatly improved large-scale software rendering performance within tools that scientists are currently using.
  • Item
    Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function
    (The Eurographics Association, 2012) Avril, Quentin; Gouranton, Valérie; Arnaldi, Bruno; Hank Childs and Torsten Kuhlen and Fabio Marton
    This paper presents a novel and efficient GPU-based parallel algorithm to cull non-colliding object pairs in very large-scale dynamic simulations. It can cull objects in less than 25 ms with more than 100K objects. It is designed for many-core GPUs and fully exploits multi-threaded capabilities and data-parallelism. In order to take advantage of the high number of cores, a new mapping function is defined that enables GPU threads to determine the object pair to compute without any global memory access. These new optimized GPU kernel functions use the thread indexes and turn them into a unique pair of objects to test. A square root approximation technique is used based on Newton's estimation, enabling the threads to only perform a few atomic operations. A first characterization of the approximation errors is presented, enabling the fixing of incorrect computations. The I/O GPU streams are optimized using binary masks. The implementation is evaluated on large-scale dynamic rigid body simulations. The increase in speed is highlighted over other recently proposed CPU and GPU-based techniques. The comparison shows that our system is, in most cases, faster than previous approaches.
  • Item
    Dynamic Scheduling for Large-Scale Distributed-Memory Ray Tracing
    (The Eurographics Association, 2012) Navrátil, Paul A.; Fussell, Donald S.; Lin, Calvin; Childs, Hank; Hank Childs and Torsten Kuhlen and Fabio Marton
    Ray tracing is an attractive technique for visualizing scientific data because it can produce high quality images that faithfully represent physically-based phenomena. Its embarrassingly parallel reputation makes it a natural candidate for visualizing large data sets on distributed memory clusters, especially for machines without specialized graphics hardware. Unfortunately, the traditional recursive ray tracing algorithm is exceptionally memory inefficient on large data, especially when using a shading model that generates incoherent secondary rays. As visualization moves through the petascale to the exascale, disk and memory efficiency will become increasingly important for performance, and traditional methods are inadequate. This paper presents a dynamic ray scheduling algorithm that effectively manages both ray state and data accesses. Our algorithm can render datasets that are larger than aggregate system memory, which existing statically scheduled ray tracers cannot render. For example, using 1024 cores of a supercomputing cluster, our unoptimized algorithm ray traces a 650GB dataset from an N-Body simulation with shadows and reflections, at about 1100 seconds per frame. For smaller problems that fit in aggregate memory, but are larger than typical shared memory, our algorithm is competitive with the best static scheduling algorithm.
  • Item
    Polygonization of Implicit Surfaces on Multi-Core Architectures with SIMD Instructions
    (The Eurographics Association, 2012) Shirazian, Pourya; Wyvill, Brian; Duprat, Jean-Luc; Hank Childs and Torsten Kuhlen and Fabio Marton
    In this research we tackle the problem of rendering complex models which are created using implicit primitives, blending operators, affine transformations and constructive solid geometry in a design environment that organizes all these in a scene graph data structure called BlobTree. We propose a fast, scalable, parallel polygonization algorithm for BlobTrees that takes advantage of multicore processors and SIMD optimization techniques available on modern architectures. Efficiency is achieved through the usage of spatial data structures and SIMD optimizations for BlobTree traversals and the computation of mesh vertices and other attributes. Our solution delivers interactive visualization for modeling systems based on BlobTree scene graph.
  • Item
    Light Propagation Maps on Parallel Graphics Architectures
    (The Eurographics Association, 2012) Gruson, Adrien; Patil, Ajit Hakke; Cozot, Remi; Bouatouch, Kadi; Pattanaik, Sumanta; Hank Childs and Torsten Kuhlen and Fabio Marton
    Light going through a participating medium like smoke can be scattered or absorbed by every point in the medium. To accurately render such a medium we must compute the radiance resulting at every point inside the medium because of these physical effects, which have been modeled by the radiative transfer equation. Computing the radiance at any point inside a participating medium amounts to numerically solving this radiative transport equation. Discrete Ordinate Method (DOM) is a widely used solution method. DOM is computationally intensive. Fattal [Fat09] proposed Light Propagation Maps (LPM) to expedite DOM computation. In this paper we propose a streaming based parallelization of LPM to run on SIMD graphics hardware. Our method is fast and scalable. We report more than 20x speed improvement by using our method as compared to Fattal's original method. Using our approach we are able to render 64x64x64 dynamic volumes with multiple scattering of light at interactive speed on complex lighting, and are able to render volumes of any size independent of the GPU memory capability.
  • Item
    Parallel Rendering on Hybrid Multi-GPU Clusters
    (The Eurographics Association, 2012) Eilemann, Stefan; Bilgili, Ahmet; Abdellah, Marwan; Hernando, Juan; Makhinya, Maxim; Pajarola, Renato; Schürmann, Felix; Hank Childs and Torsten Kuhlen and Fabio Marton
    Achieving efficient scalable parallel rendering for interactive visualization applications on medium-sized graphics clusters remains a challenging problem. Framerates of up to 60 Hz require a carefully designed and fine-tuned parallel rendering implementation that fits all required operations into the 16 ms time budget available for each rendered frame. Furthermore, modern commodity hardware increasingly embraces a NUMA architecture, where multiple processor sockets each have their locally attached memory and where auxiliary devices such as GPUs and network interfaces are directly attached to one of the processors. Such so-called fat NUMA processing and graphics nodes are increasingly used to build cost-effective hybrid shared/distributed memory visualization clusters. In this paper we present a thorough analysis of the asynchronous parallelization of the rendering stages and we derive and implement important optimizations to achieve highly interactive framerates on such hybrid multi-GPU clusters. We use both a benchmark program and a real-world scientific application used to visualize, navigate and interact with simulations of cortical neuron circuit models.
  • Item
    Multi-GPU Image-based Visual Hull Rendering
    (The Eurographics Association, 2012) Hauswiesner, Stefan; Khlebnikov, Rostislav; Steinberger, Markus; Straka, Matthias; Reitmayr, Gerhard; Hank Childs and Torsten Kuhlen and Fabio Marton
    Many virtual mirror and telepresence applications require novel viewpoint synthesis with little latency to user motion. Image-based visual hull (IBVH) rendering is capable of rendering arbitrary views from segmented images without an explicit intermediate data representation, such as a mesh or a voxel grid. By computing depth images directly from the silhouette images, it usually outperforms indirect methods. GPU-hardware accelerated implementations exist, but due to the lack of an intermediate representation no multi-GPU parallel strategies and implementations are currently available. This paper suggests three ways to parallelize the IBVH-pipeline and maps them to the sorting classification that is often applied to conventional parallel rendering systems. In addition to sort-first parallelization, we suggest a novel sort-last formulation that regards cameras as scene objects. We enhance this method's performance by a block-based encoding of the rendering results. For interactive systems with hard real-time constraints, we combine the algorithm with a multi-frame rate (MFR) system. We suggest a combination of forward and backward image warping to improve the visual quality of the MFR rendering. We observed the runtime behavior of the suggested methods and assessed how their performance scales with respect to input and output resolutions and the number of GPUs. By using additional GPUs, we reduced rendering times by up to 60%. Multi-frame rate viewing can even be ten times faster.
  • Item
    Load-Balanced Multi-GPU Ambient Occlusion for Direct Volume Rendering
    (The Eurographics Association, 2012) Ancel, Alexandre; Dischler, Jean-Michel; Mongenet, Catherine; Hank Childs and Torsten Kuhlen and Fabio Marton
    Ambient occlusion techniques were introduced to improve data comprehension by bringing soft fading shadows to the visualization of 3D datasets. They consist of attenuating light by considering the occlusion resulting from the presence of neighboring structures. Nevertheless, they often come with a significant precomputation cost, which prevents their use in interactive applications based on transfer function editing. This paper explores parallel solutions to reach interactive framerates with the use of a multi-GPU setup. Our method distributes the data to the different devices for computation. We use bricking and load balancing to optimize computation time. We also introduce two repartition schemes: a static one, which divides the dataset into as many blocks as there are GPUs, and a dynamic one, which divides the dataset into smaller blocks and distributes them using a producer-consumer scheme. Results, using an 8-GPU architecture, show that we manage to get significant speedups compared to a mono-GPU setup.
  • Item
    Auto Splats: Dynamic Point Cloud Visualization on the GPU
    (The Eurographics Association, 2012) Preiner, Reinhold; Jeschke, Stefan; Wimmer, Michael; Hank Childs and Torsten Kuhlen and Fabio Marton
    Capturing real-world objects with laser-scanning technology has become an everyday task. Recently, the acquisition of dynamic scenes at interactive frame rates has become feasible. A high-quality visualization of the resulting point cloud stream would require a per-frame reconstruction of object surfaces. Unfortunately, reconstruction computations are still too time-consuming to be applied interactively. In this paper we present a local surface reconstruction and visualization technique that provides interactive feedback for reasonably sized point clouds, while achieving high image quality. Our method is performed entirely on the GPU and in screen space, exploiting the efficiency of the common rasterization pipeline. The approach is very general, as no assumption is made about point connectivity or sampling density. This naturally allows combining the outputs of multiple scanners in a single visualization, which is useful for many virtual and augmented reality applications.
  • Item
    Shift-Based Parallel Image Compositing on InfiniBand™ Fat-Trees
    (The Eurographics Association, 2012) Cavin, Xavier; Demengeon, Olivier; Hank Childs and Torsten Kuhlen and Fabio Marton
    Parallel image compositing has been widely studied over the past 20 years, as it is one of the most crucial elements, if not the most crucial, in the implementation of a scalable parallel rendering system. Many algorithms have been proposed and implemented on a large variety of supercomputers. Among the existing supercomputers, InfiniBand™ (IB) PC clusters, and their associated fat-tree topology, are clearly becoming the dominant architecture, as they provide the scalability, high bandwidth and low latency required by the most demanding parallel applications. Surprisingly, very few efforts have been devoted to the implementation and performance evaluation of parallel image compositing algorithms on this kind of architecture. We propose in this paper a new parallel image compositing algorithm, called Shift-Based, relying on a well-known communication pattern called shift permutation. Indeed, shift permutation is one of the possible ways to get the maximum cross bisectional bandwidth provided by an IB fat-tree cluster. We show that our Shift-Based algorithm scales on any number of processing nodes (with peak performance on specific counts), allows overlapping communications with computations and exhibits contention-free network communications. This is demonstrated with the image compositing of very high resolution images at interactive frame rates.
  • Item
    Time-constrained Animation Rendering on Desktop Grids
    (The Eurographics Association, 2012) Aggarwal, Vibhor; Debattista, Kurt; Bashford-Rogers, Thomas; Chalmers, Alan; Hank Childs and Torsten Kuhlen and Fabio Marton
    The computationally intensive nature of high-fidelity rendering has led to a dependence on parallel infrastructures for generating animations. However, such an infrastructure is expensive, thereby restricting easy access to high-fidelity animations to organisations which can afford such resources. A desktop grid formed by aggregating idle resources in an institution is an inexpensive alternative, but it is inherently unreliable due to the non-dedicated nature of the architecture. A naive approach to employing desktop grids for rendering animations could lead to potential inconsistencies in the quality of the rendered animation as the available computational performance fluctuates. Hence, fault-tolerant algorithms are required for efficiently utilising a desktop grid. This paper presents a novel fault-tolerant rendering algorithm for generating high-fidelity animations within a user-defined time constraint. Time-constrained computation provides an elegant way of harnessing desktop grids, as otherwise makespan cannot be guaranteed. The algorithm uses multi-dimensional quasi-random sampling for load balancing, aimed at achieving the best visual quality across the whole animation even in the presence of faults. The results show that the presented algorithm is largely insensitive to temporal variations in computational power of a desktop grid, making it suitable for use in deadline-driven production environments.
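
The abstract of "Fast Collision Culling in Large-Scale Environments Using GPU Mapping Function" above describes turning a thread index into a unique pair of objects to test, using a square root. As a rough illustration of that general idea, the sketch below maps a linear index `k` to a pair `(i, j)` with `i < j` by inverting the triangular-number formula. It is not the authors' kernel: the function names `pair_from_index` and `all_pairs` are hypothetical, the code runs serially on the CPU, and it uses Python's exact integer square root rather than the Newton-based approximation mentioned in the abstract.

```python
import math


def pair_from_index(k: int) -> tuple[int, int]:
    """Map a linear index k >= 0 to the unique pair (i, j) with 0 <= i < j.

    Pairs are enumerated column by column: k = j * (j - 1) / 2 + i.
    Illustrative stand-in only; an exact integer square root replaces the
    approximate square root described in the abstract.
    """
    # j is the largest integer with j * (j - 1) / 2 <= k.
    j = (1 + math.isqrt(8 * k + 1)) // 2
    # i is the offset of k inside column j.
    i = k - j * (j - 1) // 2
    return i, j


def all_pairs(n: int) -> list[tuple[int, int]]:
    """Enumerate every unordered pair among n objects, one index per pair."""
    return [pair_from_index(k) for k in range(n * (n - 1) // 2)]


if __name__ == "__main__":
    for k, pair in enumerate(all_pairs(4)):
        print(k, pair)
```

Because each index yields its pair from arithmetic alone, a parallel version could assign one index per thread without a lookup table, which is the property the abstract highlights.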