EGPGV09: Eurographics Symposium on Parallel Graphics and Visualization

Data-Parallel Hierarchical Link Creation for Radiosity
(The Eurographics Association, 2009) Meyer, Quirin; Eisenacher, Christian; Stamminger, Marc; Dachsbacher, Carsten. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
The efficient simulation of mutual light exchange for radiosity-like methods has been demonstrated on GPUs. However, those approaches require a suitable set of links and hierarchical data structures, prepared in an expensive preprocessing step. We present a fast, data-parallel method to create links and a compact tree of patches. We demonstrate our approach for Antiradiance and Implicit Visibility. Our algorithm is able to create up to 50 M links per second on an Nvidia GTX 260, allowing fully dynamic scenes at interactive frame rates.

A Decomposition Approach for Optimizing Large-Scale Parallel Image Composition on Multi-Core MPP Systems
(The Eurographics Association, 2009) Nonaka, Jorji; Ono, Kenji. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
In recent years, multi-core processor architecture has emerged as the predominant hardware architecture for high performance computing (HPC) systems. In addition, computational nodes based on SMP (symmetric multiprocessing) and NUMA (non-uniform memory access) have become increasingly common. Traditional parallel image composition algorithms were not primarily designed to take advantage of the combined message passing and shared address space parallelism provided by modern massively parallel processing (MPP) systems, which can result in undesirable performance loss. In this study, we have investigated the use of a simple decomposition approach to take advantage of these different hardware characteristics for optimizing the parallel image composition process. Performance evaluation was carried out on the T2K Open Supercomputer, which is based on a multi-core, multi-processor architecture, and we obtained encouraging results showing the effectiveness of the proposed approach.
This approach also seems promising for tackling the large-scale image composition problem on next-generation HPC systems, where an ever-increasing number of processing cores is expected.

Distributed Visualization of Complex Black Oil Reservoir Models
(The Eurographics Association, 2009) Abraham, Frederico; Celes, Waldemar. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Recent accomplishments in the computer simulation of black oil reservoirs have created a demand for the visualization of very large models. In this paper, we present a distributed system for the rendering of such models. Following recent trends in the high performance computing area, the system is intended to make the visualization of these models available to lightweight clients on corporate networks, through the use of a cluster of inexpensive off-the-shelf PCs equipped with multiple GPUs. The proposed system uses a sort-last approach and supports a diverse set of visualization techniques. Through an efficient use of each GPU and a partial composition stage on each cluster node, our solution tackles the scalability issues that arise when using mid-to-large GPU clusters. Experimental results show that our implementation can sustain the visualization of models with up to 60 million cells at interactive rates, using a cluster with 16 nodes, each one equipped with 4 GPUs. Experimental results also demonstrate the scalability of the proposed solution.

Dynamic Grid Refinement for Fluid Simulations on Parallel Graphics Architectures
(The Eurographics Association, 2009) Ament, Marco; Straßer, Wolfgang. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
We present a physically-based fluid simulation with dynamic grid refinement on parallel SIMD graphics hardware. The irregular and dynamic structure of an adaptive grid requires sophisticated memory access patterns as well as a decomposition of the problem for parallel processing and the distribution of tasks to multiple threads.
In this paper, we focus on the representation and management of the dynamic grid on the graphics device for an efficient parallelization of the advection step and the iterative solving of the Poisson equation. In order to achieve high performance, we utilize hardware capabilities such as fast cache access and trilinear filtering. Furthermore, expensive data transfer between host and device is minimized to avoid a major bottleneck. We report results on the inherent overhead of the dynamic grid compared to an equivalent Cartesian grid. In addition, a visual simulation of smoke is presented with radiosity-based illumination and volume ray casting at interactive frame rates.

Fast Parallel Unbiased Diffeomorphic Atlas Construction on Multi-Graphics Processing Units
(The Eurographics Association, 2009) Ha, Linh K.; Krüger, Jens; Fletcher, P. Thomas; Joshi, Sarang; Silva, Claudio T. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Unbiased diffeomorphic atlas construction has proven to be a powerful technique for medical image analysis, particularly in brain imaging. The method operates on a large set of images, mapping them all into a common coordinate system, and creating an unbiased common template for studying intra-population variability and inter-population differences. The technique has also proven effective in tissue and object segmentation via registration of anatomical labels. However, a major barrier to the use of this approach is its high computational cost. As the number of inputs and the data size increase, it becomes impractical even with a fully optimized implementation on CPUs. Fortunately, the high degree of element-wise independence in the problem makes it well suited for parallel processing. This paper presents an efficient implementation of unbiased diffeomorphic atlas construction on the new parallel processing architecture based on Multi-Graphics Processing Units (Multi-GPUs).
Our results show that the GPU implementation gives a substantial performance gain, on the order of twenty to sixty times faster than a single CPU, and provides an inexpensive alternative to large distributed-memory CPU clusters.

A Flexible Adaptation Service for Distributed Rendering
(The Eurographics Association, 2009) Repplinger, Michael; Löffler, Alexander; Thielen, Martin; Slusallek, Philipp. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Even though high-performance real-time rendering has improved significantly through implementing its algorithms on top of many-core technologies, achieving interactivity in large scenes still requires a networked cluster for distributing the workload. Available frameworks assume high-bandwidth networking between the nodes of a cluster and ignore remote rendering scenarios where adaptation to limited resources (e.g., low bandwidth) is required. In this paper, we present an extension to the flexible URay framework for distributed rendering that allows it to react to unfavorable and changing network conditions. We show how adaptation strategies are applied to streams of rendered images, and how to realize application scenarios that are even able to use the Internet as a communication network, which suffers from unpredictable conditions in terms of latency and bandwidth.

Hybrid Parallelization for Multi-View Visualization of Time-Dependent Simulation Data
(The Eurographics Association, 2009) Hentschel, Bernd; Wolter, Marc; Renze, Peter; Schröder, Wolfgang; Bischof, Christian; Kuhlen, Torsten. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Interactive analysis using multiple linked views has been successfully applied to time-dependent simulation data. In this paper we extend previous work by embedding multiple views in a virtual environment. Here, we combine 3D scatterplots with direct interaction and natural stereoscopic viewing.
In order to deal with today's simulation data effectively, we propose a hybrid parallelization scheme based on distributing the workload between a powerful compute back-end and a rendering client. It minimizes the amount of latency introduced by the distributed setup, which is vital in order to facilitate highly interactive operations such as brushing. We illustrate the effectiveness of our approach in a case study from the field of flow visualization.

Interactive Physical Simulation on Multicore Architectures
(The Eurographics Association, 2009) Hermann, Everton; Raffin, Bruno; Faure, Francois. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
In this paper we propose a parallelization of interactive physical simulations. Our approach relies on task parallelism, where the code is instrumented to mark tasks and the data shared between tasks, as well as parallel loops, even if they have dynamic conditions. Prior to running a simulation step, we extract a task dependency graph that is partitioned to define the task distribution between processors. To limit the overhead of graph partitioning and favor memory locality, we intend to limit the partitioning changes from one iteration to the next. This approach has a low impact on physics algorithms, as parallelism is mainly extracted from the coordination code. This makes it friendly to programmers without parallel programming expertise. Results show we can obtain good performance gains.

Parallel Mesh Clustering
(The Eurographics Association, 2009) Chiosa, Iurie; Kolb, Andreas; Cuntz, Nicolas; Lindner, Marvin. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Fast, high-quality clustering of large polygonal surface meshes remains one of the most demanding fields in mesh processing. Because existing clustering algorithms are very time-consuming, the use of parallel hardware, i.e. the graphics processing unit (GPU), is a reasonable and crucial task in this domain.
However, due to the sequential nature of most of these algorithms, this is hard to achieve. In this paper we address the parallel reformulation of the existing approaches and show a suitable GPU implementation for variational or hierarchical parallel mesh clustering. A boundary-based mesh clustering framework is proposed as a new clustering concept which provides all necessary ingredients for parallel mesh clustering. Here we focus on a specific subtype of the variational clustering algorithm which does not restrict the applicability of the approach as such but reveals much better performance characteristics. A parallel multilevel (ML) mesh clustering, for which several dual edges are collapsed in each step, is proposed as an alternative to the classical ML clustering, where only one dual edge collapse is applied in each step. We show how these algorithms can be entirely implemented (giving some non-trivial GPU-specific solutions) and accelerated on the GPU. We demonstrate both approaches by applying them to Centroidal Voronoi Diagram (CVD) based clustering. For boundary-based mesh clustering we achieved speed-up factors of 10 to 18.

Parallel Solution to the Radiative Transport
(The Eurographics Association, 2009) Szirmay-Kalos, Laszló; Liktor, Gabor; Umenhoffer, Tamas; Tóth, Balazs; Kumar, Shree; Lupton, Glenn. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
This paper presents a fast parallel method to compute the solution of the radiative transport equation in inhomogeneous participating media. The efficiency of the method comes from different factors. First, we use a novel approximation scheme to find a good guess for both the direct and the scattered component. This scheme is based on the analytic solution for homogeneous media, which is modulated by the local material properties. Then, the initial approximation is refined iteratively.
The iterative refinement is executed on a face-centered cubic grid, which is decomposed into blocks according to the available simulation nodes. The implementation uses CUDA and runs on a cluster of GPUs. We also show how the communication bottleneck can be avoided by not exchanging the boundary conditions in every iteration step.

Parallelized Matrix Factorization for fast BTF Compression
(The Eurographics Association, 2009) Ruiters, Roland; Rump, Martin; Klein, Reinhard. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Dimensionality reduction methods like Principal Component Analysis (PCA) have become commonplace for the compression of large datasets in computer graphics. One important application is the compression of Bidirectional Texture Functions (BTF). However, the use of such techniques still has many limitations that arise from the large size of the input data, which results in impractically high compression times. In this paper, we address these shortcomings and present a method which allows for efficient parallelized computation of the PCA of a large BTF matrix. The matrix is first split into several blocks for which the PCA can be performed independently and thus in parallel. We scale the individual subproblems in such a way that they can be solved in-core using the EM-PCA algorithm. This allows us to perform the calculation on current GPUs, exploiting their massive parallel computing power. The eigenspaces determined for the individual blocks are then merged to obtain the PCA of the whole dataset. This way, nearly arbitrarily sized matrices can be processed considerably faster than by serial algorithms.
Thus, BTFs with much higher spatial and angular resolution can be compressed in reasonable time.

Simulation of Radio Wave Propagation by Beam Tracing
(The Eurographics Association, 2009) Schmitz, Arne; Rick, Tobias; Karolski, Thomas; Kuhlen, Thorsten; Kobbelt, Leif. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Beam tracing can be used for solving global illumination problems. It is an efficient algorithm, and performs very well when implemented on the GPU. This allows us to apply the algorithm in a novel way to the problem of radio wave propagation. The simulation of radio waves is conceptually analogous to the problem of light transport. However, their wavelengths are of proportions similar to that of the environment. At such frequencies, waves that bend around corners due to diffraction are becoming an important propagation effect. In this paper we present a method which integrates diffraction, on top of the usual effects related to global illumination like reflection, into our beam tracing algorithm. We use a custom, parallel rasterization pipeline for creation and evaluation of the beams. Our algorithm can provide a detailed description of complex radio channel characteristics like propagation losses and the spread of arriving signals over time (delay spread). Those are essential for the planning of communication systems required by mobile network operators. For validation, we compare our simulation results with measurements from a real world network.

Time-constrained High-fidelity Rendering on Local Desktop Grids
(The Eurographics Association, 2009) Aggarwal, Vibhor; Debattista, Kurt; Dubla, Piotr; Bashford-Rogers, Thomas; Chalmers, Alan. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
Parallel computing has been frequently used for reducing the rendering time of high-fidelity images, since the generation of such images has a high computational cost.
Numerous algorithms have been proposed for parallel rendering but they primarily focus on utilising shared memory machines or dedicated distributed clusters. A local desktop grid, composed of arbitrary computational resources connected to a network such as those in a lab or an enterprise, provides an inexpensive alternative to dedicated clusters. The computational power offered by such a desktop grid is time-variant as the resources are not dedicated. This paper presents fault-tolerant algorithms for rendering high-fidelity images on a desktop grid within a given time-constraint. Due to the dynamic nature of resources, the task assignment does not rely on subdividing the image into tiles. Instead, a progressive approach is used that encompasses aspects of the entire image for each task and ensures that the time-constraints are met. Traditional reconstruction techniques are used to calculate the missing data. This approach is designed to avoid redundancy to maintain time-constraints. As a further enhancement, the algorithm decomposes the computation into components representing different tasks to achieve better visual quality considering the time-constraint and variable resources. This paper illustrates how the component-based approach maintains a better visual fidelity considering a given time-constraint while making use of volatile computational resources.

Wait-Free Shared-Memory Irradiance Cache
(The Eurographics Association, 2009) Dubla, Piotr; Debattista, Kurt; Santos, Luis Paulo; Chalmers, Alan. Editors: Kurt Debattista, Daniel Weiskopf, and Joao Comba.
The irradiance cache (IC) is an acceleration data structure which caches indirect diffuse irradiance values within the context of a ray tracing algorithm. In multi-threaded shared memory parallel systems the IC must be shared among rendering threads in order to achieve high efficiency levels.
Since all threads read from and write to it, an access control mechanism is required to ensure that the data structure is not corrupted. Besides assuring correct accesses to the IC, this access mechanism must incur minimal overheads so that performance is not compromised. In this paper we propose a new wait-free access mechanism to the shared irradiance cache. Wait-free data structures, unlike traditional access control mechanisms, do not make use of any blocking or busy waiting, avoiding most serialisation and reducing contention. We compare this technique with two other classical approaches: a lock-based mechanism and a local write technique, where each thread maintains its own cache of locally evaluated irradiance values. We demonstrate that the wait-free approach significantly reduces synchronisation overheads compared to the two other approaches and that it increases data sharing over the local copy technique. This is, to the best of our knowledge, the first work explicitly addressing access to a shared IC; this problem is becoming more and more relevant with the advent of multicore systems and the ever-increasing number of processors within these systems.
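
The idea of access without blocking or busy waiting, as described in the last abstract, can be illustrated with a small sketch. This is not the data structure from the paper. It is a minimal, hypothetical fixed-capacity, append-only store, and the names `Sample`, `AppendOnlyCache`, and `count_after_parallel_inserts` are invented for illustration. It assumes that the atomic increment maps to a single hardware instruction, so every insert and every query finishes in a bounded number of steps.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical record: a position and a cached irradiance value.
struct Sample {
    float x, y, z;
    float irradiance;
};

// Fixed-capacity, append-only store (an assumption made for this sketch).
// Neither writers nor readers take a lock or spin on another thread.
template <std::size_t Capacity>
class AppendOnlyCache {
public:
    // Reserve a slot with one atomic increment, fill it, then publish it.
    // Returns false when the store is full.
    bool insert(const Sample& s) {
        std::size_t i = next_.fetch_add(1, std::memory_order_relaxed);
        if (i >= Capacity) return false;
        slots_[i].value = s;
        // Release pairs with the acquire in query(): a reader that sees
        // ready == true also sees the fully written sample.
        slots_[i].ready.store(true, std::memory_order_release);
        return true;
    }

    // Call visit(sample) for every published sample within radius of (x, y, z).
    // Slots that are reserved but not yet published are skipped, not waited on.
    template <typename Fn>
    void query(float x, float y, float z, float radius, Fn&& visit) const {
        std::size_t n = std::min(next_.load(std::memory_order_relaxed), Capacity);
        for (std::size_t i = 0; i < n; ++i) {
            if (!slots_[i].ready.load(std::memory_order_acquire)) continue;
            const Sample& s = slots_[i].value;
            float dx = s.x - x, dy = s.y - y, dz = s.z - z;
            if (dx * dx + dy * dy + dz * dz <= radius * radius) visit(s);
        }
    }

private:
    struct Slot {
        Sample value{};
        std::atomic<bool> ready{false};
    };
    std::array<Slot, Capacity> slots_{};
    std::atomic<std::size_t> next_{0};
};

// Usage sketch: several threads insert concurrently, then one thread counts
// the published samples. Assumes threads * per_thread <= 4096.
inline std::size_t count_after_parallel_inserts(int threads, int per_thread) {
    auto cache = std::make_unique<AppendOnlyCache<4096>>();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&cache, t, per_thread] {
            for (int k = 0; k < per_thread; ++k) {
                bool ok = cache->insert({float(t), float(k), 0.0f, 1.0f});
                assert(ok);
                (void)ok;
            }
        });
    }
    for (auto& th : pool) th.join();
    std::size_t count = 0;
    cache->query(0.0f, 0.0f, 0.0f, 1e9f, [&count](const Sample&) { ++count; });
    return count;
}
```

The linear scan in `query` keeps the sketch short; a practical irradiance cache would use a spatial hierarchy, which makes non-blocking insertion considerably harder than in this flat array.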