EGPGV: Eurographics Workshop on Parallel Graphics and Visualization
Browsing EGPGV: Eurographics Workshop on Parallel Graphics and Visualization by Title
Now showing 1 - 20 of 247
Item
An Accelerated Clip Algorithm for Unstructured Meshes: A Batch-Driven Approach (The Eurographics Association, 2024)
Tsalikis, Spiros; Schroeder, Will; Szafir, Daniel; Moreland, Kenneth; Reina, Guido; Rizzi, Silvio
The clip technique is a popular method for visualizing complex structures and phenomena within 3D unstructured meshes. Meshes can be clipped by specifying a scalar isovalue to produce an output unstructured mesh with its external surface at the isovalue. Similar to isocontouring, the clipping process relies on scalar data associated with the mesh points, including scalar data generated by implicit functions such as planes, boxes, and spheres, which facilitates the visualization of results interior to the grid. In this paper, we introduce a novel batch-driven parallel algorithm based on a sequential clip algorithm designed for high-quality results in partial volume extraction. Our algorithm comprises five passes, each progressively processing data to generate the resulting clipped unstructured mesh. The novelty lies in the use of fixed-size batches of points and cells, which enable rapid workload trimming and parallel processing, leading to a significantly improved memory footprint and run-time performance compared to the original version. On a 32-core CPU, the proposed batch-driven parallel algorithm demonstrates a run-time speed-up of up to 32.6x and a memory footprint reduction of up to 4.37x compared to the existing sequential algorithm. The software is currently available under an open-source license in the VTK visualization system.

Item
Accelerated Volume Rendering with Homogeneous Region Encoding using Extended Anisotropic Chessboard Distance on GPU (The Eurographics Association, 2006)
Es, A.; Keles, H. Y.; Isler, V.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
Ray traversal is the most time-consuming part of volume ray casting. In this paper, an acceleration technique for direct volume rendering is introduced, which uses a GPU-friendly data structure to reduce traversal time. Empty regions and homogeneous regions in the volume are encoded using an extended anisotropic chessboard distance (EACD) transformation. By means of EACD encoding, both the empty spaces and the samples belonging to homogeneous regions are processed efficiently on the GPU with minimal branching. In addition to skipping empty spaces, this method reduces the sampling operations inside a homogeneous region using ray integral factorization: the proposed algorithm integrates the optical properties of a homogeneous region in one step and leaps directly to the next region. We show that our method can run more than 6 times faster than a primitive ray caster without any visible loss in image quality.
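The EACD entry above relies on a distance transform over the volume: each empty voxel stores how far a ray can safely leap before it could enter data. For orientation, here is a minimal sketch of the classic two-pass chamfer computation of the plain (isotropic) chessboard distance; the names are illustrative, and the paper's extended anisotropic variant stores per-direction leap distances rather than this single scalar.

```cpp
// Two-pass chamfer computation of the chessboard (L-infinity) distance from
// every voxel to the nearest occupied voxel. A ray caster may leap d(p)
// voxels from an empty voxel p before it can possibly enter occupied data.
#include <algorithm>
#include <cstdio>
#include <vector>

struct DistanceField {
    int nx, ny, nz;
    std::vector<int> d;
    int& at(int x, int y, int z) { return d[(z * ny + y) * nx + x]; }
};

void chessboardDistance(DistanceField& v, const std::vector<bool>& occupied) {
    const int INF = v.nx + v.ny + v.nz;              // safe upper bound
    for (size_t i = 0; i < v.d.size(); ++i) v.d[i] = occupied[i] ? 0 : INF;

    auto sweep = [&](int dir) {                      // +1 forward, -1 backward
        for (int z = dir > 0 ? 0 : v.nz - 1; z >= 0 && z < v.nz; z += dir)
        for (int y = dir > 0 ? 0 : v.ny - 1; y >= 0 && y < v.ny; y += dir)
        for (int x = dir > 0 ? 0 : v.nx - 1; x >= 0 && x < v.nx; x += dir)
            for (int dz = -1; dz <= 1; ++dz)
            for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                // use only neighbors already finalized in this sweep order
                if (dir * (dz * 9 + dy * 3 + dx) >= 0) continue;
                int X = x + dx, Y = y + dy, Z = z + dz;
                if (X < 0 || X >= v.nx || Y < 0 || Y >= v.ny ||
                    Z < 0 || Z >= v.nz) continue;
                v.at(x, y, z) = std::min(v.at(x, y, z), v.at(X, Y, Z) + 1);
            }
    };
    sweep(+1);
    sweep(-1);
}

int main() {
    DistanceField v{8, 8, 8, std::vector<int>(8 * 8 * 8)};
    std::vector<bool> occ(8 * 8 * 8, false);
    occ[(4 * 8 + 4) * 8 + 4] = true;                 // one occupied voxel
    chessboardDistance(v, occ);
    std::printf("leap from (0,0,0): %d voxels\n", v.at(0, 0, 0));  // prints 4
    return 0;
}
```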
Item
Accelerating and Benchmarking Radix-k Image Compositing at Large Scale (The Eurographics Association, 2010)
Kendall, Wesley; Peterka, Tom; Huang, Jian; Shen, Han-Wei; Ross, Robert; James Ahrens and Kurt Debattista and Renato Pajarola
Radix-k was introduced in 2009 as a configurable image compositing algorithm. The ability to tune it by selecting k-values allows it to benefit more from pixel reduction and compression optimizations than its predecessors. This paper describes such optimizations in Radix-k, analyzes their effects, and demonstrates improved performance and scalability. In addition to bounding and run-length encoding pixels, k-value selection and load balance are regulated at run-time. Performance is systematically analyzed for an array of process counts, image sizes, and HPC and graphics clusters. Analyses are performed using compositing of synthetic images and also in the context of a complete volume renderer and scientific data. We demonstrate increased performance over binary swap and show that 64 megapixels can be composited in 0.08 seconds, or at 12.5 frames per second, on 32K processes.

Item
Accelerating the Irradiance Cache through Parallel Component-Based Rendering (The Eurographics Association, 2006)
Debattista, Kurt; Santos, Luís Paulo; Chalmers, Alan; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The irradiance cache is an acceleration data structure which caches indirect diffuse samples within the framework of a distributed ray-tracing algorithm. Previously calculated values can be stored and reused in future calculations, resulting in an order-of-magnitude improvement in computational performance. However, the irradiance cache is a shared data structure and so is notoriously difficult to parallelise over a distributed parallel system. The hurdle to overcome is when and how to share cached samples. This sharing incurs communication overheads and yet must happen frequently to minimise cache misses and thus maximise the performance of the cache. We present a novel component-based parallel algorithm implemented on a cluster of computers, whereby the indirect diffuse calculations are performed on a subset of nodes in the cluster. This method exploits the inherently spatially coherent nature of the irradiance cache; by reducing the set of nodes amongst which cached values must be shared, the sharing frequency can be kept high, thus decreasing both communication overheads and cache misses. We demonstrate how our new parallel rendering algorithm significantly outperforms traditional methods of distributing the irradiance cache.

Item
Acceleration of Opacity Correction Mechanisms for Over-sampled Volume Ray Casting (The Eurographics Association, 2008)
Lee, Jong Kwan; Newman, Timothy S.; Jean M. Favre and Kwan-Liu Ma
Techniques for accelerated opacity correction for over-sampled volume ray casting on commodity hardware are described. The techniques exploit the processing capabilities of programmable GPUs and cluster computers. The GPU-based technique follows a fine-grained parallel approach that exposes to the GPU the inherent parallelism in the opacity correction process. The cluster computation techniques follow less finely granular data-parallel approaches that allow exploitation of computational resources with minimal inter-CPU communication. The performance improvements offered by the accelerated approaches over opacity correction on a single CPU are also exhibited for real volumetric datasets.
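The opacity correction the entry above accelerates is, at its core, a per-sample power-law adjustment: an opacity defined for a reference sample spacing must be re-derived when the ray is over-sampled. A minimal sketch of that standard relation follows (the textbook formula, not the paper's GPU or cluster code); the accelerated variants map exactly this computation over all samples in parallel.

```cpp
// Standard opacity correction: a transfer-function opacity alpha, defined
// for a reference sample spacing s0, is adjusted to the actual spacing s so
// that accumulated opacity along the ray is independent of sampling rate.
#include <cmath>
#include <cstdio>

double correctOpacity(double alpha, double s, double s0) {
    return 1.0 - std::pow(1.0 - alpha, s / s0);   // alpha' = 1-(1-alpha)^(s/s0)
}

int main() {
    // 4x over-sampling: each of the four sub-samples must be made more
    // transparent; compositing them reproduces the original opacity.
    double a = correctOpacity(0.5, 0.25, 1.0);    // ~0.1591
    double acc = 0.0;
    for (int i = 0; i < 4; ++i) acc = acc + (1.0 - acc) * a;
    std::printf("corrected %.4f, recomposited %.4f\n", a, acc);  // ..., 0.5000
    return 0;
}
```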
Item
Achieving Portable Performance For Wavelet Compression Using Data Parallel Primitives (The Eurographics Association, 2017)
Li, Shaomeng; Marsaglia, Nicole; Chen, Vincent; Sewell, Christopher; Clyne, John; Childs, Hank; Alexandru Telea and Janine Bennett
We consider the problem of wavelet compression in the context of portable performance over multiple architectures. We contribute a new implementation of the wavelet transform algorithm that uses data parallel primitives from the VTK-m library. Because of the data parallel primitives approach, our algorithm is hardware-agnostic and yet can run on many-core architectures. We also study the efficacy of this implementation over multiple architectures against hardware-specific comparators. Results show that our performance is portable, scales well, and is comparable to native implementations. Finally, we argue that compression times for large data sets are likely fast enough to fit within in situ constraints, adding to the evidence that wavelet transformation could be an effective in situ compression operator.

Item
Adaptive Collision Culling for Large-Scale Simulations by a Parallel Sweep and Prune Algorithm (The Eurographics Association, 2016)
Capannini, Gabriele; Larsson, Thomas; Enrico Gobbetti and Wes Bethel
We propose a parallel Sweep and Prune algorithm that solves the dynamic box intersection problem in three dimensions. It scales up to very large datasets, which makes it suitable for broad-phase collision detection in complex moving-body simulations. Our algorithm gracefully handles high-density scenarios, including challenging clustering behavior, by using a dual-axis sweeping approach and a cache-friendly succinct data structure. The algorithm is realized by three parallel stages for sorting, candidate generation, and object pairing. By the use of temporal coherence, our sorting stage runs with close to optimal load balancing. Furthermore, our approach is characterized by a work-division strategy that relies on adaptive partitioning, which leads to almost ideal scalability. Experimental results show high performance for up to millions of objects on modern multi-core CPUs.

Item
Alternative Parameters for On-The-Fly Simplification of Merge Trees (The Eurographics Association, 2020)
Werner, Kilian; Garth, Christoph; Frey, Steffen and Huang, Jian and Sadlo, Filip
Topological simplification of merge trees requires a user-specified persistence threshold. As this threshold is based on prior domain knowledge and has an unpredictable relation to output size, its use faces challenges in large-data situations like online, distributed, or out-of-core scenarios. We propose two alternative parameters, a targeted percentile size reduction and a total output size limit, to increase flexibility in those scenarios.

Item
Analysis of Cache Behavior and Performance of Different BVH Memory Layouts for Tracing Incoherent Rays (The Eurographics Association, 2013)
Wodniok, Dominik; Schulz, Andre; Widmer, Sven; Goesele, Michael; Fabio Marton and Kenneth Moreland
With CPUs moving towards many-core architectures and GPUs becoming more general-purpose architectures, path tracing can now be well parallelized on commodity hardware. While parallelization is trivial in theory, properties of real hardware make efficient parallelization difficult, especially when tracing incoherent rays. We investigate how different bounding volume hierarchy (BVH) and node memory layouts, as well as storing the BVH in different memory areas, impact the ray tracing performance of a GPU path tracer. We optimize the BVH layout using information gathered in a pre-processing pass that applies a number of different BVH reordering techniques. Depending on the memory area and scene complexity, we achieve moderate speedups.
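The BVH layout entry above measures how node orderings in memory affect cache behavior when rays are incoherent. For orientation, here is a minimal sketch of the common depth-first baseline layout such studies start from: a pointer-based build tree flattened into a contiguous array so that a node's left child is always the next element. The types are illustrative, not the paper's code.

```cpp
// A pointer-based BVH flattened into a contiguous array in depth-first
// order: each node's left child is the next array element, so only the
// right child's index needs to be stored.
#include <cstdio>
#include <vector>

struct AABB { float lo[3], hi[3]; };

struct BuildNode {                        // tree produced by construction
    AABB bounds{};
    BuildNode* child[2] = {nullptr, nullptr};
    int firstPrim = 0, primCount = 0;     // leaf payload
};

struct FlatNode {                         // node traversed at render time
    AABB bounds;
    int rightChild;                       // left child is the next node
    int firstPrim, primCount;             // primCount > 0 marks a leaf
};

int flatten(const BuildNode* n, std::vector<FlatNode>& out) {
    int myIndex = (int)out.size();
    out.push_back({n->bounds, -1, n->firstPrim, n->primCount});
    if (n->child[0]) {                    // interior node
        flatten(n->child[0], out);        // left child lands at myIndex + 1
        out[myIndex].rightChild = flatten(n->child[1], out);
    }
    return myIndex;
}

int main() {
    BuildNode left, right, root;
    left.firstPrim = 0;  left.primCount = 4;
    right.firstPrim = 4; right.primCount = 4;
    root.child[0] = &left; root.child[1] = &right;
    std::vector<FlatNode> flat;
    flatten(&root, flat);                 // order: root, left, right
    std::printf("%zu nodes, root's right child at %d\n",
                flat.size(), flat[0].rightChild);   // 3 nodes, index 2
    return 0;
}
```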
Item
An Application of Scalable Massive Model Interaction using Shared-Memory Systems (The Eurographics Association, 2006)
Stephens, Abe; Boulos, Solomon; Bigler, James; Wald, Ingo; Parker, Steven; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
During the end-to-end digital design of a commercial airliner, a massive amount of geometric data is produced. This data can be used for inspection or maintenance throughout the life of the aircraft. Massive model interactive ray tracing can provide maintenance personnel with the capability to easily visualize the entire aircraft at once. This paper describes the design of the renderer used to demonstrate the feasibility of integrating interactive ray tracing into a commercial aircraft inspection and maintenance scenario. We describe the feasibility demonstration, involving actual personnel performing real-world tasks, and the scalable architecture of the parallel shared-memory renderer.

Item
Approach for software development of parallel real-time VE systems on heterogenous clusters (The Eurographics Association, 2002)
Winkelholz, C.; Alexander, T.; D. Bartz and X. Pueyo and E. Reinhard
This paper presents our approach for the development of software for parallel real-time virtual environment (VE) systems running on heterogeneous clusters of computers. The approach is based on a framework we have developed to facilitate the set-up of immersive virtual environment systems from single components coupled by an isolated local network. The framework provides parallel rendering of multiple projection screens and parallel execution of application and interaction tasks on components spread across a cluster. The main concept of the approach discussed in this paper is to use the Virtual Reality Modeling Language (VRML) as an interface definition language (IDL) for the parallel and distributed virtual environment system. An IDL compiler generates skeleton code for the implementations of the script nodes specified in a VRML file. Components created this way can be reused in any VE by declaring the same interfaces, and instances of the implemented interfaces can reside in any application. By this approach, commercial off-the-shelf software can easily be integrated into a VE application. In this connection, we discuss the underlying framework and software development process. Furthermore, the implementation of a VE system for a geographic information system (GIS) based on this approach is shown. It is emphasized that the components are reused across various applications.

Item
Approaches for In Situ Computation of Moments in a Data-Parallel Environment (The Eurographics Association, 2020)
Tsai, Karen C.; Bujack, Roxana; Geveci, Berk; Ayachit, Utkarsh; Ahrens, James; Frey, Steffen and Huang, Jian and Sadlo, Filip
Feature-driven in situ data reduction can overcome the I/O bottleneck that large simulations face on modern supercomputer architectures in a semantically meaningful way. In this work, we make use of pattern detection as a black-box detector of arbitrary feature templates of interest. In particular, we use moment invariants because they allow pattern detection independent of the specific orientation of a feature. We provide two open-source implementations of a rotation-invariant pattern detection algorithm for high performance computing (HPC) clusters with a distributed memory environment. The first is a straightforward integration approach; the second makes use of the Fourier transform and the Cross-Correlation Theorem. We compare the two approaches with respect to performance and flexibility and showcase results of the in situ integration with real-world simulation code.
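The moments entry above assembles rotation-invariant pattern descriptors from image moments. A minimal sketch of the underlying building block, the raw moment M_pq of a discrete 2D field, from which centroids and central moments follow; this is the generic textbook formulation, not the paper's data-parallel or FFT-based implementation, which distributes these sums across ranks.

```cpp
// Raw moment M_pq = sum over x,y of x^p * y^q * f(x,y) for a discrete 2D
// scalar field. Centroids (and from them, central moments and rotation
// invariants) are simple combinations of these sums.
#include <cstdio>
#include <vector>

double rawMoment(const std::vector<double>& f, int w, int h, int p, int q) {
    double m = 0.0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            double t = f[y * w + x];
            for (int i = 0; i < p; ++i) t *= x;   // x^p
            for (int i = 0; i < q; ++i) t *= y;   // y^q
            m += t;
        }
    return m;
}

int main() {
    const int w = 8, h = 8;
    std::vector<double> f(w * h, 0.0);
    f[1 * w + 2] = 1.0;                           // unit mass at (x=2, y=1)
    double m00 = rawMoment(f, w, h, 0, 0);
    std::printf("centroid = (%g, %g)\n",
                rawMoment(f, w, h, 1, 0) / m00,   // 2
                rawMoment(f, w, h, 0, 1) / m00);  // 1
    return 0;
}
```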
Item
Asynchronous BVH Construction for Ray Tracing Dynamic Scenes on Parallel Multi-Core Architectures (The Eurographics Association, 2007)
Ize, Thiago; Wald, Ingo; Parker, Steven G.; Jean M. Favre and Luis Paulo Santos and Dirk Reiners
Recent developments have produced several techniques for interactive ray tracing of dynamic scenes. In particular, bounding volume hierarchies (BVHs) are efficient acceleration structures that handle complex triangle distributions and can accommodate deformable scenes by updating (refitting) the bounding primitives without restructuring the entire tree. Unfortunately, updating only the bounding primitives can degrade the quality of the BVH, and in some scenes will result in a dramatic deterioration of rendering performance. The typical way to avoid this degradation is to rebuild the BVH when a heuristic determines the tree is no longer efficient, but the rebuild disrupts interactive system response. We present a method that removes this gradual decline in performance while enabling consistently fast BVH performance. We accomplish this by asynchronously rebuilding the BVH concurrently with rendering and animation, allowing the BVH to be restructured within a handful of frames.

Item
Asynchronous Parallel Reliefboard Computation for Scene Object Approximation (The Eurographics Association, 2010)
Süß, Tim; Jähn, Claudius; Fischer, Matthias; James Ahrens and Kurt Debattista and Renato Pajarola
We present a parallel algorithm for the rendering of complex three-dimensional scenes. The algorithm runs across heterogeneous PC-cluster architectures consisting of a visualization node, equipped with a powerful graphics adapter, and cluster nodes requiring only weaker graphics capabilities. The visualization node renders a mixture of scene objects and simplified meshes (Reliefboards). The cluster nodes assist the visualization node by asynchronously computing Reliefboards, which are used to replace and render distant parts of the scene. Our algorithm is capable of achieving significant speedups even if the cluster's nodes provide only weak graphics adapters. We trade off the number of cluster nodes against the image quality of the scene objects.

Item
Auto Splats: Dynamic Point Cloud Visualization on the GPU (The Eurographics Association, 2012)
Preiner, Reinhold; Jeschke, Stefan; Wimmer, Michael; Hank Childs and Torsten Kuhlen and Fabio Marton
Capturing real-world objects with laser-scanning technology has become an everyday task. Recently, the acquisition of dynamic scenes at interactive frame rates has become feasible. A high-quality visualization of the resulting point cloud stream would require a per-frame reconstruction of object surfaces. Unfortunately, reconstruction computations are still too time-consuming to be applied interactively. In this paper we present a local surface reconstruction and visualization technique that provides interactive feedback for reasonably sized point clouds, while achieving high image quality. Our method is performed entirely on the GPU and in screen space, exploiting the efficiency of the common rasterization pipeline. The approach is very general, as no assumption is made about point connectivity or sampling density. This naturally allows combining the outputs of multiple scanners in a single visualization, which is useful for many virtual and augmented reality applications.
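The handoff pattern from the Asynchronous BVH Construction entry above can be sketched compactly: the render loop keeps tracing against the current tree while a worker thread builds a replacement, and the swap happens between frames. The Bvh type and build() below are hypothetical stand-ins, not the paper's data structures; the point is the race-free exchange.

```cpp
// Double-buffered acceleration structure: rendering continues on the
// current tree while a worker builds a replacement; adopting the new tree
// is a cheap pointer exchange between frames.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <memory>
#include <thread>

struct Bvh { int version; };                      // stand-in for a real BVH

std::shared_ptr<Bvh> build(int version) {         // pretend rebuild
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    return std::make_shared<Bvh>(Bvh{version});
}

int main() {
    auto current = std::make_shared<Bvh>(Bvh{0});
    std::shared_ptr<Bvh> next;
    std::atomic<bool> ready{false};

    // The store to 'ready' happens after 'next' is written, so the reader
    // below only touches 'next' once it is fully built.
    std::thread rebuilder([&] { next = build(1); ready = true; });

    for (int frame = 0; frame < 10; ++frame) {
        if (ready.exchange(false)) current = next;  // adopt the new tree
        // traceRays(frame, *current);  // always sees a consistent snapshot
        std::printf("frame %d renders with BVH v%d\n", frame, current->version);
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    rebuilder.join();
    return 0;
}
```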
Item
Auto-Tuning Complex Array Layouts for GPUs (The Eurographics Association, 2014)
Weber, Nicolas; Goesele, Michael; Margarita Amor and Markus Hadwiger
The continuing evolution of Graphics Processing Units (GPUs) has shown rapid performance increases over the years. But with each new hardware generation, the constraints for programming them efficiently have changed: programs have to be tuned towards one specific hardware generation to unleash its full potential. This is time-consuming and costly, as vendors tend to release a new generation every 18 months. It is therefore important to auto-tune GPU code, using either static or empirical profiling to adjust parameters or to change the kernel implementation. We introduce a new approach to automatically improve memory access on GPUs. Our system generates an application-specific library which abstracts the memory access for complex arrays on the host and GPU side. This makes it possible to optimize the code by exchanging the memory layout without recompiling the application, as all necessary layouts are pre-compiled into the library. Our implementation is able to speed up real-world applications by up to an order of magnitude and even outperforms hand-tuned implementations.

Item
Automatic In Situ Camera Placement for Isosurfaces of Large-Scale Scientific Simulations (The Eurographics Association, 2022)
Marsaglia, Nicole; Mathai, Manish; Fields, Stefan; Childs, Hank; Bujack, Roxana; Tierny, Julien; Sadlo, Filip
High-performance computing trends are requiring in situ processing increasingly often. This work considers automating camera placement for in situ visualization, specifically of isosurfaces, which is needed when there is no human in the loop and no a priori knowledge of where to place the camera. Our approach utilizes Viewpoint Quality (VQ) metrics, which quantify which camera positions provide the most insight. We have two primary contributions. First, we introduce an approach parallelizing the calculation of VQ metrics, which is necessary for usage in an in situ setting. Second, we introduce an algorithm for searching for a good camera position that balances maximizing the VQ metric score against minimizing execution time. We evaluate our contributions with an in situ performance study on a supercomputer. Our findings confirm that our approach is viable, and in particular that we can find good viewpoints with small execution time.
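The Auto-Tuning Complex Array Layouts entry above hinges on hiding the memory layout behind an index-based accessor, so layouts can be exchanged without touching caller code. A minimal sketch of that abstraction with two hand-written layouts follows; the paper generates such libraries automatically and pre-compiles many more layouts than these two.

```cpp
// The same kernel compiled against two interchangeable layouts. Callers go
// through the index-based accessor x(i) and never hard-code the layout, so
// exchanging it requires no change to the kernel.
#include <cstddef>
#include <cstdio>
#include <vector>

struct AoS {                                 // array of structures
    struct Particle { float x, y, z; };
    std::vector<Particle> p;
    explicit AoS(std::size_t n) : p(n) {}
    float& x(std::size_t i) { return p[i].x; }
};

struct SoA {                                 // structure of arrays
    std::vector<float> xs, ys, zs;
    explicit SoA(std::size_t n) : xs(n), ys(n), zs(n) {}
    float& x(std::size_t i) { return xs[i]; }
};

template <class Layout>                      // layout is a compile-time choice
float sumX(Layout& a, std::size_t n) {
    float s = 0.f;
    for (std::size_t i = 0; i < n; ++i) s += a.x(i);
    return s;
}

int main() {
    AoS a(4); SoA b(4);
    for (std::size_t i = 0; i < 4; ++i) { a.x(i) = 1.f; b.x(i) = 2.f; }
    std::printf("%g %g\n", sumX(a, 4), sumX(b, 4));   // prints: 4 8
    return 0;
}
```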
Item
Cache-Efficient Parallel Isosurface Extraction for Shared Cache Multicores (The Eurographics Association, 2010)
Tchiboukdjian, Marc; Danjean, Vincent; Raffin, Bruno; James Ahrens and Kurt Debattista and Renato Pajarola
This paper revisits isosurface extraction algorithms in light of two specific aspects of recent multicore architectures: their intrinsic parallelism, associated with the presence of multiple computing cores, and their cache hierarchy, which often includes private caches as well as caches shared between all cores. Taking advantage of these shared caches requires adapting the parallelization scheme so that the cores collaborate on cache usage rather than compete for it, which can impair performance. We propose to have cores work on independent but close data sets that can all fit in the shared cache together. We propose two shared-cache-aware parallel isosurface algorithms, one based on marching tetrahedra and one using a min-max tree as an acceleration data structure. We theoretically prove that in both cases the number of cache misses is the same as for the sequential algorithm with the same cache size. The algorithms are based on the FastCOL cache-oblivious (CO) data layout for irregular meshes. The CO layout also enables building a very compact min-max tree that leads to a reduced number of cache misses. Experiments confirm the benefit of these shared-cache-aware isosurface algorithms, with the performance gain increasing as the ratio of shared cache size to core count decreases.

Item
Case Study of Multithreaded In-core Isosurface Extraction Algorithms (The Eurographics Association, 2004)
Zhang, Huijuan; Newman, Timothy S.; Zhang, Xiang; Dirk Bartz and Bruno Raffin and Han-Wei Shen
A comparative, empirical study of the computational performance of multithreading strategies for Marching Cubes isosurface extraction is presented. Several representative data-centric strategies are considered. The focus is on in-core computation that can be performed on desktop (single- or dual-CPU) computers. The study's empirical results are analyzed on the metrics of initialization overhead, individual surface extraction time, and total run time. In addition, an analysis of cache behavior and memory storage requirements is presented.

Item
The Challenges of Commodity-Based Visualization Clusters (The Eurographics Association, 2006)
Klosowski, J. T.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The performance of commodity computer components continues to increase dramatically. Processors, internal I/O buses, graphics cards, and network adapters have all exhibited significant improvements without significant increases in cost. Due to the increase in the price/performance ratio of computers utilizing such components, clusters of commodity machines have become commonplace in today's computing world and are steadily displacing specialized, high-end, shared-memory machines for many graphics and visualization workloads. Acceptance, and more importantly utilization, of commodity clusters has been hampered, however, by the significant challenges introduced when switching from a shared-memory architecture to a distributed-memory one. Such challenges range from having to redesign applications for distributed computing, to gathering pixels from multiple sources, to synchronizing multiple video outputs when driving large displays. In addition to these impediments for the application developer, there are also many mundane problems which arise when working with clusters, including their installation and general system administration. This paper details these challenges and the many solutions that have been developed in recent years. As the nature of commodity hardware components suggests, the solutions to these research challenges are largely software-based, and include middleware layers for distributing the graphics workload across the cluster as well as for aggregating the final results for display to the user. At the forefront of this discussion is IBM's Deep View project, whose goal has been the design and implementation of a scalable, affordable, high-performance visualization system for parallel rendering. In the past six years, Deep View has undergone numerous redesigns to make it as efficient as possible. We highlight the issues involved in this process, up to and including the current incarnation of Deep View, as well as what's on the horizon for cluster-based rendering.
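The two isosurface entries above both partition cells among threads with the cache hierarchy in mind. A minimal sketch of the shared-cache-friendly slab decomposition idea, assuming a structured 2D grid for brevity: adjacent threads take adjacent slabs, so boundary rows loaded by one core are soon reused by its neighbor through the shared cache. The sizes and the cell-classification step are illustrative, not the papers' algorithms.

```cpp
// Threads classify the active cells (those whose value range spans the
// isovalue) of adjacent slabs of a structured grid. In practice, slab
// heights would be chosen so neighboring working sets fit a shared cache.
#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const int nx = 256, ny = 256;
    const float iso = 0.5f;
    std::vector<float> f(nx * ny);
    for (int i = 0; i < nx * ny; ++i) f[i] = (i % nx) / float(nx);  // x-ramp

    const int nThreads = 4;
    std::vector<long> active(nThreads, 0);     // per-thread counters
    std::vector<std::thread> pool;
    for (int t = 0; t < nThreads; ++t)
        pool.emplace_back([&, t] {
            int y0 = t * (ny - 1) / nThreads;        // contiguous slab rows
            int y1 = (t + 1) * (ny - 1) / nThreads;
            for (int y = y0; y < y1; ++y)
                for (int x = 0; x + 1 < nx; ++x) {
                    float lo = f[y * nx + x], hi = lo;
                    for (int c : {y * nx + x + 1, (y + 1) * nx + x,
                                  (y + 1) * nx + x + 1}) {
                        lo = std::min(lo, f[c]);
                        hi = std::max(hi, f[c]);
                    }
                    if (lo <= iso && iso <= hi) ++active[t];
                }
        });
    for (auto& th : pool) th.join();
    long total = 0;
    for (long a : active) total += a;
    std::printf("active cells: %ld\n", total);  // cells crossing iso = 0.5
    return 0;
}
```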