EGPGV14: Eurographics Symposium on Parallel Graphics and Visualization

Permanent URI for this collection

https://diglib.eg.org/handle/10.2312/7757

Auto-Tuning Complex Array Layouts for GPUs
(The Eurographics Association, 2014) Weber, Nicolas; Goesele, Michael; Margarita Amor and Markus Hadwiger
The continuing evolution of Graphics Processing Units (GPUs) has shown rapid performance increases over the years. But with each new hardware generation, the constraints for programming them efficiently have changed. Programs have to be tuned towards one specific hardware generation to unleash the full potential. This is time-consuming and costly, as vendors tend to release a new generation every 18 months. It is therefore important to auto-tune GPU code to achieve GPU-specific improvements, using either static or empirical profiling to adjust parameters or to change the kernel implementation. We introduce a new approach to automatically improve memory access on GPUs. Our system generates an application-specific library which abstracts the memory access for complex arrays on the host and GPU side. This allows the code to be optimized by exchanging the memory layout without recompiling the application, as all necessary layouts are pre-compiled into the library. Our implementation is able to speed up real-world applications by up to an order of magnitude and even outperforms hand-tuned implementations.
Clustered Pre-convolved Radiance Caching
(The Eurographics Association, 2014) Rehfeld, Hauke; Zirr, Tobias; Dachsbacher, Carsten; Margarita Amor and Markus Hadwiger
We present a scalable method for rendering indirect illumination in diffuse and glossy scenes. Our method builds on pre-convolved radiance caching (RC), which enables reusing the incident radiance computed at a surface point for its neighborhood. Our contributions include efficient and robust generation of these RCs based on a pre-filtered voxel representation that stores scene-geometry and surface illumination. In addition, we describe a distribution strategy that places the RCs according to screen-space clusters to ensure all pixels have valid radiance data when evaluating indirect illumination. The results demonstrate the scalability of our method and analyze the relation between render quality, surface glossiness and computation time, which depends on the number of caches and their resolution.
Collaborative High-fidelity Rendering over Peer-to-peer Networks
(The Eurographics Association, 2014) Bugeja, Keith; Debattista, Kurt; Spina, Sandro; Chalmers, Alan; Margarita Amor and Markus Hadwiger
Due to the computational expense of high-fidelity graphics, parallel and distributed systems have frequently been employed to achieve faster rendering times. The form of distributed computing used, with a few exceptions such as the use of GRID computing, is limited to dedicated clusters available to medium to large organisations. Recently, a number of applications have made use of shared resources in order to alleviate costs of computation. Peer-to-peer computing has arisen as one of the major models for off-loading costs from a centralised computational entity to benefit a number of peers participating in a common activity. This work introduces a peer-to-peer collaborative environment for improving rendering performance for a number of peers where the program state, that is, the result of some computation among the participants, is shared. A peer that computes part of this state shares it with the others via a propagation mechanism based on epidemiology. In order to demonstrate this approach, the traditional Irradiance Cache algorithm is extended to account for sharing over a network within the presented collaborative framework. Results, which show an overall speedup with little overhead, are presented for scenes in which a number of peers navigate shared virtual environments.
Finely-Threaded History-Based Topology Computation
(The Eurographics Association, 2014) Miller, Robert; Moreland, Kenneth; Ma, Kwan-Liu; Margarita Amor and Markus Hadwiger
Graphics and visualization pipelines often make use of highly parallelized algorithms which transform an input mesh into an output mesh. One example is Marching Cubes, which transforms a voxel grid into a triangle mesh approximation of an isosurface. These techniques often discard the topological connectivity of the output mesh, and instead produce a 'soup' of disconnected geometric elements. Calculations that require local neighborhood information, such as surface curvature, cannot be performed on such outputs without first reconstructing their topology. We present a novel method for reconstructing topological information across several kinds of mesh transformations, which we demonstrate with GPU and OpenMP implementations. Our approach makes use of input topological elements for efficient location of coincident elements in the output. We provide performance data for the technique for isosurface generation, tetrahedralization, subdivision, and dual mesh generation, and demonstrate its use in visualization pipelines containing further computations of local curvature and mesh coarsening.
Freeprocessing: Transparent in situ Visualization via Data Interception
(The Eurographics Association, 2014) Fogal, Thomas; Proch, Fabian; Schiewe, Alexander; Hasemann, Olaf; Kempf, Andreas; Krüger, Jens; Margarita Amor and Markus Hadwiger
In situ visualization has become a popular method for avoiding the slowest component of many visualization pipelines: reading data from disk. Most previous in situ work has focused on achieving visualization scalability on par with simulation codes, or on the data movement concerns that become prevalent at extreme scales. In this work, we consider in situ analysis with respect to ease of use and programmability. We describe an abstraction that opens up new applications for in situ visualization, and demonstrate that this abstraction and an expanded set of use cases can be realized without a performance cost.
Parallel Methodologies for a Micropolygon Renderer
(The Eurographics Association, 2014) Bolstad, Mark A.; Margarita Amor and Markus Hadwiger
This paper compares the performance of three different methodologies for a multi-threaded micropolygon-based renderer. We extend the REYES [AG99] algorithm for multi-threaded rendering, which we call CASCADE. CASCADE processes one bucket per thread, forwarding primitives and micropolygons to other buckets/threads through split and dice operations. ROUND_ROBIN runs N single-threaded versions of CASCADE and a compositor, where primitives are distributed to each thread in a semi-random manner. NO_FORWARD executes split and dice operations, but a primitive that spans multiple buckets is processed independently by different threads, and the primitives generated through split and dice operations that project outside the current bucket are discarded. In addition, bucket scheduling is used in this case to ensure that no thread is starved for work. Extensive analysis demonstrates that none of these methodologies is clearly superior to the others under all combinations of primitive size, count, transparency, and parallelism, so a hybrid algorithm is proposed whose performance characteristics make it the best choice in all but the most pathological cases.
Parallel Progressive Mesh Editing
(The Eurographics Association, 2014) Derzapf, Evgenij; Grund, Nico; Guthe, Michael; Margarita Amor and Markus Hadwiger
Highly detailed models are commonly used in computer games and other interactive rendering applications. Intuitive editing methods are thus also required in addition to rendering algorithms. Progressive meshes are often employed to improve the rendering performance by reducing the number of rasterized triangles. The classical workflow is to generate a model and then use simplification algorithms to construct the progressive mesh. Thus the whole simplification has to be performed again after editing the model. This not only requires additional processing time but also hinders animations of progressive meshes. Based on this observation, we propose a real-time parallel multiresolution modeling algorithm for progressive meshes. It can be used for real-time editing and animation of complex progressive meshes. Due to the progressive representation, we can intuitively modify the overall shape or small-scale details. To quickly generate a progressive mesh from a complex triangle model, we also propose a massively parallel simplification algorithm that generates all required data structures within a few seconds.
Performance Modeling of vl3 Volume Rendering on GPU-Based Clusters
(The Eurographics Association, 2014) Rizzi, Silvio; Hereld, Mark; Insley, Joseph; Papka, Michael E.; Uram, Thomas; Vishwanath, Venkatram; Margarita Amor and Markus Hadwiger
This paper presents an analytical model for parallel volume rendering of large datasets using GPU-based clusters. The model is focused on the parallel volume rendering and compositing stages and predicts their performance requiring only a few input parameters. We also present vl3, a novel parallel volume rendering framework for visualization of large datasets. Its performance is evaluated on a GPU-based cluster, weak and strong scaling are studied, and model predictions are validated with experimental results on up to 128 GPUs.
Precomputing Sound Scattering for Structured Surfaces
(The Eurographics Association, 2014) Mückl, Gregor; Dachsbacher, Carsten; Margarita Amor and Markus Hadwiger
Room acoustic simulations commonly use simple models for sound scattering on surfaces in the scene. However, the continuing increase of available parallel computing power makes it possible to apply more sophisticated models. We present a method to precompute the distribution of the reflected sound off a structured surface described by a height map and normal map using the Kirchhoff approximation. Our precomputation and interpolation scheme, based on representing the reflected pressure with von-Mises-Fisher functions, is able to retain many directional and spectral features of the reflected pressure while keeping the computational and storage requirements low. We discuss our model and demonstrate applications of our precomputed functions in acoustic ray tracing and a novel interactive method suitable for applications such as architectural walk-throughs and video games.
A Study of Parallel Data Compression Using Proper Orthogonal Decomposition on the K Computer
(The Eurographics Association, 2014) Bi, Chongke; Ono, Kenji; Ma, Kwan-Liu; Wu, Haiyuan; Imamura, Toshiyuki; Margarita Amor and Markus Hadwiger
The growing power of supercomputers continues to improve scientists' ability to model larger, more sophisticated problems in science with higher accuracy. An equally important ability is to make full use of the data output from the simulations to help clarify the modeled phenomena and facilitate the discovery of new phenomena. However, along with the scale of computation, the size of the resulting data has exploded; it becomes infeasible to output most of the data, which defeats the purpose of conducting large-scale simulations. In order to address this issue so that more data may be archived and studied, we have developed a scalable parallel data compression solution to reduce the size of large-scale data with low computational cost and minimal error. We use the proper orthogonal decomposition (POD) method to compress data because this method can effectively extract the main features from the data, and the resulting compressed data can be decompressed in linear time. Our implementation achieves high parallel efficiency with a binary load-distributed approach, which is similar to the binary-swap image composition method. This approach allows us to effectively use all of the processors and to reduce the interprocessor communication cost throughout the parallel compression calculations. The results of tests using the K computer indicate the superior performance of our design and implementation.
