EGPGV06: Eurographics Symposium on Parallel Graphics and Visualization
Showing 20 of 21 items, sorted by issue date.
Item: Parallelization of Inverse Design of Luminaire Reflectors (The Eurographics Association, 2006)
Magallon, J. A.; Patow, G.; Seron, F. J.; Pueyo, X.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
This paper presents the parallelization of techniques for the design of reflector shapes from prescribed optical properties (far-field radiance distribution), geometrical constraints and, if available, a user-given initial guess. This is a problem of high importance in the field of Lighting Engineering, more specifically for Luminaire Design. Light propagation inside and outside the optical set must be computed and the resulting radiance distribution compared to the desired one in an iterative process. Constraints on the shape imposed by industry needs must be taken into account, bounding the set of possible shape definitions. A general approach is based on a minimization procedure over the space of possible reflector shapes, starting from a generic or a user-provided shape. Such minimization techniques are also known as inverse problems and are computationally very expensive, requiring a long time to reach a good solution. To reduce these high resource requirements we propose a parallel approach, based on SMP and clustering, that can bring the simulation times down to a more feasible level.

Item: Sorted Pipeline Image Composition (The Eurographics Association, 2006)
Roth, Marcus; Reiners, Dirk; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The core advantage of sort-last rendering is its theoretically near-linear scalability in the number of rendering nodes, which makes it very attractive for very large polygonal and volumetric models. The disadvantage of sort-last rendering is that a final image composition step is necessary, in which a huge amount of data has to be transferred between the rendering nodes. Even with gigabit or faster networks, the image composition introduces an overhead that makes it impractical to use sort-last parallel rendering for interactive applications on large clusters. This paper describes the Sorted Pipeline Composition algorithm, which reduces the amount of data that needs to be transferred by an order of magnitude and results in a frame rate that is at least twice as high as the widely used binary swap image composition algorithm.
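The Sorted Pipeline entry above benchmarks against binary-swap compositing, the standard baseline for sort-last image composition. As background, here is a minimal sketch of that baseline (not the Sorted Pipeline algorithm itself), assuming a power-of-two number of MPI ranks whose rank order already matches the front-to-back compositing order; the RGBA layout and the blend_over helper are illustrative placeholders.

```cpp
// Hedged sketch of the binary-swap baseline referenced above, not the paper's
// Sorted Pipeline algorithm.  Assumes a power-of-two communicator size and that
// rank order already matches front-to-back compositing order.
#include <mpi.h>
#include <utility>
#include <vector>

struct RGBA { float r, g, b, a; };

// Premultiplied-alpha "over": composites 'front' over 'back'.
static RGBA blend_over(const RGBA& front, const RGBA& back) {
    float t = 1.0f - front.a;
    return { front.r + t * back.r, front.g + t * back.g,
             front.b + t * back.b, front.a + t * back.a };
}

// 'img' holds this rank's locally rendered full-resolution image.  Returns the
// [begin, end) pixel span that this rank owns, fully composited, afterwards.
std::pair<int, int> binary_swap(std::vector<RGBA>& img, MPI_Comm comm) {
    int rank, size, pixels = static_cast<int>(img.size());
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    int begin = 0, end = pixels;
    std::vector<RGBA> recv(pixels);

    for (int bit = 1; bit < size; bit <<= 1) {
        int partner = rank ^ bit;
        int mid = begin + (end - begin) / 2;
        // The lower rank of each pair keeps the first half, the upper the second.
        int keep_b = (rank & bit) ? mid : begin, keep_e = (rank & bit) ? end : mid;
        int send_b = (rank & bit) ? begin : mid, send_e = (rank & bit) ? mid : end;

        MPI_Sendrecv(&img[send_b], (send_e - send_b) * 4, MPI_FLOAT, partner, 0,
                     &recv[keep_b], (keep_e - keep_b) * 4, MPI_FLOAT, partner, 0,
                     comm, MPI_STATUS_IGNORE);

        // Composite the partner's contribution into the span we keep; lower
        // ranks are assumed to be in front of higher ranks.
        for (int i = keep_b; i < keep_e; ++i)
            img[i] = (rank < partner) ? blend_over(img[i], recv[i])
                                      : blend_over(recv[i], img[i]);
        begin = keep_b;
        end   = keep_e;
    }
    return { begin, end };
}
```

After the loop, each rank holds the fully composited pixels of its final sub-span, and gathering those spans yields the complete frame; the volume of pixel data exchanged in these rounds is exactly what composition schemes such as the one above aim to reduce.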
Item: A Scalable, Hybrid Scheme for Volume Rendering Massive Data Sets (The Eurographics Association, 2006)
Childs, Hank; Duchaineau, Mark; Ma, Kwan-Liu; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
We introduce a parallel, distributed-memory algorithm for volume rendering massive data sets. The algorithm's scalability has been demonstrated up to 400 processors, rendering one hundred million unstructured elements in under one second. The heart of the algorithm is a hybrid approach that parallelizes over both the elements of the input data and the pixels of the output image. At each stage of the algorithm, there are strong limits on how much work each processor performs, ensuring good parallel efficiency. The algorithm is sample-based. We present two techniques for calculating the sample points: a 3D rasterization technique and a kernel-based technique, which trade off speed against generality. Finally, the algorithm is very flexible. It can be deployed in general purpose visualization tools and can also support diverse mesh types, ranging from structured grids to curvilinear and unstructured meshes to point clouds.

Item: Optimized Volume Raycasting for Graphics-Hardware-based Cluster Systems (The Eurographics Association, 2006)
Müller, C.; Strengert, M.; Ertl, T.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
In this paper, we present a sort-last parallel volume rendering system based on single-pass volume raycasting performed in the fragment shader unit. The architecture is aimed at displaying data sets that utilize the total distributed texture memory at interactive frame rates. We use uniform texture bricks that are distributed by means of a kd-tree to employ object-space partitioning. They are further used to implement empty-space skipping and a load balancing mechanism, which also makes use of the kd-tree, to increase the overall performance of the rendering system. Performance numbers are given for a mid-range GPU cluster system consisting of eight render nodes with an InfiniBand interconnect.
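The kd-tree brick distribution described in the raycasting entry above can be pictured with a small, hedged sketch: bricks are recursively split along the axis of largest extent and assigned to halves of the node range so that estimated cost is balanced. The Brick fields and the cost-balanced split rule are assumptions made for illustration, not the authors' implementation.

```cpp
// Hedged sketch of kd-tree object-space partitioning of uniform volume bricks
// across render nodes, in the spirit of the entry above.
#include <algorithm>
#include <vector>

struct Brick {
    float center[3];   // brick center in volume coordinates
    float cost;        // estimated render cost (e.g. measured in the last frame)
    int   owner = -1;  // render node the brick gets assigned to
};

// Recursively assigns bricks[first, last) to render nodes [node_lo, node_hi):
// split along the axis of largest extent, balancing the summed cost.
void kd_assign(std::vector<Brick>& bricks, int first, int last,
               int node_lo, int node_hi) {
    if (first >= last) return;
    if (node_hi - node_lo <= 1) {                     // leaf: one node owns all
        for (int i = first; i < last; ++i) bricks[i].owner = node_lo;
        return;
    }
    // Pick the split axis as the largest spread of brick centers.
    float lo[3] = { 1e30f, 1e30f, 1e30f }, hi[3] = { -1e30f, -1e30f, -1e30f };
    for (int i = first; i < last; ++i)
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], bricks[i].center[a]);
            hi[a] = std::max(hi[a], bricks[i].center[a]);
        }
    int axis = 0;
    for (int a = 1; a < 3; ++a)
        if (hi[a] - lo[a] > hi[axis] - lo[axis]) axis = a;

    std::sort(bricks.begin() + first, bricks.begin() + last,
              [axis](const Brick& x, const Brick& y) {
                  return x.center[axis] < y.center[axis];
              });

    // Place the split where accumulated cost matches the left node group's share.
    int   node_mid = (node_lo + node_hi) / 2;
    float total = 0.0f;
    for (int i = first; i < last; ++i) total += bricks[i].cost;
    float target = total * float(node_mid - node_lo) / float(node_hi - node_lo);
    int   split = first;
    float acc = 0.0f;
    while (split + 1 < last && acc + bricks[split].cost <= target)
        acc += bricks[split++].cost;

    kd_assign(bricks, first, split, node_lo, node_mid);
    kd_assign(bricks, split, last,  node_mid, node_hi);
}
```

Feeding measured per-brick render times back into the cost field each frame turns the same recursion into a load-balancing step of the kind the abstract mentions.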
Item: The Challenges of Commodity-Based Visualization Clusters (The Eurographics Association, 2006)
Klosowski, J. T.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The performance of commodity computer components continues to increase dramatically. Processors, internal I/O buses, graphics cards, and network adapters have all exhibited significant improvements without significant increases in cost. Due to the increase in the price/performance ratio of computers utilizing such components, clusters of commodity machines have become commonplace in today's computing world and are steadily displacing specialized, high-end, shared-memory machines for many graphics and visualization workloads. Acceptance, and more importantly utilization, of commodity clusters has been hampered, however, by the significant challenges introduced when switching from a shared-memory architecture to a distributed-memory one. Such challenges range from having to redesign applications for distributed computing to gathering pixels from multiple sources and finally synchronizing multiple video outputs when driving large displays. In addition to these impediments for the application developer, there are also many mundane problems which arise when working with clusters, including their installation and general system administration. This paper details these challenges and the many solutions that have been developed in recent years. As the nature of commodity hardware components suggests, the solutions to these research challenges are largely software-based, and include middleware layers for distributing the graphics workload across the cluster as well as for aggregating the final results to display for the user. At the forefront of this discussion will be IBM's Deep View project, whose goal has been the design and implementation of a scalable, affordable, high-performance visualization system for parallel rendering. In the past six years, Deep View has undergone numerous redesigns to make it as efficient as possible. We highlight the issues involved in this process, up to and including the current incarnation of Deep View, as well as what's on the horizon for cluster-based rendering.

Item: Optimized Visualization for Tiled Displays (The Eurographics Association, 2006)
Lorenz, Mario; Brunnett, Guido; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
In this paper we present new functionality we added to the Chromium framework. When driving tiled displays using a sort-first configuration based on the Tilesort stream processing unit (SPU), the performance bottlenecks are the high utilization of the client host, caused by the expensive sorting and bucketing of geometry, and the high bandwidth consumption caused by a significant amount of redundant unicast transmissions. We addressed these problems with an implementation of a true point-to-multipoint connection type using UDP multicast. Based on this functionality we developed the so-called OPT-SPU. This SPU replaces the widely used Tilesort-SPU in typical sort-first environments. Tile-sorting and state differencing are not necessary because multicasting allows us to send the geometry to all server nodes at once. Instead of tile-sorting, a conventional frustum culling method is used to avoid needless server utilization caused by rendering geometry outside their viewports. This approach leads to significantly lower processor and memory load on the client and very effective utilization of the available network bandwidth. To avoid redundant transmissions of identical command sequences that are generated by the application several times, we put a transparent stream cache into the multicast communication channel. In addition, frustum and hardware-accelerated occlusion culling methods may be used to eliminate unnecessary transfer of invisible geometry. Finally, a software-based method for synchronization of buffer swap operations at all servers was implemented. In a nutshell, for the first time an appropriate combination of our optimizations makes it possible to render large scenes synchronously on an arbitrary number of tiles at nearly constant performance.

Item: Distributed Force-Directed Graph Layout and Visualization (The Eurographics Association, 2006)
Mueller, Christopher; Gregor, Douglas; Lumsdaine, Andrew; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
While there exist many interactive tools for the visualization of small graphs and networks, these tools do not address the fundamental problems associated with the visualization of large graphs. In particular, larger graphs require much larger display areas (e.g., display walls) to reduce visual clutter, allowing users to determine the structure of large graphs. Moreover, the layout algorithms employed by these graph visualization tools do not scale to larger graphs, thereby forcing users into a batch-oriented process of generating layouts offline and later viewing static graph images. In this paper, we present a parallel graph layout algorithm based on the Fruchterman-Reingold force-directed layout algorithm and demonstrate its implementation in a distributed rendering environment. The algorithm uses available distributed resources for both compute and rendering tasks, animating the graph as the layout evolves. We evaluate the algorithm for scalability and interactivity and discuss variations that minimize communication for specific types of graphs and applications.
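For reference, the Fruchterman-Reingold model that the distributed layout above parallelizes can be stated compactly in serial form. The sketch below performs one layout iteration (all-pairs repulsion, per-edge attraction, temperature-limited displacement); it is the textbook kernel only, with none of the paper's distribution or rendering machinery.

```cpp
// One serial Fruchterman-Reingold iteration on a W x H layout area.
// 'temperature' caps per-vertex displacement and is decreased between calls.
#include <cmath>
#include <utility>
#include <vector>

struct Vec2 { float x = 0.0f, y = 0.0f; };

void fr_step(std::vector<Vec2>& pos,
             const std::vector<std::pair<int, int>>& edges,
             float W, float H, float temperature) {
    const std::size_t n = pos.size();
    const float k = std::sqrt((W * H) / float(n));   // ideal edge length
    std::vector<Vec2> disp(n);

    // Repulsive force between every vertex pair: f_r(d) = k*k / d.
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j) {
            float dx = pos[i].x - pos[j].x, dy = pos[i].y - pos[j].y;
            float d = std::max(std::sqrt(dx * dx + dy * dy), 1e-4f);
            float f = (k * k) / d;
            disp[i].x += dx / d * f;  disp[i].y += dy / d * f;
            disp[j].x -= dx / d * f;  disp[j].y -= dy / d * f;
        }

    // Attractive force along each edge: f_a(d) = d*d / k.
    for (const auto& e : edges) {
        float dx = pos[e.first].x - pos[e.second].x;
        float dy = pos[e.first].y - pos[e.second].y;
        float d = std::max(std::sqrt(dx * dx + dy * dy), 1e-4f);
        float f = (d * d) / k;
        disp[e.first].x  -= dx / d * f;  disp[e.first].y  -= dy / d * f;
        disp[e.second].x += dx / d * f;  disp[e.second].y += dy / d * f;
    }

    // Move vertices, limited by the temperature and clamped to the layout area.
    for (std::size_t i = 0; i < n; ++i) {
        float d = std::max(std::sqrt(disp[i].x * disp[i].x + disp[i].y * disp[i].y), 1e-4f);
        float step = std::min(d, temperature);
        pos[i].x = std::min(W, std::max(0.0f, pos[i].x + disp[i].x / d * step));
        pos[i].y = std::min(H, std::max(0.0f, pos[i].y + disp[i].y / d * step));
    }
}
```

The O(n^2) repulsion loop is precisely the part that motivates distributing the computation (and the communication-minimizing variants) discussed in the entry above.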
Item: Rendering on Demand (The Eurographics Association, 2006)
Chalmers, A.; Debattista, K.; Sundstedt, V.; Longhurst, P.; Gillibrand, R.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
In order for computer graphics to accurately represent real-world environments, it is essential that physically based illumination models are used. However, typical global illumination solutions may take many seconds, even minutes, to render a single frame. This precludes their use in any interactive system. In this paper we present Rendering on Demand, a selective physically-based parallel rendering system which enables high-fidelity virtual computer graphics imagery to be rendered at close to interactive rates. By exploiting knowledge of the human visual system we substantially reduce computation costs by rendering only the areas of perceptual importance in high quality. The rest of the scene is rendered at a significantly lower quality without the viewer being aware of the quality difference. This is validated through psychophysical experimentation.

Item: WinSGL: Software Genlocking for Cost-Effective Display Synchronization under Microsoft Windows (The Eurographics Association, 2006)
Waschbüsch, M.; Cotting, D.; Duller, M.; Gross, M.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
This paper presents the first software genlocking approach for unmodified Microsoft Windows systems, requiring no specialized graphics boards but only a low-cost signal generator as additional hardware. Compared to existing solutions for other operating systems, it does not rely on any real-time extensions or kernel modifications. Its novel design can be divided into two parts: first, an external synchronization signal is transmitted over interrupt lines to a dedicated driver; second, a user-space application performs the synchronization by inserting or removing lines in the invisible part of the image. Robustness to potential frame losses is achieved through continuous, consistent timestamping. Tests yield an accuracy of up to ± ½ line deviation from the external signal and a low CPU load of 2% on current PC systems. Our system has been designed to be compatible with off-the-shelf graphics hardware and digital output devices based on LCD or DLP technology. Our solution can be employed to build cost-effective VR installations such as large tiled and spatially immersive displays using commodity PC clusters.

Item: Dynamic Load Balancing for Parallel Volume Rendering (The Eurographics Association, 2006)
Marchesin, Stéphane; Mongenet, Catherine; Dischler, Jean-Michel; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
Parallel volume rendering is one of the most efficient techniques to achieve real-time visualization of large datasets by distributing the data and the rendering process over a cluster of machines. However, when using level-of-detail techniques or when zooming on parts of the datasets, load imbalance becomes a challenging issue that has not been widely studied in the context of hardware-based rendering. In this paper, we address this issue and show how to achieve good load balancing for parallel level-of-detail volume rendering. We do so by dynamically distributing the data among the rendering nodes according to the load of the previous frame. We illustrate the efficiency of our technique on large datasets.
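A hedged illustration of balancing on the previous frame's load, in the spirit of the last entry above (the paper's actual distribution scheme may differ): each data brick carries its measured render time from the last frame, and bricks are greedily reassigned to the currently least-loaded node. The Assignment structure and the longest-processing-time heuristic are assumptions made for this sketch.

```cpp
// Frame-to-frame load rebalancing sketch: brick_cost[i] is brick i's render
// time measured in the previous frame; bricks are reassigned greedily to the
// least-loaded node (longest-processing-time rule).
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>

struct Assignment {
    std::vector<int>   node_of_brick;  // brick index -> owning node
    std::vector<float> node_load;      // predicted per-node cost
};

Assignment rebalance(const std::vector<float>& brick_cost, int num_nodes) {
    Assignment a;
    a.node_of_brick.assign(brick_cost.size(), -1);
    a.node_load.assign(num_nodes, 0.0f);

    // Visit bricks in order of decreasing cost.
    std::vector<int> order(brick_cost.size());
    for (int i = 0; i < (int)order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(),
              [&](int x, int y) { return brick_cost[x] > brick_cost[y]; });

    // Min-heap of (current load, node id) to find the least-loaded node.
    using Entry = std::pair<float, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    for (int n = 0; n < num_nodes; ++n) heap.push({0.0f, n});

    for (int b : order) {
        auto [load, node] = heap.top();
        heap.pop();
        a.node_of_brick[b] = node;
        a.node_load[node] = load + brick_cost[b];
        heap.push({a.node_load[node], node});
    }
    return a;
}
```

In practice such reassignment is usually damped so that only a few bricks migrate per frame, keeping the cost of moving data between nodes bounded.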
Item: Parallel Particle Rendering: a Performance Comparison between Chromium and Aura (The Eurographics Association, 2006)
Schaaf, Tom van der; Koutek, Michal; Bal, Henri; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
In the fields of high performance computing and distributed rendering, there is a great need for a flexible and scalable architecture that supports coupling of parallel simulations to commodity visualization clusters. The most popular architecture that allows such flexibility, called Chromium, is a parallel implementation of OpenGL. It has sufficient performance on applications with static scenes, but in the case of more dynamic content this approach often fails. We have developed Aura, a distributed scene graph library, which allows optimized performance for both static and more dynamic scenes. In this paper we compare the performance of Chromium and Aura. For our performance tests, we have selected a dynamic particle system application, which reveals several issues with the Chromium approach of implementing the OpenGL API. Because our distributed scene graph architecture was designed with a different approach, the test results show that it performs better on this application.

Item: An Application of Scalable Massive Model Interaction using Shared-Memory Systems (The Eurographics Association, 2006)
Stephens, Abe; Boulos, Solomon; Bigler, James; Wald, Ingo; Parker, Steven; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
During the end-to-end digital design of a commercial airliner, a massive amount of geometric data is produced. This data can be used for inspection or maintenance throughout the life of the aircraft. Massive model interactive ray tracing can provide maintenance personnel with the capability to easily visualize the entire aircraft at once. This paper describes the design of the renderer used to demonstrate the feasibility of integrating interactive ray tracing in a commercial aircraft inspection and maintenance scenario. We describe the feasibility demonstration, involving actual personnel performing real-world tasks, and the scalable architecture of the parallel shared-memory renderer.

Item: Accelerating the Irradiance Cache through Parallel Component-Based Rendering (The Eurographics Association, 2006)
Debattista, Kurt; Santos, Luís Paulo; Chalmers, Alan; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The irradiance cache is an acceleration data structure which caches indirect diffuse samples within the framework of a distributed ray-tracing algorithm. Previously calculated values can be stored and reused in future calculations, resulting in an order-of-magnitude improvement in computational performance. However, the irradiance cache is a shared data structure and so it is notoriously difficult to parallelise over a distributed parallel system. The hurdle to overcome is when and how to share cached samples. This sharing incurs communication overheads and yet must happen frequently to minimise cache misses and thus maximise the performance of the cache. We present a novel component-based parallel algorithm, implemented on a cluster of computers, whereby the indirect diffuse calculations are performed on a subset of nodes in the cluster. This method exploits the inherently spatially coherent nature of the irradiance cache; by reducing the set of nodes amongst which cached values must be shared, the sharing frequency can be kept high, thus decreasing both communication overheads and cache misses. We demonstrate how our new parallel rendering algorithm significantly outperforms traditional methods of distributing the irradiance cache.
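For readers unfamiliar with the data structure being parallelised in the entry above, the following single-node sketch shows the essential irradiance-cache behaviour: the expensive indirect-diffuse computation runs only when no nearby cached sample with a similar normal can be reused. Real caches use an octree lookup and blend several overlapping samples with Ward-style error weighting; the linear scan and the simple reuse test here are deliberate simplifications, not the component-based parallel algorithm of the paper.

```cpp
// Minimal single-node irradiance cache sketch (illustrative only).
#include <cmath>
#include <functional>
#include <vector>

struct Vec3 { float x, y, z; };

static float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static float dist(const Vec3& a, const Vec3& b) {
    return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y) + (a.z-b.z)*(a.z-b.z));
}

// One cached indirect-diffuse sample.
struct CacheEntry {
    Vec3  p, n;        // position and (unit) surface normal where it was computed
    Vec3  irradiance;  // stored indirect diffuse irradiance
    float radius;      // validity radius around the sample position
};

class IrradianceCache {
public:
    // 'compute' is the expensive hemisphere-sampling step, invoked only on a miss.
    Vec3 lookup(const Vec3& p, const Vec3& n,
                const std::function<CacheEntry(const Vec3&, const Vec3&)>& compute) {
        for (const CacheEntry& e : entries_)
            if (dist(p, e.p) < e.radius && dot(n, e.n) > 0.9f)
                return e.irradiance;                  // reuse a nearby sample
        CacheEntry fresh = compute(p, n);             // cache miss: sample hemisphere
        entries_.push_back(fresh);
        return fresh.irradiance;
    }
private:
    std::vector<CacheEntry> entries_;                 // real systems use an octree
};
```

Because every miss both costs hemisphere sampling and produces a value other nodes could reuse, where this structure lives and how its entries are shared is exactly the question the component-based approach above addresses.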
Item: Remote Large Data Visualization in the ParaView Framework (The Eurographics Association, 2006)
Cedilnik, Andy; Geveci, Berk; Moreland, Kenneth; Ahrens, James; Favre, Jean; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
Scientists are using remote parallel computing resources to run scientific simulations to model a range of scientific problems. Visualization tools are used to understand the massive datasets that result from these simulations. A number of problems need to be overcome in order to create a visualization tool that effectively visualizes these datasets under this scenario. Problems include how to effectively process and display massive datasets and how to effectively communicate data and control information between the geographically distributed computing and visualization resources. We believe a solution that incorporates a data-parallel data server, a data-parallel render server, and a client controller is key. Using this data server, render server, client model as a basis, this paper describes in detail a set of integrated solutions to remote/distributed visualization problems, including an efficient M-to-N parallel algorithm for transferring geometry data, an effective server interface abstraction, and parallel rendering techniques for a range of rendering modalities, including tiled display walls and CAVEs.

Item: Parallel Texture-Based Vector Field Visualization on Curved Surfaces Using GPU Cluster Computers (The Eurographics Association, 2006)
Bachthaler, S.; Strengert, M.; Weiskopf, D.; Ertl, T.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
We adapt a technique for texture-based visualization of flow fields on curved surfaces for parallel computation on a GPU cluster. The underlying LIC method relies on image-space calculations and allows the user to visualize a full 3D vector field on arbitrary and changing hypersurfaces. By using parallelization, both the visualization speed and the maximum data set size are scaled with the number of cluster nodes. A sort-first strategy with image-space decomposition is employed to distribute the workload for the LIC computation, while a sort-last approach with an object-space partitioning of the vector field is used to increase the total amount of available GPU memory. We specifically address issues of parallel GPU-based vector field visualization, such as reduced locality of memory accesses caused by particle tracing, dynamic load balancing for changing camera parameters, and the combination of image-space and object-space decomposition in a hybrid approach. Performance measurements document the behavior of our implementation on a GPU cluster with AMD Opteron CPUs, NVIDIA GeForce 6800 Ultra GPUs, and an InfiniBand network connection.
Item: Accelerated Volume Rendering with Homogeneous Region Encoding using Extended Anisotropic Chessboard Distance on GPU (The Eurographics Association, 2006)
Es, A.; Keles, H. Y.; Isler, V.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
Ray traversal is the most time-consuming part of volume ray casting. In this paper, an acceleration technique for direct volume rendering is introduced which uses a GPU-friendly data structure to reduce traversal time. Empty regions and homogeneous regions in the volume are encoded using an extended anisotropic chessboard distance (EACD) transformation. By means of EACD encoding, both the empty spaces and the samples belonging to homogeneous regions are processed efficiently on the GPU with minimal branching. In addition to skipping empty spaces, this method reduces the sampling operations inside a homogeneous region using ray integral factorization. The proposed algorithm integrates the optical properties of a homogeneous region in one step and leaps directly to the next region. We show that our method can work more than six times faster than a primitive ray caster without any visible loss in image quality.

Item: Parallel Simulation of Cloth on Distributed Memory Architectures (The Eurographics Association, 2006)
Thomaszewski, B.; Blochinger, W.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The physically based simulation of clothes in virtual environments is a highly demanding problem. It involves both modeling the internal material properties of the textile and the interaction with the surrounding scene. We present a parallel cloth simulation approach designed for distributed-memory parallel architectures, in particular clusters built of commodity components. In this paper, we focus on the parallelization of the collision handling phase. In order to cope with the high irregularity of this problem, we employ a task-parallel approach with fully dynamic problem decomposition. This leads to a robust algorithm, regardless of the complexity of the scene. We report on initial performance measurements indicating the usefulness of our approach.

Item: Time Step Prioritising in Parallel Feature Extraction on Unsteady Simulation Data (The Eurographics Association, 2006)
Wolter, M.; Hentschel, B.; Schirski, M.; Gerndt, A.; Kuhlen, T.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
Explorative analysis of unsteady computational fluid dynamics (CFD) simulations requires fast extraction of flow features. For time-varying data, the extraction algorithm has to be executed for each time step in the period under observation. Even when parallelised on a remote high performance computer, the user's waiting time still exceeds interactivity criteria for large data sets. Moreover, computations are generally performed in a fixed order, not taking into account the importance of partial results for the user's investigation. In this paper we propose a general method to guide parallel feature extraction on unsteady data sets in order to assist the user during explorative analysis even though interactive response times might not be available. By re-ordering single time step computations, the order in which features are provided is arranged according to the user's exploration process. We describe three different concepts based on typical user behaviours. Using this approach, parallel extraction of unsteady features is enhanced for arbitrary extraction methods.
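The reordering idea in the last entry above can be pictured as a small scheduler: pending time steps are re-sorted whenever the user's focus changes, so that the parallel extraction workers always pull the step currently most relevant. The distance-to-focus policy below is a placeholder; the paper describes three prioritisation concepts based on typical user behaviour.

```cpp
// Sketch of time-step prioritisation for parallel feature extraction.
// Workers repeatedly call next(); the GUI calls refocus() as the user moves.
#include <algorithm>
#include <cstdlib>
#include <vector>

class TimeStepScheduler {
public:
    explicit TimeStepScheduler(int num_steps) {
        for (int t = 0; t < num_steps; ++t) pending_.push_back(t);
    }

    // Re-prioritise the remaining time steps around the user's current focus.
    // Placeholder policy: steps closest to the focused step come first.
    void refocus(int focused_step) {
        std::sort(pending_.begin(), pending_.end(), [focused_step](int a, int b) {
            return std::abs(a - focused_step) < std::abs(b - focused_step);
        });
    }

    // Next time step to hand to an extraction worker, or -1 when all are done.
    int next() {
        if (pending_.empty()) return -1;
        int t = pending_.front();
        pending_.erase(pending_.begin());
        return t;
    }

private:
    std::vector<int> pending_;   // time steps not yet extracted
};
```

Because only the queue order changes, any extraction method can sit behind next(), which matches the abstract's claim that the scheme works for arbitrary extraction methods.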
Item: Interactive Volume Rendering of Unstructured Grids with Time-Varying Scalar Fields (The Eurographics Association, 2006)
Bernardon, Fábio F.; Callahan, Steven P.; Comba, João L. D.; Silva, Cláudio T.; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
Interactive visualization of time-varying volume data is essential for many scientific simulations. This is a challenging problem since this data is often large, can be organized in different formats (regular or irregular grids), with variable numbers of time steps (from a few hundred to thousands) and variable domain fields. It is common to consider subsets of this problem, such as time-varying scalar fields (TVSFs) on static structured grids, which are suitable for compression using multi-resolution techniques and can be efficiently rendered using texture-mapping hardware. In this work we propose a rendering system that considers unstructured grids, which do not have the regular properties crucial to such compression and rendering. Our solution uses an encoding mechanism that is tightly coupled with our rendering system. Decompression is performed on the CPU while rendering for the next frame is processed. The rendering system runs entirely on the GPU, with an adaptive time-varying visualization that has a built-in level of detail that chooses the most significant aspects of the data.

Item: Multi-layered Image Caching for Distributed Rendering of Large Multiresolution Datasets (The Eurographics Association, 2006)
Strasser, Jonathan; Pascucci, Valerio; Ma, Kwan-Liu; Alan Heirich and Bruno Raffin and Luis Paulo dos Santos
The capability to visualize large volume datasets has applications in a myriad of scientific fields. This paper presents a large data visualization solution in the form of distributed, multiresolution, progressive processing. This solution reduces the problem of rendering a large volume dataset into many simple and independent problems that can be straightforwardly distributed to multiple computers. By completely decoupling rendering and display with image caching, we are able to maintain a high level of interactivity during exploration of the data, which is key to obtaining insights into the data.
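As a closing illustration of the render/display decoupling described in the final entry, here is a minimal, hedged sketch of an image cache for one fixed viewpoint: render nodes deposit progressively finer resolution levels, and the display loop always shows the finest level already available, so interaction never blocks on rendering. A full system would additionally key the cache by viewpoint and tile; this sketch is not the paper's implementation.

```cpp
// Minimal progressive image cache: renderers call store(), the display thread
// calls best().  Thread-safe via a single mutex.
#include <map>
#include <mutex>
#include <utility>
#include <vector>

struct Image { int width = 0, height = 0; std::vector<unsigned char> rgba; };

class ImageCache {
public:
    void store(int level, Image img) {                 // called by render nodes
        std::lock_guard<std::mutex> lock(mutex_);
        levels_[level] = std::move(img);
    }

    // Returns the finest completed level, or an empty image if none exists yet.
    Image best() const {
        std::lock_guard<std::mutex> lock(mutex_);
        if (levels_.empty()) return Image{};
        return levels_.rbegin()->second;               // highest key = finest level
    }

private:
    mutable std::mutex mutex_;
    std::map<int, Image> levels_;                      // resolution level -> image
};
```

The display side stays interactive because best() never waits on a renderer: coarse images appear immediately and are replaced as finer levels arrive, which is the progressive, decoupled behaviour the abstract describes.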