2024

Recent Submissions

Now showing 1 - 14 of 14
  • Item
    Improving the efficiency of point cloud data management
    (TUprints, 2024-07) Bormann, Pascal
    The collection of point cloud data has increased drastically in recent years, which poses challenges for the data management layer. Multi-billion point datasets are commonplace and users are getting accustomed to real-time data exploration on the Web. To make this possible, existing point cloud data management approaches rely on optimized data formats which are time- and resource-intensive to generate. This introduces long wait times before data can be used, as well as frequent data duplication, since these optimized formats are often domain- or application-specific. As a result, data management is a challenging and expensive aspect of developing applications that use point cloud data. We observe that the interaction between applications and the point cloud data management layer can be modeled as a series of queries similar to those found in traditional databases. Based on this observation, we evaluate current point cloud data management using three query metrics: responsiveness, throughput, and expressiveness. We contribute to the current state of the art by improving these metrics both for the handling of raw files without preprocessing and for indexed point clouds. In the domain of unindexed point cloud data, we introduce the concept of ad-hoc queries, which are executed directly on raw point cloud files. We demonstrate that ad-hoc queries can improve query responsiveness significantly, as they do not require long wait times for indexing or database imports. Using columnar memory layouts, queries on datasets of up to a billion points can be answered in interactive or near-interactive time, with throughputs of more than one hundred million points per second on unindexed data. A demonstration of an adaptive indexing method shows that spending a few seconds per query on index creation can improve responsiveness by up to an order of magnitude. Our experiments also confirm the importance of high-throughput systems when querying point cloud data, as the overhead of data transmission has a significant effect on overall query performance. For situations where indexing is mandatory, we demonstrate improvements to the runtime performance of existing point cloud indexing tools. We developed a fast indexer based on task-parallel programming, using Morton indices to efficiently sort and distribute point batches onto worker threads. This system, called Schwarzwald, outperformed existing indexers by up to a factor of 9 when it was first published, and remains competitive with current out-of-core capable indexers. Additionally, we adapted our indexing algorithm for distributed processing in a cloud environment and demonstrate that its horizontal scalability allows it to outperform all existing indexers by up to a factor of 3. Lastly, we demonstrated point cloud indexing in real time during Light Detection And Ranging (LiDAR) capturing, based on a similar task-based algorithm but optimized for progressive indexing. Our real-time indexer is able to keep up with current LiDAR sensors in a real-world test, with end-to-end latencies as low as 0.1 seconds. Together, our improvements significantly reduce wait times for working with point cloud data and increase the overall efficiency of the data access layer.
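    A minimal sketch of the Morton-order sorting that task-parallel indexers of this kind build on (the bit-interleaving trick is standard; the batching and worker distribution of the actual system are not shown, and all names are illustrative):

      # 3D Morton (Z-order) codes: interleave the bits of quantized x, y, z so that
      # sorting by code groups points into octree nodes.
      import numpy as np

      def part1by2(x: int) -> int:
          # spread the lower 21 bits of x so two zero bits separate consecutive bits
          x &= 0x1FFFFF
          x = (x | (x << 32)) & 0x1F00000000FFFF
          x = (x | (x << 16)) & 0x1F0000FF0000FF
          x = (x | (x << 8)) & 0x100F00F00F00F00F
          x = (x | (x << 4)) & 0x10C30C30C30C30C3
          x = (x | (x << 2)) & 0x1249249249249249
          return x

      def morton3d(p, lo, hi, bits=21):
          # quantize each coordinate to `bits` bits, then interleave x, y, z
          q = [int((p[i] - lo[i]) / (hi[i] - lo[i]) * ((1 << bits) - 1)) for i in range(3)]
          return part1by2(q[0]) | (part1by2(q[1]) << 1) | (part1by2(q[2]) << 2)

      pts = np.random.rand(10_000, 3)
      lo, hi = pts.min(axis=0), pts.max(axis=0)
      order = sorted(range(len(pts)), key=lambda i: morton3d(pts[i], lo, hi))
      # consecutive runs of `order` correspond to octree nodes and can be handed
      # to worker threads as batches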
  • Item
    Visual Insights into Memory Behavior of GPU Ray Tracers
    (TUprints, 2024-07) von Buelow, Max
    Ray tracing is a fundamental rendering technique that typically projects three-dimensional representations of a scene onto a two-dimensional display. This is achieved by perspectively sampling a set of rays into the scene and computing intersections against the relevant geometry. Secondary rays may be sent out from these intersection points, allowing for physically correct global illumination by following photon paths in reverse. Real-time rendering has historically used classical rasterization pipelines, which are straightforward to implement in hardware because rasterization is a data-parallel problem that projects the whole scene into the coordinate system of the image. In contrast, task-parallel ray tracing suffers from incoherency between rays. However, recent advances in ray tracing have led to more efficient approaches, culminating in efficient embedded hardware implementations. While these approaches are already capable of rendering realistic images, further improvements in run-time performance can be traded for higher frame rates, display resolutions, or ray-tracing recursion depths, or for a reduced energy footprint of ray-tracing data centers. A fundamental technique for improving ray-tracing performance is the use of bounding-volume hierarchies (BVHs), which keep rays from being tested against the entire scene, especially its occluded or distant regions. Beyond the structural efficiency of a BVH, the primary bottlenecks of GPU ray tracing are memory latency and work distribution. Addressing these bottlenecks largely amounts to making memory accesses more coherent, which makes caching more efficient. Writing programs with the goal of achieving higher cache hit rates typically requires increased programming effort and a deep understanding of the hardware, as an additional abstraction layer is introduced that makes the memory pipeline less transparent. General-purpose profilers aim to support the implementation process, but they typically report caching rates aggregated per kernel call, because these values are measured with basic hardware counters that do not distinguish the context of a memory access. In many cases, it would be useful to have a more detailed representation of memory-related profiling metrics, such as access counts per memory allocation, or projections into other domains such as the framebuffer or the scene geometry. This thesis presents a new method for accurately simulating the GPU memory pipeline. The method uses memory traces exported by dynamic binary instrumentation, which can be applied to any compiled GPU binary, similar to standard profilers. The exported memory profiles can be used for performance visualization in individual domains, as well as for traditional memory-profiling metrics displayed at finer granularity than usual. A method for mapping memory metrics onto the original scene is included, allowing users to explore profiling results within the scene domain and making the profiling process more intuitive. In addition, this thesis presents a novel compressed ray-tracing implementation that optimizes its memory footprint by making assumptions about the topological properties of the scene to be rendered. The findings can be used to evaluate and optimize a wide range of ray-tracing and ray-marching applications in a user-friendly manner.
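    The core of such a memory-pipeline simulation can be illustrated by replaying an address trace through a toy set-associative LRU cache; the geometry below (128-byte lines, 4 ways, 64 sets) and the allocation ids are hypothetical stand-ins, not the thesis' simulator:

      # Replay a memory-access trace through a set-associative LRU cache and report
      # hit rates per allocation, mimicking the finer-grained metrics discussed above.
      from collections import OrderedDict, defaultdict

      LINE, WAYS, SETS = 128, 4, 64
      sets = [OrderedDict() for _ in range(SETS)]   # per-set LRU: line tag -> None
      stats = defaultdict(lambda: [0, 0])           # allocation id -> [hits, accesses]

      def access(addr, alloc_id):
          line = addr // LINE
          s = sets[line % SETS]
          stats[alloc_id][1] += 1
          if line in s:                             # hit: refresh LRU position
              s.move_to_end(line)
              stats[alloc_id][0] += 1
          else:                                     # miss: evict LRU way if set is full
              if len(s) == WAYS:
                  s.popitem(last=False)
              s[line] = None

      # trace entries: (byte address, id of the allocation the access falls into)
      trace = [(i * 4, "rays") for i in range(4096)] + [(i * 512, "bvh") for i in range(4096)]
      for addr, alloc in trace:
          access(addr, alloc)
      for alloc, (hits, total) in stats.items():
          print(f"{alloc}: {hits / total:.2%} hit rate over {total} accesses")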
  • Item
    Computational models of visual attention and gaze behavior in virtual reality
    (2024-03-08) Martin, Daniel
    Virtual reality (VR) is an emerging medium that has the potential to unlock unprecedented experiences. Since the late 1960s, this technology has advanced steadily, and can nowadays be a gateway to a completely different world. VR offers a degree of realism, immersion, and engagement never seen before, and lately we have witnessed how new virtual content is continuously being created. However, to get the most out of this promising medium, there is still much to learn about people’s visual attention and gaze behavior in the virtual universe. Questions like “What attracts users’ attention?” or “How malleable is the human brain when in a virtual experience?” have no definite answer yet. We argue that it is important to build a principled understanding of viewing and attentional behavior in VR. This thesis presents contributions in two key aspects: understanding and modeling users’ gaze behavior, and leveraging imperceptible manipulations to improve the virtual experience. In the first part of this thesis we have focused on developing computational models of gaze behavior in virtual environments. First, resorting to the well-known concept of saliency, we have devised models of user attention in 360° images and 360° videos that are able to predict which parts of a virtual scene are more likely to draw viewers’ attention. Then, we have designed another two computational models for spatio-temporal attention prediction, one of them able to simulate thousands of virtual observers per second by generating realistic sequences of gaze points in 360° images, and the other one predicting different, yet plausible sequences of fixations on traditional images. Additionally, we have explored how attention works on 3D meshes. All these models have allowed us to delve into the particularities of human gaze behavior in different environments. Besides that, we have aimed at achieving a deeper understanding of visual attention in multimodal environments. First, we have exhaustively reviewed a vast literature on the use of additional sensory modalities, like audio, haptics, or proprioception, in virtual reality (also known as multimodality), and their role and benefits in several disciplines. Then, we have gathered and analyzed the largest dataset of viewing behavior in ambisonic 360° videos to date, finding effects of factors such as type of content and gender, among others. We have finally analyzed how viewing behavior varies depending on the performed task: we have delved into attention in the very specific case of driving scenarios, and we have also studied the significant effects on gaze behavior that arise when performing different tasks in immersive environments. The second part of this thesis attempts to improve virtual experiences by means of imperceptible manipulations. We have first focused on lateral movement in VR, and have devised thresholds for the detection of such manipulations, which we then applied to three key problems in VR that have no definite solution yet, namely 6-DoF viewing of 3-DoF content, overcoming physical space constraints, and reducing motion sickness. On the other hand, we have explored the manipulation of the virtual scene, resorting to the phenomenon of change blindness, and have derived insights and guidelines on how to elicit or avoid this effect, and how the limitations of the human brain affect it.
  • Item
    Task-Aware 3D Geometric Synthesis
    (University of Toronto, 2024) Sellán, Silvia
    This thesis is about the different ways in which three-dimensional shapes come into digital existence inside a computer. Specifically, it argues that this geometric synthesis process should be tuned to the specific end for which an object is modeled or captured, and proposes building algorithms specific to said end. The majority of this thesis is dedicated to how 3D shapes are designed, and introduces changes to this modeling process to incorporate manufacturing constraints (e.g., that an object can physically be built out of a specific material or with a specific machine), precomputed simulation data (e.g., an object’s response to an impact) or specific user inputs (e.g., 3D drawing in Virtual or Augmented Reality). Importantly, these changes include rethinking the ways in which geometry is commonly represented, instead introducing formats that benefit specific applications, as well as efficient algorithms for converting between them. By contrast, the latter part of this thesis concerns itself with the task of capturing real-world 3D surfaces, a process that necessarily involves reconstructing continuous mathematical objects from imperfect, noisy and occluded discrete information. This thesis introduces a novel, stochastic lens through which to study this fundamentally underdetermined process, allowing for the introduction of task-specific priors as well as quantifying the uncertainty of common algorithmic predictions. This perspective is shown to provide critical insights into common 3D scanning paradigms. While geometric capture is the natural first step in which to introduce this statistical perspective, the thesis ends by enumerating other tasks further along the geometric processing pipeline that could benefit from it.
  • Item
    Photorealistic Simulation and Optimization of Lighting Conditions
    (2024-05) Vitsas, Nick
    Lighting plays a very important role in our everyday life, affecting our safety, comfort, well-being and performance. Today, computational methods and tools can be applied to provide recommendations for improving light conditions and to find energy-efficient ways to exploit natural lighting. This thesis addresses the problem of computational optimization of light transport to improve lighting effectiveness, improving on various aspects of the process, such as goal-driven parametric geometry configuration for building openings and interior design, efficient natural lighting sampling, and interactive photorealistic simulation of light transport. Physically-based light transport is at the core of each task, and we show how lighting evaluation has a broader application scope than image synthesis. In the domain of light-driven geometry optimization, the thesis makes two contributions, one concerning the opening design problem and one regarding the optimal arrangement of movable objects for interior design. Opening design comes at the early stages of architectural design and concerns decisions about the geometric characteristics of windows, skylights, hatches, etc. It greatly impacts the overall energy efficiency, thermal profile, air flow and appearance of a building, both internally and externally. It also directly controls daylighting availability, which is very difficult to predict and assess without automatic tools. We developed a computational methodology and a system to automate the process of opening recommendations in a fully interactive virtual environment, fully supporting parametric constraints and illumination intentions. We optimize openings with respect to their shape, position, size and cardinality, relying on Bayesian optimization to propose physically correct openings on the geometry of the building. For the light-driven interior design problem, we proposed and evaluated an automatic interior layout process that produces valid object arrangements guided by geometric and illumination constraints, optimizing for glare, correct illuminance levels and lighting uniformity. Geometric and lighting goals are combined into a cost function that allows for a hierarchical, stochastic exploration of the available space of valid configurations. Optimizing the contribution of natural lighting is an integral part of any outdoor and indoor environment design process. Analytic formulas for clear skies are a computationally and memory-efficient method to create physically accurate sky maps of clear sunny days. However, to simulate light transport, sky models must be efficiently sampled. This is typically done via standard importance sampling approaches for image-based lighting, which tend to be slow and wasteful given the predictable nature of the radiance distribution of analytic sky models. We propose and evaluate a method for fitting a truncated Gaussian mixture model to the radiance distribution of the sky map that is both compact and fast to evaluate. Light-driven geometry optimization requires both accurate and fast light transport evaluation, since a very large number of light-carrying paths needs to be evaluated at each new proposal state. Advances in graphics hardware have enabled interactive ray tracing, which relies on highly optimized data structures for the acceleration of ray queries. Bounding volume hierarchies based on axis-aligned bounding boxes have been the go-to data structure for fast ray-primitive intersections. Similar hierarchies of oriented bounding boxes (OBBs) provide much higher early traversal termination rates; however, their construction requires complex algorithms for the extraction of tight-fitting OBBs. To further accelerate ray tracing for our tasks, we adapt a high-quality OBB extraction algorithm for unordered point sets to operate directly on existing hierarchies, effectively constructing an OBB tree on the GPU. By combining our method with existing fast algorithms from the literature that construct hierarchies in real time, we are able to produce OBB trees that are extremely fast to build and traverse on the GPU. Furthermore, to make accurate light transport evaluators accessible as industry-grade tools, we developed and presented WebRays, the first generic ray intersection framework for the Web, which offers a programming interface similar to modern ray-tracing pipelines for desktop platforms and allows the implementation of light-driven design tools accessible from any platform.
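    To illustrate what a tight-fitting OBB is, here is the classic covariance/PCA fit for an unordered point set (a simplified baseline only; the thesis adapts a higher-quality extraction algorithm to operate directly on existing GPU hierarchies):

      # Fit an oriented bounding box to a point set via PCA of its covariance.
      import numpy as np

      def pca_obb(points):
          center = points.mean(axis=0)
          cov = np.cov((points - center).T)
          _, axes = np.linalg.eigh(cov)          # columns: orthonormal box axes
          local = (points - center) @ axes       # project points into the box frame
          lo, hi = local.min(axis=0), local.max(axis=0)
          # world-space center, axes, and half-extents of the box
          return center + axes @ ((lo + hi) / 2), axes, (hi - lo) / 2

      pts = np.random.randn(5000, 3) * np.array([5.0, 1.0, 0.2])   # elongated cloud
      c, axes, half = pca_obb(pts)
      print("half-extents:", np.round(half, 2))  # tight along each principal axis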
  • Item
    Intrinsic approaches to learning and computing on curved surfaces
    (2024-10-15) Wiersma, Ruben Timotheüs
    This dissertation develops intrinsic approaches to learning and computing on curved surfaces. Specifically, we work on three tasks: analyzing 3D shapes using convolutional neural networks (CNNs), solving linear systems on curved surfaces, and recovering appearance properties from curved surfaces using multi-view capture. We argue that we can find more efficient and better performing algorithms for these tasks by using intrinsic geometry. Chapters two and three consider CNNs on curved surfaces. We would like to find patterns with meaningful directional information, such as edges or corners. On images, it is straightforward to define a convolution operator that encodes directional information, as the pixel grid provides a global reference for directions. Such a global coordinate system is not available for curved surfaces. Chapter two presents Harmonic Surface Networks. We apply a 2D kernel to the surface by using local coordinate systems. These local coordinate systems can be rotated arbitrarily around the normal, which is a problem for consistent pattern recognition. We overcome this ambiguity by computing complex-valued, rotation-equivariant features and transporting these features between coordinate systems with parallel transport along shortest geodesics. Chapter three presents DeltaConv, a convolution operator built from geometric operators from vector calculus, such as the Laplacian. A benefit of the Laplacian is that it is invariant to the choice of local coordinate system, which solves the problem of a missing global coordinate system. However, the Laplacian is also isotropic, meaning it cannot pick up on directional information. DeltaConv constructs anisotropic operators by splitting the Laplacian into gradient and divergence and applying a non-linearity in between. The resulting convolution operators are demonstrated on learning tasks for point clouds and achieve state-of-the-art results with a relatively simple architecture. Chapter four considers solving linear systems on curved surfaces. This is relevant for many applications in geometry processing: smoothing data, simulating or animating 3D shapes, or machine learning on surfaces. A common way to solve large systems on grid-based data is a multigrid method. Multigrid methods require a hierarchy of grids and operators that map between the levels of the hierarchy. We show that these components can be defined for curved surfaces with irregularly spaced samples using a hierarchy of graph Voronoi diagrams. The resulting approach, Gravo Multigrid, achieves solving times comparable to the state of the art while taking an order of magnitude less time for pre-processing: from minutes to seconds for meshes with over a million vertices. Chapter five demonstrates the use of intrinsic geometry in the setting of appearance modeling, specifically capturing spatially-varying bidirectional reflectance distribution functions (SVBRDFs). A low-cost setup for recovering SVBRDFs is to capture photographs from multiple viewpoints. A challenge here is that some reflectance behavior only shows up under certain viewing positions and lighting conditions, which means that we might not be able to tell one material type from another. We frame this as a question of (un)certainty: how certain are we, given the input data? We build on previous work showing that the reflection function can be modeled as a convolution of the BRDF with the incoming light.
We propose improvements to the convolution model and develop algorithms for uncertainty analysis fully contained in the frequency domain. The result is a fast and uncertainty-aware SVBRDF recovery on curved surfaces.
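    The gradient-nonlinearity-divergence construction can be caricatured on a graph, where the Laplacian factors as L = D^T D for the incidence matrix D; inserting a pointwise nonlinearity between the two operators is what makes the result anisotropic (a toy sketch, not the point-cloud operators used in DeltaConv):

      # Build anisotropy from isotropic operators, in the spirit of DeltaConv.
      import numpy as np

      # tiny 4-cycle graph: edges (0,1), (1,2), (2,3), (3,0)
      edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
      D = np.zeros((len(edges), 4))              # edge-vertex incidence (discrete gradient)
      for e, (i, j) in enumerate(edges):
          D[e, i], D[e, j] = -1.0, 1.0

      f = np.array([0.0, 1.0, 3.0, 2.0])         # scalar signal on vertices
      grad = D @ f                               # per-edge differences
      laplacian = D.T @ grad                     # isotropic: standard graph Laplacian of f
      anisotropic = D.T @ np.maximum(grad, 0.0)  # ReLU between gradient and divergence
      print(laplacian, anisotropic)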
  • Item
    Massively Parallel Editing and Post-Processing of Unstructured Tetrahedral Meshes for Virtual Prototyping
    (2024-09-10) Ströter, Daniel
    Today, many tasks in industrial product development rely on virtual prototyping to reduce development time and resource costs. Although virtual prototyping significantly simplifies product development through the use of computer-aided design and computer-aided engineering, it remains a laborious and time-consuming process that involves a number of complex steps. Typically, product development teams optimize their prototypes for many design goals, e.g., economical use of material and stability under forces, which demands many iterations of virtual prototyping. Therefore, methods for accelerating and shortening virtual prototyping processes are important technological advances. This thesis presents massively parallel algorithms that exploit the impressive aggregated processing power of present-day general-purpose graphics processing units to accelerate and shorten virtual prototyping. As virtual prototyping oftentimes involves the generation, optimization and adaptation of high-resolution volumetric meshes for numerical simulation, this thesis focuses on efficient processing of volumetric meshes. Unstructured tetrahedral meshes are a commonly used type of volumetric mesh, because they provide robust meshing and tetrahedra allow for a good discretized approximation of surface features. Therefore, this thesis narrows its scope to unstructured tetrahedral meshes. In virtual prototyping, a number of properties of the tetrahedral mesh concern the success of a numerical simulation. Important properties are the resolution of the mesh and the shape quality of the tetrahedral elements. Consequently, the optimization and re-meshing of tetrahedral meshes are common tasks in virtual prototyping. This thesis investigates parallelization strategies for tetrahedral mesh editing operations that are fundamental for mesh optimization and re-meshing. In addition, the robustness of the presented methods is a research objective, because successful acceleration of virtual prototyping is only achieved if the presented methods function properly and produce meshes that are suitable for downstream numerical simulation. One of the primary overheads in virtual prototyping is that new prototype designs demand a new discretization of boundary representations into a volumetric mesh. For this reason, virtual prototyping processes can be significantly shortened by methods that avoid the repeated modeling of the prototype's boundary representations and subsequent mesh generation. To enable shorter virtual prototyping iterations, this thesis explores user-interactive methods for directly editing the tetrahedral mesh without adjusting the boundary representations in a computer-aided design environment. The speed of massively parallel processing offers promising potential for editing high-resolution meshes at interactive rates. Every virtual prototyping process requires a method that allows the development team to visually analyze the simulation results. In the visual analysis step, the development team typically applies post-processing to the mesh and its annotated simulation results. Since accurate numerical simulations might require high-resolution meshes, the use of graphics processing units is common for post-processing. For post-processing volumetric meshes, it is important to visualize the inner structures of the mesh to enable a complete analysis of the prototype. A common method for post-processing volumetric meshes is direct volume rendering. Direct volume rendering of high-resolution meshes requires comprehensive acceleration data structures for fast spatial search of mesh elements, which can lead to large memory consumption. Therefore, this thesis investigates memory-efficient post-processing of unstructured tetrahedral meshes for better management of the available memory capacity. This thesis presents a multitude of contributions for faster virtual prototyping. It presents conflict detection methods that determine dense sub-meshes for massively parallel edge/face flips and re-meshing. In addition, this thesis contributes a robust massively parallel method to relocate mesh vertices for first-order optimization methods. With the presented methods, optimization and re-meshing of unstructured tetrahedral meshes can be accelerated by one to two orders of magnitude. For shortening virtual prototyping, this thesis presents user-interactive editing via user-selected face groups as well as deformation control for editing unstructured tetrahedral meshes. Due to massively parallel processing, these methods enable interactive mesh editing. The mesh editing includes measures for producing tetrahedral meshes of sufficient quality for downstream numerical simulations. For post-processing of unstructured tetrahedral meshes, this thesis presents a memory-efficient spatial data structure along with a method to coarsen meshes for direct volume rendering. The spatial data structure allows control over memory consumption through a tuning parameter. The coarsening can reduce high-resolution tetrahedral meshes to a quarter of their initial size while preserving most visual features.
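    The safety invariant behind conflict detection for massively parallel mesh editing can be sketched as greedy scheduling: two edits that touch a common vertex must not run in the same parallel pass (the thesis' GPU method is far more elaborate; this toy version only illustrates the invariant):

      # Greedy selection of non-conflicting mesh edits for one parallel pass.
      def schedule_parallel_pass(edits):
          # edits: list of vertex-id sets, one per proposed edge flip / collapse
          locked, selected = set(), []
          for op, verts in enumerate(edits):
              if locked.isdisjoint(verts):   # no vertex touched by a selected edit
                  locked |= verts
                  selected.append(op)
          return selected                    # these edits can run concurrently

      flips = [{0, 1, 2, 3}, {2, 3, 4, 5}, {6, 7, 8, 9}, {1, 9, 10, 11}]
      print(schedule_parallel_pass(flips))   # [0, 2]: ops 1 and 3 wait for the next pass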
  • Item
    Computational Inverse Design of Shape-Morphing Structures
    (2024-08-23) Ren, Yingying
    Shape-morphing structures can transform between multiple geometric configurations, enabling a wide range of applications in architecture, robotics, personalizable medical devices, emergency shelters, and space technology. This thesis presents computational inverse design frameworks that incorporate geometric insights and physics-based simulation for three novel shape-morphing structures: 3D weaving with curved ribbons, umbrella meshes, and surface-based inflatables. Our method leverages the potential of digital fabrication technology and optimizes the geometry of the fabrication states to encode the 3D shape, motion, and functionality of these structures. In 3D weaving, we construct smooth free-form surface structures using optimized curved ribbons. By optimizing the geometry of planar ribbons, we obtain assemblies of interwoven ribbons that closely approximate a large variety of target surfaces and that settle reliably back into the target shapes even after external deformation. Umbrella meshes are a new type of volumetric deployable structure that transforms from a compact block into a bending-active 3D surface. We employ insights from conformal geometry to find good initializations for the design parameters. Then we apply numerical optimization to improve the design such that the deployed structures encode both the intrinsic and extrinsic curvature of the target surfaces. Surface-based inflatables are composed of two layers of nearly inextensible sheet material joined together along carefully selected fusing curves. We build a computational framework that employs numerical homogenization to characterize the behavior of parametric families of periodic inflatable patches. We create a database of geometrically diverse fusing patterns and develop a two-scale optimization method to search for fusing curves with good structural properties such that the inflated structures with these fusing curves best approximate input target surfaces. For each shape-morphing structure, we first apply geometric abstractions to explain their unique transformation behavior and gain intuition into efficiently exploring the design space. We then develop robust simulation algorithms to model the complex interaction between the elastic components of these structures. By employing unit-cell-based analysis, we characterize the effective design parameters and create databases of unit cells with mappings from their geometric features to mechanical properties. Our inverse design algorithms integrate these simulation methods and unit-cell analysis to globally optimize the geometry of the fabrication states. In addition to improving each material system and enabling specific applications in architecture and mechanical engineering, these computational approaches also yield fundamental contributions in numerical analysis and optimization algorithms. We validate our approach through a series of physical prototypes and design studies to demonstrate the broad range of new woven geometries, deployable structures, and inflatable structures that are not achievable by existing methods.
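    Schematically, each of these inverse design frameworks is an optimization loop through a differentiable forward model; the toy sketch below replaces the elastic simulation with a two-parameter deployment map (all functions and values are illustrative, not the thesis' simulators):

      # Inverse design as optimization through a differentiable forward model.
      import numpy as np

      t = np.linspace(0.0, 1.0, 50)

      def deploy(params):
          # toy forward model: design parameters -> deployed height profile
          a, b = params
          return a * np.sin(np.pi * t) + b * np.sin(2 * np.pi * t)

      target = 1.5 * np.sin(np.pi * t) + 0.8 * np.sin(2 * np.pi * t)  # desired shape
      params = np.array([0.1, 0.1])
      for _ in range(200):                   # gradient descent on the design parameters
          r = deploy(params) - target
          # analytic gradient of 0.5 * ||deploy(p) - target||^2
          grad = np.array([r @ np.sin(np.pi * t), r @ np.sin(2 * np.pi * t)])
          params -= 0.05 * grad
      print(np.round(params, 3))             # converges to approximately [1.5, 0.8]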
  • Item
    Learning Structured Representations of 3D CAD Models
    (Simon Fraser University, 2024-05-21) Fenggen Yu
    Computer-Aided Design (CAD) models have become widespread in engineering and manufacturing, driving decision-making and product evolution around 3D models. Understanding the structure of 3D CAD models is crucial for various applications, as it can significantly benefit 3D shape analysis, modeling, and manipulation. With the rapid advancements in AI-powered solutions across all relevant fields, several CAD datasets have emerged to support research in 3D geometric deep learning. However, learning the structure of 3D CAD models remains challenging, primarily because of the significant structural variation among small, intricate parts and the limited availability of labeled datasets to support structure learning of 3D CAD models. This thesis proposes several methods for learning structured representations of 3D CAD models to address these challenges. Firstly, we introduce CAPRI-Net, a self-supervised neural network that learns compact 3D CAD models with adaptive primitive assembly. CAPRI-Net can be trained without ground-truth primitive assembly, and it can reconstruct an input shape by assembling quadric surface primitives via Constructive Solid Geometry (CSG) operations. In our subsequent work, D2CSG, we modify the architecture of CAPRI-Net by assembling the primitives in two dual and complementary network branches, with a network-weight dropout strategy, to reconstruct 3D CAD models with fine details and high genus. Compared to CAPRI-Net, D2CSG is provably general and can produce more compact CSG trees. We further introduce DPA-Net, inspired by the volume rendering algorithm in Neural Radiance Fields (NeRF). DPA-Net uses primitive assembly and differentiable rendering to reconstruct 3D CAD models with textures from sparse views. Finally, we introduce HAL3D, the first active learning tool for fine-grained 3D part labeling. HAL3D can take the output of the previous methods as input and assign fine-grained semantic labels to part sets of 3D CAD models along a pre-defined hierarchy tree. We develop two novel features to reduce human effort: hierarchical and symmetry-aware active labeling. Our human-in-the-loop approach achieves close to error-free fine-grained annotations on any test set with pre-defined hierarchical part labels, with an 80% time saving over fully manual effort.
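    The CSG representation such networks learn can be illustrated by evaluating a hand-written tree of implicit primitives, with union as min, intersection as max, and difference as max(a, -b) (a sketch of the representation only, not of the networks themselves):

      # Evaluate a small CSG assembly of implicit primitives on a grid.
      import numpy as np

      def sphere(p, c, r):
          return np.linalg.norm(p - c, axis=-1) - r

      def box(p, half):                       # axis-aligned box distance function
          q = np.abs(p) - half
          return np.linalg.norm(np.maximum(q, 0), axis=-1) + np.minimum(q.max(axis=-1), 0)

      def csg(p):
          # (box UNION sphere) MINUS small sphere: a stand-in for a learned CSG tree
          a = np.minimum(box(p, np.array([0.6, 0.4, 0.4])),
                         sphere(p, np.array([0.6, 0.0, 0.0]), 0.5))
          return np.maximum(a, -sphere(p, np.zeros(3), 0.3))

      # sample the implicit function; the sign separates inside (<0) from outside (>0)
      g = np.stack(np.meshgrid(*[np.linspace(-1, 1, 64)] * 3, indexing="ij"), axis=-1)
      occupancy = csg(g.reshape(-1, 3)) < 0
      print(f"{occupancy.mean():.1%} of the grid is inside the model")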
  • Item
    Towards Computationally Efficient, Photorealistic, and Scalable 3D Generative Modelling
    (University College London, 2024-08-01) Animesh Karnewar
    Generative Modelling (GM) is a class of self-supervised machine learning that finds applications in synthetic data generation, semantic representation learning, and various creative and artistic fields. GM (also known as Generative AI) seemingly holds the potential for the next breakthrough in AI, to which the recent successes in LLMs, text-to-image synthesis and text-to-video synthesis serve as formidable testament. Just as these generative models have revolutionized the process of 2D content creation, we can expect 3D generative modelling to contribute significantly towards simplifying the process of 3D content creation. However, it is non-trivial to extend 2D generative algorithms to operate on 3D data while managing factors such as the inherent data sparsity, the growing memory requirements, and the computational complexity. The application of generative modelling to 3D data is made even harder by two persistent challenges: firstly, finding a large quantity of 3D training data is much more difficult than finding 2D images; and secondly, there is no de facto representation for 3D assets; various representations such as point clouds, meshes, voxel grids, and neural representations (MLPs) are used depending on the application. Thus, with the goal of ultimately enabling 3D generative models, and considering the aforementioned challenges, I propose this thesis, which makes substantial strides “Towards Computationally Efficient, Photorealistic, and Scalable 3D Generative Modelling”.
  • Item
    Accelerating Geometric Queries for Computer Graphics: Algorithms, Techniques, and Applications
    (2024-08-16) Evangelou Iordanis
    In the ever-evolving context of Computer Graphics, the demand for realistic and real-time virtual environments and for interaction with digitised or born-digital content has grown exponentially. Whether in gaming, production rendering, computer-aided design, reverse engineering, geometry processing and understanding, or simulation tasks, the ability to rapidly and accurately perform geometric queries of any type is crucial. The actual form of a geometric query varies depending on the task at hand, the application domain, the input representation, and the adopted methodology. These queries may involve intersection tests, as in ray tracing; spatial queries, such as those needed to recover nearest sample neighbours; geometry registration in order to classify polygonal primitive inputs; or even virtual scene understanding in order to suggest and embed configurations, as in light optimisation and placement. As the applications of these algorithms and, consequently, their complexity continuously grow, traditional geometric queries fall short when naïvely adopted and integrated in practical scenarios. These methods therefore face limitations in terms of computational efficiency and query bandwidth. This is particularly pronounced in scenarios where vast amounts of geometric data must be processed at interactive or even real-time rates. More often than not, one has to inspect and understand the internal mechanics and theory of the algorithms invoking these geometric queries. This is particularly useful for devising procedures appropriately tailored to the underlying task and hence maximising their efficiency, both in terms of performance and output quality. As a result, there is an enormous area of research that explores innovative approaches to geometric query acceleration, addressing the challenges posed. The primary focus of this research was to develop innovative methods for accelerating geometric queries within the domain of Computer Graphics. This entails a comprehensive exploration of algorithmic optimisations, including the development of advanced data structures and neural network architectures tailored to efficiently handle geometric collections. This research addressed not only the computational complexity of individual queries, but also the adaptability of the proposed solutions to diverse applications and scenarios, primarily within the realm of Computer Graphics but also in intersecting domains. The outcome of this research holds the potential to influence the fields that adopt these geometric query methodologies by addressing the associated computational challenges and unlocking novel directions for real-time rendering, interactive simulation, and immersive virtual experiences. More specifically, the contributions of this thesis are divided into two broad directions for accelerating geometric queries: a) global illumination-related, hardware-accelerated nearest-neighbour queries, and b) the application of deep learning to the definition of novel data structures and geometric query methods. In the first part, we consider the task of real-time global illumination using photon density estimators. In particular, we investigate scenarios where complex illumination effects, such as caustics, which are mainly handled by progressive photon mapping algorithms, require vast amounts of rays to be traced from both the eye sensor and the light sources. Photons emanating from lights are cached in the surface geometry or volumetric media and must be gathered at query locations on the paths traced from the camera sensor. To achieve real-time frame rates, gathering, an expensive operation, needs to be handled efficiently. This is accomplished by adapting screen-space ray tracing and splatting to the hardware-accelerated rasterisation pipeline. Since the gathering phase is an inherent subcategory of nearest-neighbour search, we also propose how to efficiently generalise this concept to any form of task by exploiting existing low-level hardware-accelerated ray tracing frameworks, effectively boosting the query phase by orders of magnitude compared to the traditional strategies involved. In the second part, we shift our focus to a more generic class of geometric queries. The first work involves accurate and fast shape classification using neural network architectures. We demonstrate that a hybrid architecture, which processes orientation and a voxel-based representation of the input, is capable of classifying hard-to-distinguish solid geometry in the context of building information models. Second, we consider geometric queries in the context of scene understanding. More precisely, optimising the placement and light intensities of luminaires in urban places can be a computationally intricate task, especially for large inputs and conflicting constraints. Methodologies employed in the literature usually make assumptions about the input representation to mitigate the intractable nature of this task. In this thesis, we approach this problem with a holistic solution that can produce feasible and diverse proposals in real time by adopting a neural generative modelling methodology. Finally, we propose a novel and general approach to recursive cost evaluation for the construction of geometric query acceleration data structures. This work establishes a new research direction for the construction of data structures guided by recursive cost functions using neural architectures. Our goal is to overcome the exhaustive but intractable evaluation of the cost function in order to generate a high-quality data structure for spatial queries.
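    At its core, the gathering step is a k-nearest-neighbour query around each shading point; the sketch below uses a CPU k-d tree as a stand-in for the hardware-accelerated ray-tracing frameworks the thesis repurposes (photon data and counts are made up):

      # Photon gathering as a k-nearest-neighbour query with a density estimate.
      import numpy as np
      from scipy.spatial import cKDTree

      rng = np.random.default_rng(0)
      photon_pos = rng.uniform(-1, 1, (100_000, 3))   # cached photon positions
      photon_pow = rng.uniform(0, 1e-4, 100_000)      # photon power (flux), one channel

      tree = cKDTree(photon_pos)
      query = rng.uniform(-1, 1, (1024, 3))           # shading points along camera paths
      dist, idx = tree.query(query, k=32)             # gather the 32 nearest photons

      # classic density estimate: flux inside the gather disc / disc area
      radius = dist[:, -1]
      estimate = photon_pow[idx].sum(axis=1) / (np.pi * radius**2)
      print(estimate.shape, float(estimate.mean()))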
  • Item
    Perception-Based Techniques to Enhance User Experience in Virtual Reality
    (2024-07-26) Colin Groth
    Virtual reality (VR) ushered in a new era of immersive content viewing with vast potential for entertainment, design, medicine, and other fields. However, the willingness of users to practically apply the technology is bound to the quality of the virtual experience. In this dissertation, we describe the development and investigation of novel techniques to reduce negative influences on the user experience in VR applications. Our methods not only include substantial technical improvements but also consider important characteristics of human perception that are exploited to make the applications more effective and subtle. We focus mostly on visual perception, since we deal with visual stimuli, but we also consider the vestibular sense, which is a key component in the occurrence of negative symptoms in VR referred to as cybersickness. Our techniques are designed for three groups of VR applications, characterized by the degree of freedom they leave for adjustments. The first set of techniques addresses the extension of VR systems with stimulation hardware. By adapting common techniques from the medical field, we artificially induce human body signals to create immersive experiences that reduce common mismatches between sources of perceptual information. The second group focuses on applications that use common hardware and allow adjustments of the full render pipeline. Especially notable here is immersive video content, where the frame rates and quality of the presentations are often not in line with the high requirements VR systems must meet for a decent user experience. To address these display problems, we present a novel video codec based on wavelet compression and perceptual features of the visual system. Finally, the third group of applications is the most restrictive and does not allow modifications of the rendering pipeline. Here, our techniques consist of post-processing manipulations in screen space after rendering the image, without knowledge of the 3D scene. To keep the techniques in this group subtle, we exploit fundamental properties of human peripheral vision and apply spatial masking as well as gaze-contingent motion scaling in our methods.
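    The compression step of a wavelet codec can be sketched as one level of a 2D Haar transform followed by thresholding of detail coefficients (the actual codec adds temporal decomposition and perceptually weighted quantization; this shows only the basic mechanism):

      # One level of a 2D Haar wavelet transform plus hard thresholding.
      import numpy as np

      def haar2d(img):
          a = (img[0::2] + img[1::2]) / np.sqrt(2)        # vertical pairs: average
          d = (img[0::2] - img[1::2]) / np.sqrt(2)        # vertical pairs: detail
          rows = np.vstack([a, d])
          a2 = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2)
          d2 = (rows[:, 0::2] - rows[:, 1::2]) / np.sqrt(2)
          return np.hstack([a2, d2])                      # [LL LH; HL HH] layout

      img = np.tile(np.linspace(0, 255, 256), (256, 1))   # smooth test image
      coeffs = haar2d(img)
      kept = np.where(np.abs(coeffs) > 4.0, coeffs, 0.0)  # drop small (imperceptible) details
      print(f"kept {np.count_nonzero(kept) / kept.size:.1%} of coefficients")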
  • Item
    Discrete Laplacians for General Polygonal and Polyhedral Meshes
    (TU Dortmund University, 2024) Astrid Pontzen (née Bunge)
    This thesis presents several approaches that generalize the Laplace-Beltrami operator and its closely related gradient and divergence operators to arbitrary polygonal and polyhedral meshes. We start by introducing the linear virtual refinement method, which provides a simple yet effective discretization of the Laplacian with the help of the Galerkin method, from a Finite Element perspective. Its flexibility allows us to explore alternative numerical schemes in this setting and to derive a second Laplacian, called the Diamond Laplacian, with a similar approach, this time combined with the Discrete Duality Finite Volume method. It offers enhanced accuracy, but comes at the cost of denser matrices and slightly longer solving times. In the second part of the thesis, we extend the linear virtual refinement to higher-order discretizations. This method, called the quadratic virtual refinement method, introduces variational quadratic shape functions for arbitrary polygons and polyhedra. We also present a custom multigrid approach to address the computational challenges of higher-order discretizations, making the faster convergence rates and higher accuracy of these polygon shape functions more affordable for the user. The final part of this thesis focuses on the open degrees of freedom of the linear virtual refinement method. By uncovering connections between our operator and the underlying tessellations, we can enhance the accuracy and stability of our initial method and improve its overall performance. These connections equally allow us to define what a “good” polygon is in the context of our Laplacian. We present a smoothing approach that alters the shape of the polygons (while retaining the original surface as much as possible) to allow for even better performance.
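    The linear virtual refinement idea can be sketched for a single polygon: insert a virtual vertex (here simply the centroid; choosing its affine weights is exactly the kind of open degree of freedom the thesis optimizes), assemble the cotangent Laplacian on the triangle fan, and condense it back with a prolongation P:

      # Polygon Laplacian via a virtual vertex, a fan of triangles, and Galerkin condensation.
      import numpy as np

      def polygon_laplacian(poly):
          n = len(poly)
          V = np.vstack([poly, poly.mean(axis=0)])     # polygon corners + virtual vertex
          L = np.zeros((n + 1, n + 1))
          for i in range(n):                           # fan triangle (i, i+1, virtual)
              tri = [i, (i + 1) % n, n]
              for k in range(3):
                  a, b, c = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
                  u, v = V[b] - V[a], V[c] - V[a]      # corner at a, opposite edge (b, c)
                  w = 0.5 * (u @ v) / np.linalg.norm(np.cross(u, v))  # 0.5 * cot(angle)
                  L[b, c] -= w; L[c, b] -= w
                  L[b, b] += w; L[c, c] += w
          P = np.vstack([np.eye(n), np.full(n, 1.0 / n)])  # virtual vertex = centroid
          return P.T @ L @ P                           # condensed n x n polygon Laplacian

      hexagon = np.array([[np.cos(a), np.sin(a), 0.0]
                          for a in np.linspace(0, 2 * np.pi, 7)[:-1]])
      print(np.round(polygon_laplacian(hexagon), 2))   # rows sum to zero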
  • Item
    Learning Digital Humans from Vision and Language
    (ETH Zurich, 2024-10-10) Yao Feng
    The study of realistic digital humans has gained significant attention within the research communities of computer vision, computer graphics, and machine learning. This growing interest is driven by the importance of understanding our human selves and by the pivotal role digital humans play in diverse applications, including virtual presence in AR/VR, digital fashion, entertainment, robotics, and healthcare. However, two major challenges hinder the widespread use of digital humans across disciplines: the difficulty of capture, as current methods rely on complex systems that are time-consuming, labor-intensive, and costly; and the lack of understanding, where even after creating digital humans, gaps in understanding their 3D representations and integrating them with broader world knowledge limit their effective utilization. Overcoming these challenges is crucial to unlocking the full potential of digital humans in interdisciplinary research and practical applications. To address these challenges, this thesis combines insights from computer vision, computer graphics, and machine learning to develop scalable methods for capturing and modeling digital humans. These methods capture faces, bodies, hands, hair, and clothing from accessible data such as images, videos, and text descriptions. More importantly, we go beyond capturing to shift the research paradigm toward understanding and reasoning by leveraging large language models (LLMs). For instance, we developed the first foundation model that not only captures 3D human poses from a single image, but also reasons about a person’s potential next actions in 3D by incorporating world knowledge. This thesis thus unifies scalable capturing and understanding of digital humans from vision and language data, just as humans observe and interpret the world through visual and linguistic information. Our research begins by developing a framework to capture detailed 3D faces from in-the-wild images. This framework, capable of generating highly realistic and animatable 3D faces from single images, is trained without paired 3D supervision and achieves state-of-the-art accuracy in shape reconstruction. It effectively disentangles identity and expression details, thereby allowing the estimated faces to be animated with various expressions. Humans are not just faces, however; we therefore develop PIXIE, a method for estimating animatable, whole-body 3D avatars with realistic facial details from a single image. By incorporating an attention mechanism, PIXIE surpasses previous methods in accuracy and enables the creation of expressive, high-quality 3D humans. Expanding beyond human bodies, we propose SCARF and DELTA to capture body, clothing, face, and hair separately from monocular videos using a hybrid representation. While clothing and hair are better modeled with implicit representations like neural radiance fields (NeRFs) due to their complex topologies, human bodies are better represented with meshes. SCARF combines the strengths of both by integrating mesh-based bodies with NeRFs for clothing and hair. To enable learning directly from monocular videos, we introduce mesh-integrated volume rendering, which allows optimizing the model directly from 2D image data without requiring 3D supervision. Thanks to the disentangled modeling, the captured avatar's clothing can be transferred to arbitrary body shapes, making it especially valuable for applications such as virtual try-on. Building on SCARF's hybrid representation, we introduce TECA, which uses text-to-image generation models to create realistic and editable 3D avatars. TECA produces more realistic avatars than recent methods while allowing edits thanks to its compositional design. For instance, users can input descriptions like “a slim woman with dreadlocks” to generate a 3D head mesh with texture and a NeRF model for the hair. It also enables transferring NeRF-based hairstyles, scarves, and other accessories between avatars. While these methods make capturing humans more accessible, broader applications require understanding the context of human behavior. Traditional pose estimation methods often isolate subjects by cropping images, which limits their ability to interpret the full scene or reason about actions. To address this, we developed ChatPose, the first model for understanding and reasoning about 3D human poses. ChatPose leverages a multimodal large language model (LLM) with a finetuned projection layer that decodes embeddings into 3D pose parameters, which are further decoded into 3D body meshes using the SMPL body model. By finetuning on both text-to-3D-pose and image-to-3D-pose data, ChatPose demonstrates, for the first time, that an LLM can directly reason about 3D human poses. This capability allows ChatPose to describe human behavior, generate 3D poses, and reason about potential next actions in 3D form, combining perception with reasoning. We believe the contributions of this thesis, in scaling up digital human capture and advancing the understanding of humans in 3D, have the potential to shape the future of human-centered research and enable broader applications across diverse fields.
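    The projection-layer idea behind ChatPose can be sketched as a learned linear map from an LLM token embedding to SMPL parameters (24 axis-angle joint rotations plus 10 shape coefficients); the embedding width and initialization below are illustrative stand-ins, not the paper's actual configuration:

      # A toy projection head mapping an LLM pose-token embedding to SMPL parameters.
      import numpy as np

      EMB, POSE, SHAPE = 4096, 24 * 3, 10
      rng = np.random.default_rng(0)
      W = rng.normal(0, 0.02, (EMB, POSE + SHAPE))  # the finetuned projection layer
      b = np.zeros(POSE + SHAPE)

      def decode_pose_token(h):
          # h: hidden state of the special pose token from the multimodal LLM
          out = h @ W + b
          return out[:POSE].reshape(24, 3), out[POSE:]  # per-joint axis-angle, shape betas

      theta, betas = decode_pose_token(rng.normal(size=EMB))
      print(theta.shape, betas.shape)               # (24, 3) (10,), fed to the SMPL model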