Browsing by Author "Beeler, Thabo"
- Accurate Real-time 3D Gaze Tracking Using a Lightweight Eyeball Calibration (The Eurographics Association and John Wiley & Sons Ltd., 2020) Wen, Quan; Bradley, Derek; Beeler, Thabo; Park, Seonwook; Hilliges, Otmar; Yong, Jun-Hai; Xu, Feng; Panozzo, Daniele and Assarsson, Ulf
  3D gaze tracking from a single RGB camera is very challenging due to the lack of information in determining the accurate gaze target from a monocular RGB sequence. The eyes tend to occupy only a small portion of the video, and even small errors in estimated eye orientations can lead to very large errors in the triangulated gaze target. We overcome these difficulties with a novel lightweight eyeball calibration scheme that determines the user-specific visual axis, eyeball size and position in the head. Unlike previous calibration techniques, we do not need the ground truth positions of the gaze points. In the online stage, gaze is tracked by a new gaze fitting algorithm, and refined by a 3D gaze regression method to correct for bias errors. Our regression is pre-trained on several individuals and works well for novel users. After the lightweight one-time user calibration, our method operates in real time. Experiments show that our technique achieves state-of-the-art accuracy in gaze angle estimation, and we demonstrate applications of 3D gaze target tracking and gaze retargeting to an animated 3D character.
- Facial Expression Synthesis using a Global-Local Multilinear Framework (The Eurographics Association and John Wiley & Sons Ltd., 2020) Wang, Mengjiao; Bradley, Derek; Zafeiriou, Stefanos; Beeler, Thabo; Panozzo, Daniele and Assarsson, Ulf
  We present a practical method to synthesize plausible 3D facial expressions for a particular target subject. The ability to synthesize an entire facial rig from a single neutral expression has a large range of applications both in computer graphics and computer vision, ranging from the efficient and cost-effective creation of CG characters to scalable data generation for machine learning purposes. Unlike previous methods based on multilinear models, the proposed approach is capable of extrapolating well outside the sample pool, which allows it to plausibly predict the identity of the target subject and create artifact-free expression shapes while requiring only a small input dataset. We introduce global-local multilinear models that leverage the strengths of expression-specific and identity-specific local models combined with coarse motion estimations from a global model. Experimental results show that we achieve high-quality, plausible facial expression synthesis results for an individual that outperform existing methods both quantitatively and qualitatively.
- Fast Nonlinear Least Squares Optimization of Large-Scale Semi-Sparse Problems (The Eurographics Association and John Wiley & Sons Ltd., 2020) Fratarcangeli, Marco; Bradley, Derek; Gruber, Aurel; Zoss, Gaspard; Beeler, Thabo; Panozzo, Daniele and Assarsson, Ulf
  Many problems in computer graphics and vision can be formulated as a nonlinear least squares optimization problem, for which numerous off-the-shelf solvers are readily available. Depending on the structure of the problem, however, existing solvers may be more or less suitable, and in some cases the solution comes at the cost of lengthy convergence times. One such case is semi-sparse optimization problems, emerging for example in localized facial performance reconstruction, where the nonlinear least squares problem can be composed of hundreds of thousands of cost functions, each one involving many of the optimization parameters. While such problems can be solved with existing solvers, the computation time can severely hinder the applicability of these methods. We introduce a novel iterative solver for nonlinear least squares optimization of large-scale semi-sparse problems. We use the nonlinear Levenberg-Marquardt method to locally linearize the problem in parallel, based on its first-order approximation. Then, we decompose the linear problem into small blocks, using the local Schur complement, leading to a more compact linear system without loss of information. The resulting system is dense but its size is small enough to be solved using a parallel direct method in a short amount of time. The main benefit we get by using such an approach is that the overall optimization process is entirely parallel and scalable, making it suitable to be mapped onto graphics hardware (GPU). By using our minimizer, results are obtained up to one order of magnitude faster than other existing solvers, without sacrificing the generality and the accuracy of the model. We provide a detailed analysis of our approach and validate our results with the application of performance-based facial capture using a recently-proposed anatomical local face deformation model.
- Frontmatter: ACM SIGGRAPH / Eurographics Symposium of Computer Animation 2018 (The Eurographics Association and John Wiley & Sons Ltd., 2018) Thuerey, Nils; Beeler, Thabo; Thuerey, Nils and Beeler, Thabo
- GANtlitz: Ultra High Resolution Generative Model for Multi-Modal Face Textures (The Eurographics Association and John Wiley & Sons Ltd., 2024) Gruber, Aurel; Collins, Edo; Meka, Abhimitra; Mueller, Franziska; Sarkar, Kripasindhu; Orts-Escolano, Sergio; Prasso, Luca; Busch, Jay; Gross, Markus; Beeler, Thabo; Bermano, Amit H.; Kalogerakis, Evangelos
  High-resolution texture maps are essential to render photoreal digital humans for visual effects or to generate data for machine learning. The acquisition of high-resolution assets at scale is cumbersome: it involves enrolling a large number of human subjects, using expensive multi-view camera setups, and significant manual artistic effort to align the textures. To alleviate these problems, we introduce GANtlitz (a play on the German noun Antlitz, meaning face), a generative model that can synthesize multi-modal ultra-high-resolution face appearance maps for novel identities. Our method solves three distinct challenges: 1) unavailability of a very large data corpus generally required for training generative models, 2) memory and computational limitations of training a GAN at ultra-high resolutions, and 3) consistency of appearance features such as skin color, pores and wrinkles in high-resolution textures across different modalities. We introduce dual-style blocks, an extension to the style blocks of the StyleGAN2 architecture, which improve multi-modal synthesis. Our patch-based architecture is trained only on image patches obtained from a small set of face textures (<100) and yet allows us to generate seamless appearance maps of novel identities at 6k×4k resolution. Extensive qualitative and quantitative evaluations and baseline comparisons show the efficacy of our proposed system.
- Interactive Sculpting of Digital Faces Using an Anatomical Modeling Paradigm (The Eurographics Association and John Wiley & Sons Ltd., 2020) Gruber, Aurel; Fratarcangeli, Marco; Zoss, Gaspard; Cattaneo, Roman; Beeler, Thabo; Gross, Markus; Bradley, Derek; Jacobson, Alec and Huang, Qixing
  Digitally sculpting 3D human faces is a very challenging task. It typically requires either 1) highly-skilled artists using complex software packages for high quality results, or 2) highly-constrained simple interfaces for consumer-level avatar creation, such as in game engines. We propose a novel interactive method for the creation of digital faces that is simple and intuitive to use, even for novice users, while consistently producing plausible 3D face geometry, and allowing editing freedom beyond traditional video game avatar creation. At the core of our system lies a specialized anatomical local face model (ALM), which is constructed from a dataset of several hundred 3D face scans. User edits are propagated to constraints for an optimization of our data-driven ALM model, ensuring the resulting face remains plausible even for simple edits like clicking and dragging surface points. We show how several natural interaction methods can be implemented in our framework, including direct control of the surface, indirect control of semantic features like age, ethnicity, gender, and BMI, as well as indirect control through manipulating the underlying bony structures. The result is a simple new method for creating digital human faces, for artists and novice users alike. Our method is attractive for low-budget VFX and animation productions, and our anatomical modeling paradigm can complement traditional game engine avatar design packages.
- Learning to Stabilize Faces (The Eurographics Association and John Wiley & Sons Ltd., 2024) Bednarik, Jan; Wood, Erroll; Choutas, Vassilis; Bolkart, Timo; Wang, Daoye; Wu, Chenglei; Beeler, Thabo; Bermano, Amit H.; Kalogerakis, Evangelos
  Nowadays, it is possible to scan faces and automatically register them with high quality. However, the resulting face meshes often need further processing: we need to stabilize them to remove unwanted head movement. Stabilization is important for tasks like game development or movie making which require facial expressions to be cleanly separated from rigid head motion. Since manual stabilization is labor-intensive, there have been attempts to automate it. However, previous methods remain impractical: they either still require some manual input, produce imprecise alignments, rely on dubious heuristics and slow optimization, or assume a temporally ordered input. Instead, we present a new learning-based approach that is simple and fully automatic. We treat stabilization as a regression problem: given two face meshes, our network directly predicts the rigid transform between them that brings their skulls into alignment. We generate synthetic training data using a 3D Morphable Model (3DMM), exploiting the fact that 3DMM parameters separate skull motion from facial skin motion. Through extensive experiments we show that our approach outperforms the state-of-the-art both quantitatively and qualitatively on the tasks of stabilizing discrete sets of facial expressions as well as dynamic facial performances. Furthermore, we provide an ablation study detailing the design choices and best practices to help others adopt our approach for their own uses.
- Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation (The Eurographics Association, 2025) Chen, Victoria Yue; Wang, Daoye; Garbin, Stephan; Bednarik, Jan; Winberg, Sebastian; Bolkart, Timo; Beeler, Thabo; Ceylan, Duygu; Li, Tzu-Mao
  Face registration deforms a template mesh to closely fit a 3D face scan, the quality of which commonly degrades in non-skin regions (e.g., hair, beard, accessories), because the optimized template-to-scan distance pulls the template mesh towards the noisy scan surface. Improving registration quality requires a clean separation of skin and non-skin regions on the scan mesh. Existing image-based (2D) or scan-based (3D) segmentation methods, however, perform poorly. Image-based segmentation outputs multi-view inconsistent masks and cannot account for scan inaccuracies or scan-image misalignment, while scan-based methods suffer from lower spatial resolution compared to images. In this work, we introduce a novel method that accurately separates skin from non-skin geometry on 3D human head scans. For this, our method extracts features from multi-view images using a frozen image foundation model and aggregates these features in 3D. These lifted 2D features are then fused with 3D geometric features extracted from the scan mesh to predict a segmentation mask directly on the scan mesh. We show that our segmentations improve the registration accuracy over pure 2D or 3D segmentation methods by 8.89% and 14.3%, respectively. Although trained only on synthetic data, our model generalizes well to real data.
- Practical Person-Specific Eye Rigging (The Eurographics Association and John Wiley & Sons Ltd., 2019) Bérard, Pascal; Bradley, Derek; Gross, Markus; Beeler, Thabo; Alliez, Pierre and Pellacini, Fabio
  We present a novel parametric eye rig for eye animation, including a new multi-view imaging system that can reconstruct eye poses at submillimeter accuracy to which we fit our new rig. This allows us to accurately estimate person-specific eyeball shape, rotation center, interocular distance, visual axis, and other rig parameters resulting in an animation-ready eye rig. We demonstrate the importance of several aspects of eye modeling that are often overlooked, for example that the visual axis is not identical to the optical axis, that it is important to model rotation about the optical axis, and that the rotation center of the eye should be measured accurately for each person. Since accurate rig fitting requires hand annotation of multi-view imagery for several eye gazes, we additionally propose a more user-friendly "lightweight" fitting approach, which leverages an average rig created from several pre-captured accurate rigs. Our lightweight rig fitting method allows for the estimation of eyeball shape and eyeball position given only a single pose with a known look-at point (e.g. looking into a camera) and few manual annotations.
- ShellNeRF: Learning a Controllable High-resolution Model of the Eye and Periocular Region (The Eurographics Association and John Wiley & Sons Ltd., 2024) Li, Gengyan; Sarkar, Kripasindhu; Meka, Abhimitra; Buehler, Marcel; Mueller, Franziska; Gotardo, Paulo; Hilliges, Otmar; Beeler, Thabo; Bermano, Amit H.; Kalogerakis, Evangelos
  Eye gaze and expressions are crucial non-verbal signals in face-to-face communication. Visual effects and telepresence demand significant improvements in personalized tracking, animation, and synthesis of the eye region to achieve true immersion. Morphable face models, in combination with coordinate-based neural volumetric representations, show promise in solving the difficult problem of reconstructing intricate geometry (eyelashes) and synthesizing photorealistic appearance variations (wrinkles and specularities) of eye performances. We propose a novel hybrid representation - ShellNeRF - that builds a discretized volume around a 3DMM face mesh using concentric surfaces to model the deformable 'periocular' region. We define a canonical space using the UV layout of the shells that constrains the space of dense correspondence search. Combined with an explicit eyeball mesh for modeling corneal light-transport, our model allows for animatable photorealistic 3D synthesis of the whole eye region. Using multi-view video input, we demonstrate significant improvements over state-of-the-art in expression re-enactment and transfer for high-resolution close-up views of the eye region.
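
Note on the entry "Fast Nonlinear Least Squares Optimization of Large-Scale Semi-Sparse Problems" above: its abstract mentions reducing a linear system with a Schur complement. The sketch below only illustrates that general idea on a tiny symmetric block system [[A, B], [B^T, C]] [x; y] = [f; g]. It is not the authors' solver: the function names (`solve`, `schur_solve`), the dense list-of-lists representation, and the example numbers are all assumptions made for illustration, and nothing here is parallel or GPU-based.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # dense row-major matrix; column vectors are n x 1


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def schur_solve(A: Matrix, B: Matrix, C: Matrix,
                f: Matrix, g: Matrix) -> Tuple[Matrix, Matrix]:
    """Solve [[A, B], [B^T, C]] [x; y] = [f; g] by eliminating x first."""
    Bt = transpose(B)
    Ainv_B = solve(A, B)                 # A^{-1} B
    Ainv_f = solve(A, f)                 # A^{-1} f
    S = sub(C, matmul(Bt, Ainv_B))       # Schur complement: C - B^T A^{-1} B
    rhs = sub(g, matmul(Bt, Ainv_f))     # reduced right-hand side
    y = solve(S, rhs)                    # small reduced system
    x = sub(Ainv_f, matmul(Ainv_B, y))   # back-substitution for x
    return x, y


# Hypothetical example: two "local" unknowns x coupled to one shared unknown y.
A = [[4.0, 0.0], [0.0, 2.0]]
B = [[1.0], [1.0]]
C = [[3.0]]
f = [[5.0], [3.0]]
g = [[5.0]]
x, y = schur_solve(A, B, C, f, g)
print(x, y)
```

In this toy setup the reduced system in `y` has size 1 while the full system has size 3; the elimination is exact, so the reduced solve loses no information relative to solving the full system directly.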