PG2024 Conference Papers and Posters
Browsing PG2024 Conference Papers and Posters by Issue Date
Now showing 1 - 20 of 57
Item PointJEM: Self-supervised Point Cloud Understanding for Reducing Feature Redundancy via Joint Entropy Maximization (The Eurographics Association, 2024) Cao, Xin; Xia, Huan; Wang, Haoyu; Su, Linzhi; Zhou, Ping; Li, Kang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Most deep learning methods for point cloud processing are supervised and require extensive labeled data. However, labeling point cloud data is a tedious and time-consuming task. Self-supervised representation learning can address this problem by extracting robust and generalized features from unlabeled data. Yet the features obtained through representation learning are often redundant, and current methods typically reduce redundancy by imposing linear correlation constraints. In this paper, we introduce PointJEM, a self-supervised representation learning method for point clouds. It includes an embedding scheme that divides the embedding vector into parts, each of which learns a distinct feature. To minimize redundancy, PointJEM maximizes the joint entropy between parts, making the learned features pairwise independent. We tested PointJEM on various datasets and found that it significantly reduces redundancy beyond linear correlation. PointJEM also performs well in downstream tasks such as classification and segmentation.

Item High-Quality Geometry and Texture Editing of Neural Radiance Field (The Eurographics Association, 2024) Kim, Soongjin; Son, Jooeun; Ju, Gwangjin; Lee, Joo Ho; Lee, Seungyong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Recent advances in Neural Radiance Fields (NeRF) have demonstrated impressive rendering quality reconstructed from input images. However, the density-based radiance field representation entangles geometry and texture, limiting editability. To address this issue, NeuMesh proposed a mesh-based NeRF editing method supporting deformation and texture editing. Still, it fails to reconstruct and render fine details of the input images, and the dependency between the rendering scheme and the geometry limits editability for target scenes. In this paper, we propose an intermediate scene representation in which a near-surface volume is associated with a guide mesh. Our key idea is to separate a given scene into geometry, a parameterized texture space, and a radiance field. We define a mapping between the world coordinate space and a coordinate system formed by the mesh parameterization together with the height above the mesh surface, which efficiently encodes the near-surface volume. With the surface-aligned radiance field defined in this near-surface volume, our method generates high-quality rendering results with high-frequency details. Our method also supports various geometry and appearance editing operations while preserving high rendering quality. We demonstrate the performance of our method by comparing it with state-of-the-art methods both qualitatively and quantitatively, and show applications including shape deformation, texture filling, and texture painting.

Item Free-form Floor Plan Design using Differentiable Voronoi Diagram (The Eurographics Association, 2024) Wu, Xuanyu; Tojo, Kenji; Umetani, Nobuyuki; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Designing floor plans is difficult because the layout of the internal walls must satisfy various constraints. This paper presents a novel shape representation and optimization method for designing floor plans based on Voronoi diagrams. Our Voronoi diagram implicitly specifies the shape of each room using the distance from the Voronoi sites, thus facilitating topological changes in the wall layout by moving these sites. Since the differentiation of the explicit wall representation is readily available, our method can incorporate various constraints, such as room areas and room connectivity, into the optimization. We demonstrate that our method can generate various floor plans while allowing users to interactively change the constraints.
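The differentiable-Voronoi idea above can be illustrated with a toy relaxation. The paper differentiates the Voronoi diagram itself; the sketch below instead samples the domain on a grid and uses a softmax over negative squared distances to the sites, so that room areas become differentiable functions of the site positions. The temperature, grid resolution, and target areas here are made-up values, not the authors' formulation.

```python
import torch

def soft_room_areas(sites, grid_pts, cell_area, temp=0.01):
    """Differentiable room areas from a softened nearest-site assignment."""
    d2 = torch.cdist(grid_pts, sites) ** 2      # (N, K) squared distances to sites
    assign = torch.softmax(-d2 / temp, dim=1)   # soft Voronoi membership per sample
    return assign.sum(dim=0) * cell_area        # (K,) approximate room areas

# Toy usage: nudge the sites so that every room approaches a target area of 0.2.
sites = torch.rand(5, 2, requires_grad=True)
xs = torch.linspace(0.0, 1.0, 64)
grid = torch.stack(torch.meshgrid(xs, xs, indexing="ij"), dim=-1).reshape(-1, 2)
areas = soft_room_areas(sites, grid, cell_area=(1.0 / 64) ** 2)
loss = ((areas - 0.2) ** 2).sum()
loss.backward()                                 # gradients w.r.t. the site positions
```

Annealing the temperature toward zero makes the soft assignment approach the true Voronoi partition.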
Item StegaVideo: Robust High-Resolution Video Steganography with Temporal and Edge Guidance (The Eurographics Association, 2024) Hu, Kun; Hu, Zixuan; Zhu, Qianhui; Wang, Xiaochao; Wang, Xingjun; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Current video steganography frameworks have difficulty balancing robustness and imperceptibility at high resolution. To achieve better video coherence, robustness, and invisibility, we propose an efficient high-resolution video steganography method, named StegaVideo, that uses temporal guidance and edge guidance. StegaVideo concentrates the embedded message in edge regions to enhance invisibility, achieving a Peak Signal-to-Noise Ratio (PSNR) of over 38 dB. We simulate various attacks to enhance robustness, with an average bit accuracy above 99.5%. We use a faster embedding and extraction network, resulting in a 10× improvement in inference speed. Our method outperforms current leading video steganography systems in terms of efficiency, robustness, resolution, and inference speed, as demonstrated by our experiments. Our code will be publicly available at https://github.com/LittleFocus2201/StegaVideo.

Item Single Image 3D Reconstruction of Creased Documents Using Shape-from-Shading with Template-Based Error Correction (The Eurographics Association, 2024) Wang, Linqin; Bo, Pengbo; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We present a method for reconstructing 3D models from single images of creased documents by enhancing the linear shape-from-shading (SFS) technique with a template-based error correction mechanism. This mechanism is based on a mapping function established using precise data from a spherical surface modeled with linearized Lambertian shading. The error correction mapping is integrated into an algorithm that refines the reconstructed depth values during the image scanning process. To resolve the inherent concave/convex ambiguities in SFS, we identify specific conditions based on the assumed lighting and the geometric characteristics of creased documents, effectively improving reconstruction even in less controlled lighting environments. Our approach captures intricate geometric details on non-smooth surfaces. Comparative results demonstrate that our method provides superior accuracy and efficiency in reconstructing complex features such as creases and wrinkles.

Item Computational Mis-Drape Detection and Rectification (The Eurographics Association, 2024) Shin, Hyeon-Seung; Ko, Hyeong-Seok; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
For various reasons, mis-drapes occur in physically-based clothing simulation. Therefore, when developing a virtual try-on system that works without any human operator, a technique to algorithmically detect and rectify mis-drapes has to be developed. This paper makes a first attempt in that direction by defining two mis-drape determinants, namely the Gaussian and crease mis-drape determinants. In experiments performed on various avatar-garment combinations, the proposed determinants identify mis-drapes accurately. This paper also proposes a treatment that can be applied to rectify the mis-drapes. The proposed treatment successfully resolves the mis-drapes without unnecessarily destroying the original drape.
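The abstract does not spell out how the Gaussian mis-drape determinant is evaluated. A common discrete proxy for Gaussian curvature on a triangle mesh is the angle deficit at each vertex; the sketch below computes only that generic quantity, under the assumption that something of this kind could feed such a determinant, and is not the paper's actual criterion.

```python
import numpy as np

def angle_deficit(vertices, faces):
    """2*pi minus the sum of incident triangle angles at each vertex
    (a discrete Gaussian-curvature proxy for interior vertices)."""
    deficit = np.full(len(vertices), 2.0 * np.pi)
    for tri in faces:
        for k in range(3):
            i, j, l = tri[k], tri[(k + 1) % 3], tri[(k + 2) % 3]
            u = vertices[j] - vertices[i]
            v = vertices[l] - vertices[i]
            c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
            deficit[i] -= np.arccos(np.clip(c, -1.0, 1.0))
    return deficit
```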
Item Audio-Driven Speech Animation with Text-Guided Expression (The Eurographics Association, 2024) Jung, Sunjin; Chun, Sewhan; Noh, Junyong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We introduce a novel method for generating expressive speech animations of a 3D face, driven by both audio and text descriptions. Many previous approaches focused on generating facial expressions using pre-defined emotion categories. In contrast, our method can generate facial expressions from text descriptions unseen during training, without being limited to specific emotion classes. Our system employs a two-stage approach. In the first stage, an auto-encoder is trained to disentangle content and expression features from facial animations. In the second stage, two transformer-based networks predict the content and expression features from the audio and text inputs, respectively. These features are then passed to the decoder of the pre-trained auto-encoder, yielding the final expressive speech animation. By accommodating diverse forms of natural language, such as emotion words or detailed facial expression descriptions, our method offers an intuitive and versatile way to generate expressive speech animations. Extensive quantitative and qualitative evaluations, including a user study, demonstrate that our method produces natural expressive speech animations that correspond to the input audio and text descriptions.

Item Fast Wavelet-domain Smoke Guiding (The Eurographics Association, 2024) Lyu, Luan; Ren, Xiaohua; Wu, Enhua; Yang, Zhi-Xin; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We propose a simple and efficient wavelet-based method to guide smoke simulation with specific velocity fields. The method primarily uses wavelets to combine low-resolution velocities with high-resolution details for smoke guiding. Because wavelets naturally divide data into different frequency bands, we can merge low- and high-resolution velocities by replacing wavelet coefficients. Compared to Fourier methods, the wavelet transform can use wavelets with shorter, compact supports, making the transformation faster and more adaptable to various boundary conditions. The method has a time complexity of O(n) and a memory complexity of O(n). Additionally, because wavelets are compactly supported, we can locally filter out or retain details by editing the wavelet coefficients, which enables local editing of the smoke. Moreover, to accelerate wavelet transforms on GPUs, we propose a CUDA technique called in-kernel warp-level wavelet transform computation, which uses warp-level CUDA intrinsic functions to reduce data reads during computation and thus improves the efficiency of the transform. The experiments demonstrate that our wavelet-based method achieves an approximately 5× speedup in 3D on GPUs compared to Fourier methods, resulting in an overall improvement of around 40% in the smoke-guided simulation.
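The central operation, keeping the high-frequency detail of the simulation while imposing the low-frequency guiding velocity, amounts to swapping wavelet coefficients. The snippet below is a 2D, per-component sketch using PyWavelets, assuming the guiding field has already been resampled to the simulation grid; the paper works in 3D with a custom warp-level CUDA transform.

```python
import pywt  # PyWavelets

def guide_velocity_component(v_sim, v_guide, wavelet="db2", level=3):
    """Replace the low-frequency band of the simulated velocity with that of
    the guiding velocity, keeping the simulated high-frequency detail."""
    c_sim = pywt.wavedec2(v_sim, wavelet, level=level)
    c_guide = pywt.wavedec2(v_guide, wavelet, level=level)
    c_sim[0] = c_guide[0]                  # swap the approximation coefficients
    return pywt.waverec2(c_sim, wavelet)
```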
Item "Yunluo Journey": A VR Cultural Experience for the Chinese Musical Instrument (The Eurographics Association, 2024) Wang, Yuqiu; Guo, Wenchen; He, Zhiting; Fan, Min; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Sustaining the cultural heritage of traditional musical instruments requires integrating musical culture into people's daily lives. However, the Yunluo, a traditional Chinese musical instrument, is too large and expensive to be easily incorporated into everyday life. To promote the sustainability and dissemination of Yunluo culture, we designed a VR Yunluo cultural experience that allows people to engage in the creation and performance of the Yunluo and to learn about its historical and cultural significance. This embodied, gamified, and contextualized VR experience aims to enhance participants' interest in Yunluo culture and improve their understanding and appreciation of the related knowledge.

Item Enhancing Human Optical Flow via 3D Spectral Prior (The Eurographics Association, 2024) Mao, Shiwei; Sun, Mingze; Huang, Ruqi; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
In this paper, we consider the problem of human optical flow estimation, which is critical in a series of human-centric computer vision tasks. Recent deep learning-based optical flow models have achieved considerable accuracy and generalization by incorporating various kinds of priors. However, most either rely on large-scale 2D annotations or on rigid priors, overlooking the 3D non-rigid nature of human articulations. To this end, we advocate enhancing human optical flow estimation via 3D spectral prior-aware pretraining, which is based on the well-known functional maps formulation in 3D shape matching. Our pretraining can be performed with synthetic human shapes. More specifically, we first render the shapes to images and then leverage the natural inclusion maps from images to shapes to lift 2D optical flow into 3D correspondences, which are further encoded as functional maps. This lifting operation injects the intrinsic geometric features encoded in the spectral representations into optical flow learning, improving the latter especially in the presence of non-rigid deformations. In practice, we establish a pretraining pipeline tailored for triangular meshes that is general with respect to the target optical flow network. It is worth noting that the pipeline does not introduce any additional learnable parameters and only requires some pre-computed eigendecompositions of the meshes. For RAFT and GMA, our pretraining task achieves improvements of 12.8% and 4.9% in AEPE on the SHOF benchmark, respectively.

Item Dense Crowd Motion Prediction through Density and Trend Maps (The Eurographics Association, 2024) Wang, Tingting; Fu, Qiang; Wang, Minggang; Bi, Huikun; Deng, Qixin; Deng, Zhigang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
In this paper we propose a novel density/trend map based method to predict both group behavior and individual pedestrian motion from video input. Existing motion prediction methods represent pedestrian motion as a set of spatial-temporal trajectories; however, beyond such a per-pedestrian representation, a high-level representation of crowd motion is needed in many crowd applications. Our method leverages density maps and trend maps to represent the spatial-temporal states of dense crowds. Based on these representations, we propose a crowd density map net that extracts a density map from a video clip, and a crowd prediction net that uses the historical states of a video clip to predict density maps and trend maps for future frames. Moreover, since crowd motion consists of the motion of individual pedestrians in a group, we also leverage the predicted crowd motion as a clue to improve the accuracy of traditional trajectory-based motion prediction methods. Through a series of experiments and comparisons with state-of-the-art motion prediction methods, we demonstrate the effectiveness and robustness of our method.
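As a generic illustration of the density-map representation (the paper predicts such maps with a network rather than rasterizing them this way), the sketch below bins pedestrian positions into a grid and blurs the counts into a smooth density field; the grid size and kernel width are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def crowd_density_map(positions, bounds, grid=(128, 128), sigma=2.0):
    """positions: (N, 2) pedestrian locations; bounds: (xmin, xmax, ymin, ymax)."""
    xmin, xmax, ymin, ymax = bounds
    counts, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1],
        bins=grid, range=[[xmin, xmax], [ymin, ymax]])
    return gaussian_filter(counts, sigma=sigma)  # smooth counts into a density field
```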
Item GGAvatar: Dynamic Facial Geometric Adjustment for Gaussian Head Avatar (The Eurographics Association, 2024) Li, Xinyang; Wang, Jiaxin; Xuan, Yixin; Yao, Gongxin; Pan, Yu; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Reconstructing animatable 3D head avatars from videos of a target subject has long been a significant challenge and a hot topic in computer graphics. This paper proposes GGAvatar, a novel 3D avatar representation designed to robustly model dynamic head avatars with complex identities and deformations. GGAvatar employs a coarse-to-fine structure featuring two core modules: a Neutral Gaussian Initialization Module and a Geometry Morph Adjuster. The Neutral Gaussian Initialization Module pairs Gaussian primitives with deformable triangular meshes, using an adaptive density control strategy to model the geometric structure of the target subject with a neutral expression. The Geometry Morph Adjuster introduces deformation bases for each Gaussian in global space, creating fine-grained, low-dimensional representations of deformations to overcome the limitations of the Linear Blend Skinning formula. Extensive experiments show that GGAvatar produces high-fidelity renderings, outperforming state-of-the-art methods in visual quality and quantitative metrics.

Item Self-Supervised Multi-Layer Garment Animation Generation Network (The Eurographics Association, 2024) Han, Guoqing; Shi, Min; Mao, Tianlu; Wang, Xinran; Zhu, Dengming; Gao, Lin; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
This paper presents a self-supervised multi-layer garment animation generation network. The complexity inherent in multi-layer garments, particularly the diverse interactions between layers, makes it challenging to generate continuous, stable, physically accurate, and visually realistic garment deformation animations. To tackle these challenges, we present the Self-Supervised Multi-Layer Garment Animation Generation Network (SMLN). SMLN is built on graph neural networks and represents garment models uniformly as graph structures, thereby naturally depicting the hierarchical structure of garments and capturing the relationships between garment layers. Unlike existing multi-layer garment deformation methods, we model interaction forces such as friction and repulsion between garment layers, translating physical laws consistent with dynamics into network constraints, and we penalize garment deformation regions that violate these constraints. Furthermore, instead of the traditional post-processing approach of computing fixed vertex displacements to handle collisions, we add a repulsion constraint layer within the network that updates the corresponding repulsive-force acceleration, thereby adaptively managing collisions between garment layers. Our self-supervised modeling approach enables the network to learn without relying on garment sample datasets. Experimental results demonstrate that our method generates visually plausible multi-layer garment deformation effects, surpassing existing methods in both visual quality and evaluation metrics.
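The exact form of SMLN's repulsion constraint is not given in the abstract. One generic penalty of the kind it describes, shown below purely as an assumed sketch, measures the signed gap between matched points on two layers along the inner layer's normal and penalizes (near-)penetration.

```python
import torch
import torch.nn.functional as F

def interpenetration_penalty(x_outer, x_inner, n_inner, eps=2e-3):
    """x_outer/x_inner: (V, 3) matched points on the outer/inner garment layer,
    n_inner: (V, 3) unit normals of the inner layer; penalize gaps below eps."""
    gap = ((x_outer - x_inner) * n_inner).sum(dim=-1)  # signed distance along normal
    return F.relu(eps - gap).pow(2).mean()
```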
Item 3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization (The Eurographics Association, 2024) Chung, SeungJeh; Park, JooHyun; Kang, HyeongYeop; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
3D stylization, the application of specific styles to three-dimensional objects, offers substantial commercial potential by enabling the creation of uniquely styled 3D objects tailored to diverse scenes. Recent advancements in artificial intelligence and text-driven manipulation methods have made the stylization process increasingly intuitive and automated. While these methods reduce human costs by minimizing reliance on manual labor and expertise, they predominantly focus on holistic stylization, neglecting the application of desired styles to individual components of a 3D object, which restricts fine-grained controllability. To address this gap, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP uses the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize individual parts of the 3D mesh and modify their appearance to match the styles specified in the text prompt. 3DStyleGLIP effectively integrates part localization and stylization guidance within GLIP's shared embedding space through an end-to-end process, enabled by a part-level style loss and two complementary learning techniques. This neural methodology meets the user's need for fine-grained style editing and delivers high-quality part-specific stylization results, opening new possibilities for customization and flexibility in 3D content creation. Our code and results are available at https://github.com/sj978/3DStyleGLIP.
Item Biophysically-based Simulation of Sun-induced Skin Appearance Changes (The Eurographics Association, 2024) He, Xueyan; Huang, Minghao; Fu, Ruoyu; Guo, Jie; Yuan, Junping; Wang, Yanghai; Guo, Yanwen; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Skin appearance modeling plays a crucial role in fields such as healthcare, cosmetics, and entertainment. However, the structure of the skin and its interaction with environmental factors like ultraviolet radiation are very complex and require more detailed modeling. In this paper, we propose a biophysically-based model of the changes in skin appearance under ultraviolet radiation exposure. It takes ultraviolet doses and specific biophysical parameters as inputs, which lead to variations in melanin and blood concentrations as well as in the growth rate of skin cells. These changes alter light scattering, which is simulated with a random-walk method, and result in observable erythema and tanning. We showcase the effects on various skin tones, comparisons across different body parts, and images illustrating the impact of occlusion. The model demonstrates superior quality to the commonly used method, with more convincing skin details, and bridges biological insights with visual simulation.

Item Simulating Viscous Fluid Using Free Surface Lattice Boltzmann Method (The Eurographics Association, 2024) Sun, Dakun; Gao, Yang; Xie, Xueguang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
High-viscosity fluid simulation remains a significant area of interest in graphics. However, there has been little discussion of simulating viscous fluids in computer graphics with the Lattice Boltzmann Method (LBM). In this study, we demonstrate the feasibility of using LBM for viscous fluid simulation and point out a caveat regarding external forces. Previous methods for viscous fluids (such as FLIP, MPM, and SPH) are mainly based on the Navier-Stokes (NS) equations, where the external forces are independent of viscosity in the governing equations; the decision to neglect an external force therefore depends solely on its magnitude. In the Lattice Boltzmann Equation (LBE), however, external forces are intertwined with viscosity within the collision term, so the choice to ignore the external force term depends on both the viscosity and the force's magnitude. This has not been addressed in previous studies, and we show its importance through comparison experiments.
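The viscosity-force coupling mentioned above is easy to see in the widely used forcing scheme of Guo et al. (2002), where the force contribution to the BGK collision carries a prefactor 1 - 1/(2τ) and the relaxation time τ is fixed by the kinematic viscosity via ν = c_s²(τ - 1/2)Δt. The abstract does not state which forcing scheme the paper adopts; the sketch below is just this standard construction for a single D2Q9 node.

```python
import numpy as np

# D2Q9 lattice: discrete velocities e_i, weights w_i, and c_s^2 = 1/3
E = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
W = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
CS2 = 1.0 / 3.0

def guo_forcing(u, F, nu, dt=1.0):
    """Force term added to the BGK collision at one node (Guo et al. 2002).
    tau, and hence the (1 - 1/(2*tau)) prefactor, depends on the viscosity."""
    tau = nu / (CS2 * dt) + 0.5
    pref = (1.0 - 1.0 / (2.0 * tau)) * W
    eu = E @ u                                         # e_i . u, shape (9,)
    term = (E - u) / CS2 + (eu[:, None] * E) / CS2**2
    return dt * pref * (term @ F)                      # contribution per direction
```

With large viscosity (large τ) the prefactor approaches one, while for τ near 1/2 it shrinks toward zero, which is why neglecting the force term cannot be justified by its magnitude alone.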
Item Mesh Slicing Along Isolines of Surface-Based Functions (The Eurographics Association, 2024) Wang, Lei; Wang, Xudong; Wang, Wensong; Chen, Shuangmin; Xin, Shiqing; Tu, Changhe; Wang, Wenping; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
There are numerous practical scenarios in which the surface of a 3D object is equipped with a varying property, and slicing the surface along an isoline of the property field is a widely used operation. While the geometry of the 3D object can typically be approximated with a piecewise linear triangle mesh, the property field f may be too intricate to be linearly approximated at the same resolution, and arbitrarily reducing the isoline within a triangle to a straight-line segment can produce noticeable artifacts. In this paper, we study the precise extraction of the isoline of a surface-based function f for slicing the surface apart, allowing the extracted isoline to be curved within a triangle. Our approach begins by adequately sampling Steiner points on mesh edges. Subsequently, for each triangle, we categorize the Steiner points into two groups based on the signs of their function values. We then trace the bisector between these two groups by computing a 2D power diagram of all Steiner points, whose weights are derived from the first-order approximation of f. Finally, we refine the polygonal bisector by adjusting each vertex to the closest point on the actual isoline. Each step of our algorithm is fully parallelizable at the triangle level, making it highly efficient. We provide numerous examples to illustrate its practical applications.

Item DViTGAN: Training ViTGANs with Diffusion (The Eurographics Association, 2024) Tong, Mengjun; Rao, Hong; Yang, Wenji; Chen, Shengbo; Zuo, Fang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Recent research indicates that injecting noise using diffusion can effectively improve the stability of GANs for image generation tasks. Although ViTGAN, based on the Vision Transformer, has certain performance advantages over traditional GANs, it still suffers from unstable training and insufficiently detailed generated images. We therefore propose a novel model, DViTGAN, which leverages the diffusion model to generate instance noise that facilitates ViTGAN training. Specifically, we employ forward diffusion to progressively generate noise that follows a Gaussian mixture distribution, and then introduce the generated noise into the input image of the discriminator. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process to improve its performance. In addition, we observe that the ViTGAN generator lacks positional information, leading to reduced context-modeling ability and slower convergence. To this end, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks demonstrate the effectiveness of the proposed model.
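Injecting instance noise via forward diffusion typically follows the standard formula x_t = sqrt(ᾱ_t)·x_0 + sqrt(1-ᾱ_t)·ε with ᾱ_t = ∏_{s≤t}(1-β_s). The sketch below applies that generic DDPM-style noising to a batch of images before the discriminator sees them; the Gaussian-mixture construction described in the abstract may differ in its details.

```python
import torch

def diffuse_for_discriminator(x0, t, betas):
    """x0: (B, C, H, W) images, t: (B,) integer timesteps, betas: (T,) schedule."""
    abar = torch.cumprod(1.0 - betas, dim=0)[t]          # \bar{alpha}_t per sample
    abar = abar.view(-1, *([1] * (x0.dim() - 1)))        # broadcast over C, H, W
    eps = torch.randn_like(x0)
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * eps  # noised discriminator input
```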
Item Semantics-Augmented Quantization-Aware Training for Point Cloud Classification (The Eurographics Association, 2024) Huang, Liming; Qin, Yunchuan; Li, Ruihui; Wu, Fan; Li, Kenli; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Point cloud classification is a pivotal procedure in 3D computer vision, and its deployment in practical applications is often constrained by limited computational and memory resources. To address these issues, we introduce a Semantics-Augmented Quantization-Aware Training (SAQAT) framework designed for efficient and precise classification of point cloud data. The SAQAT framework incorporates a point importance prediction semantic module as a side output, which assists in identifying crucial points, along with a point importance evaluation algorithm (PIEA). The semantics module leverages point importance prediction to select quantization levels based on local geometric properties and semantic context, reducing errors by retaining essential information. In synergy, the PIEA acts as the cornerstone, providing an additional layer of refinement to the SAQAT framework. Furthermore, we integrate a loss function that combines the classification loss, quantization error, and point importance prediction loss, thereby fostering a reliable representation of the quantized data. The SAQAT framework is designed for seamless integration with existing point cloud models, enhancing their efficiency while maintaining high levels of accuracy. Testing on benchmark datasets demonstrates that our SAQAT framework surpasses contemporary quantization methods in classification accuracy while economizing on memory and computational resources. Given these advantages, the SAQAT framework holds great potential for a wide spectrum of applications in the rapidly evolving domain of 3D computer vision. Our code is released at https://github.com/h-liming/SAQAT.
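Quantization-aware training typically simulates quantization in the forward pass while letting gradients flow through the rounding as if it were the identity (a straight-through estimator). The sketch below shows only that generic building block with an assumed symmetric 8-bit range; how SAQAT selects per-point quantization levels from semantics is the paper's contribution and is not reproduced here.

```python
import torch

def fake_quantize(x, scale, qmin=-128, qmax=127):
    """Quantize-dequantize with a straight-through estimator: the forward pass
    sees quantized values; the backward pass treats rounding as identity."""
    q = torch.clamp(torch.round(x / scale), qmin, qmax) * scale
    return x + (q - x).detach()
```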