PG2024 Conference Papers and Posters
Browsing PG2024 Conference Papers and Posters by Issue Date
Now showing 1 - 20 of 57
Item PhysHand: A Hand Simulation Model with Physiological Geometry, Physical Deformation, and Accurate Contact Handling (The Eurographics Association, 2024) Sun, Mingyang; Kou, Dongliang; Yuan, Ruisheng; Yang, Dingkang; Zhai, Peng; Zhao, Xiao; Jiang, Yang; Li, Xiong; Li, Jingchen; Zhang, Lihua; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
In virtual Hand-Object Interaction (HOI) scenarios, the authenticity of the hand's deformation is important to the immersive experience, for instance in natural manipulation or tactile feedback. Unrealistic deformation arises from simplified hand geometry, neglect of the differing physical attributes of the hand's tissues, and penetration due to imprecise contact handling. To address these problems, we propose PhysHand, a novel hand simulation model that enhances the realism of deformation in HOI. First, we construct a physiologically plausible geometry: a layered mesh with a "skin-flesh-skeleton" structure. Second, to satisfy the distinct physical properties of different soft tissues, a constraint-based dynamics framework is adopted with carefully designed layer-corresponding constraints that keep the flesh attached and the skin smooth. Finally, we employ an SDF-based method to eliminate the penetration caused by contacts and enhance its accuracy by introducing a novel multi-resolution querying strategy. Extensive experiments demonstrate the outstanding performance of PhysHand in calculating deformations and handling contacts. Compared to existing methods, PhysHand: 1) computes both physiologically and physically plausible deformation; 2) significantly reduces the depth and count of penetrations in HOI.

Item Modeling Sketches both Semantically and Structurally for Zero-Shot Sketch-Based Image Retrieval is Better (The Eurographics Association, 2024) Jing, Jiansen; Liu, Yujie; Li, Mingyue; Xiao, Qian; Chai, Shijie; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Sketch, as a representation of human thought, is abstract but also structured, because it is presented as a two-dimensional image. Therefore, modeling it from both semantic and structural perspectives is reasonable and effective. In this paper, to capture semantics, we compare the performance of two mainstream pre-trained models on the Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) task and propose a new model, Semantic Net (SNET), based on Contrastive Language-Image Pre-training (CLIP), with a more effective fine-tuning strategy and a Semantic Preservation Module. Furthermore, we propose three lightweight modules, Channels Fusion (CF), Layers Fusion (LF), and Semantic Structure Fusion (SSF), to endow SNET with a stronger ability to capture structure. Finally, we supervise the entire training process with a classification loss based on contrastive learning and a bidirectional triplet loss based on a cosine distance metric. We call the final version of the model Semantic Structure Net (SSNET). Quantitative experiments show that both our proposed SNET and the enhanced version SSNET achieve a new state of the art (a 16% retrieval boost on the most difficult QuickDraw Ext dataset). The visualization experiments further corroborate our view of sketch modeling.
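As a rough illustration of the training objective described in the SSNET entry above, the following sketch implements a bidirectional triplet loss over cosine distances in PyTorch. The in-batch negative sampling and the margin value are illustrative assumptions, not the authors' implementation.

# Minimal sketch of a bidirectional triplet loss over cosine distances,
# in the spirit of the SSNET objective described above. Batching and
# margin are illustrative assumptions.
import torch
import torch.nn.functional as F

def cosine_distance(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """1 - cosine similarity, computed row-wise."""
    return 1.0 - F.cosine_similarity(a, b, dim=-1)

def bidirectional_triplet_loss(sketch_emb, image_emb, margin=0.2):
    """Treats matching (sketch, image) rows as positives and a shifted
    batch as negatives, penalizing both retrieval directions."""
    neg_img = image_emb.roll(shifts=1, dims=0)      # simple in-batch negatives
    neg_skc = sketch_emb.roll(shifts=1, dims=0)
    # sketch -> image direction
    loss_s2i = F.relu(cosine_distance(sketch_emb, image_emb)
                      - cosine_distance(sketch_emb, neg_img) + margin)
    # image -> sketch direction
    loss_i2s = F.relu(cosine_distance(image_emb, sketch_emb)
                      - cosine_distance(image_emb, neg_skc) + margin)
    return (loss_s2i + loss_i2s).mean()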
Item DViTGAN: Training ViTGANs with Diffusion (The Eurographics Association, 2024) Tong, Mengjun; Rao, Hong; Yang, Wenji; Chen, Shengbo; Zuo, Fang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Recent research indicates that injecting noise using diffusion can effectively improve the stability of GANs for image generation tasks. Although ViTGAN, which is based on the Vision Transformer, has certain performance advantages over traditional GANs, issues remain, such as unstable training and insufficiently rich detail in the generated images. Therefore, in this paper we propose a novel model, DViTGAN, which leverages a diffusion model to generate instance noise that facilitates ViTGAN training. Specifically, we employ forward diffusion to progressively generate noise that follows a Gaussian mixture distribution, and then introduce the generated noise into the input image of the discriminator. The generator incorporates the discriminator's feedback by backpropagating through the forward diffusion process to improve its performance. In addition, we observe that the ViTGAN generator lacks positional information, leading to a reduced context-modeling ability and slower convergence. To this end, we introduce Fourier embedding and relative positional encoding to enhance the model's expressive ability. Experiments on multiple popular benchmarks demonstrate the effectiveness of our proposed model.
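The instance-noise mechanism described in the DViTGAN entry above can be illustrated with a standard forward-diffusion step applied to the discriminator's input. The linear beta schedule below is a common default and an assumption here, not the paper's exact choice.

# Illustrative sketch of injecting forward-diffusion instance noise into
# the discriminator input. The linear beta schedule is an assumption.
import torch

def forward_diffuse(x0: torch.Tensor, t: int, T: int = 1000) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) for a simple linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, T)
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)[t]
    eps = torch.randn_like(x0)
    return alpha_bar.sqrt() * x0 + (1.0 - alpha_bar).sqrt() * eps

# Usage inside a GAN training step (sketch):
#   real_noisy = forward_diffuse(real_images, t)
#   fake_noisy = forward_diffuse(generator(z), t)
# The diffusion step is differentiable, so the generator receives the
# discriminator's feedback backpropagated through it.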
Item DreamMapping: High-Fidelity Text-to-3D Generation via Variational Distribution Mapping (The Eurographics Association, 2024) Cai, Zeyu; Wang, Duotun; Liang, Yixun; Shao, Zhijing; Chen, Ying-Cong; Zhan, Xiaohang; Wang, Zeyu; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Score Distillation Sampling (SDS) has emerged as a prevalent technique for text-to-3D generation, enabling 3D content creation by distilling view-dependent information from text-to-2D guidance. However, SDS-based methods frequently exhibit shortcomings such as over-saturated color and excess smoothness. In this paper, we conduct a thorough analysis of SDS and refine its formulation, finding that its core design is to model the distribution of rendered images. Following this insight, we introduce a novel strategy called Variational Distribution Mapping (VDM), which expedites the distribution modeling process by regarding the rendered images as instances of degradation from diffusion-based generation. This design enables the efficient training of the variational distribution by skipping the calculation of the Jacobians in the diffusion U-Net. We also introduce timestep-dependent Distribution Coefficient Annealing (DCA) to further improve distillation precision. Leveraging VDM and DCA, we use Gaussian Splatting as the 3D representation and build a text-to-3D generation framework. Extensive experiments and evaluations demonstrate the capability of VDM and DCA to generate high-fidelity and realistic assets with optimization efficiency.

Item Biophysically-based Simulation of Sun-induced Skin Appearance Changes (The Eurographics Association, 2024) He, Xueyan; Huang, Minghao; Fu, Ruoyu; Guo, Jie; Yuan, Junping; Wang, Yanghai; Guo, Yanwen; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Skin appearance modeling plays a crucial role in fields such as healthcare, cosmetics, and entertainment. However, the structure of the skin and its interaction with environmental factors like ultraviolet radiation are very complex and require detailed modeling. In this paper, we propose a biophysically-based model of the changes in skin appearance under ultraviolet radiation exposure. It takes ultraviolet doses and specific biophysical parameters as inputs, which lead to variations in melanin and blood concentrations, as well as in the growth rate of skin cells. These changes alter light scattering, which we simulate with a random-walk method, and result in observable erythema and tanning. We showcase the effects on various skin tones, comparisons across different body parts, and images illustrating the impact of occlusion. Our model demonstrates superior quality to the commonly used method, with more convincing skin details, and bridges biological insight with visual simulation.

Item High-Quality Cage Generation Based on SDF (The Eurographics Association, 2024) Qiu, Hao; Liao, Wentao; Chen, Renjie; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Cages are widely used in various applications of computer graphics, including physically-based rendering, shape deformation, and physical simulation. Given an input shape, we present an efficient and robust method for the automatic construction of a high-quality cage. Our method follows the envelope-and-simplify paradigm. In the enveloping stage, an isosurface enclosing the model is extracted from the signed distance field (SDF) of the shape. Leveraging the versatility of SDFs, we propose a straightforward modification to the SDF that gives the resulting isosurface a better topological structure and lets it capture the details of the shape well. In the simplification stage, we use the quadric error metric to simplify the isosurface and construct a cage, while rigorously ensuring that the cage remains enclosing and does not self-intersect. We further optimize various qualities of the cage for different applications, including its distance to the original mesh and its meshing quality. The cage generated by our method is guaranteed to strictly enclose the input shape, to be free of self-intersections, to have the user-specified complexity, and to approximate the input well, as required by various applications. Through extensive experiments, we demonstrate that our method is robust and efficient for a wide variety of shapes with complex geometry and topology.
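A minimal sketch of the enveloping stage described in the cage-generation entry above, assuming a toy sphere SDF and a fixed positive offset in place of the paper's modified shape SDF; the QEM simplification stage is omitted.

# Sketch of the "envelope" stage: extract an enclosing isosurface from a
# signed distance field sampled on a grid. Sphere SDF and fixed offset
# are stand-ins for the paper's shape SDF and its modifications.
import numpy as np
from skimage import measure

n = 64
xs = np.linspace(-1.5, 1.5, n)
x, y, z = np.meshgrid(xs, xs, xs, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 1.0        # unit sphere as a toy shape

# Extracting the isosurface at a positive offset yields a surface that
# strictly encloses the zero level set, i.e., an envelope of the shape.
offset = 0.1
verts, faces, normals, _ = measure.marching_cubes(
    sdf, level=offset, spacing=(xs[1] - xs[0],) * 3)
# A subsequent QEM-style decimation of (verts, faces) would give the cage.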
Item PointJEM: Self-supervised Point Cloud Understanding for Reducing Feature Redundancy via Joint Entropy Maximization (The Eurographics Association, 2024) Cao, Xin; Xia, Huan; Wang, Haoyu; Su, Linzhi; Zhou, Ping; Li, Kang; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Most deep learning methods for point cloud processing are supervised and require extensive labeled data. However, labeling point cloud data is a tedious and time-consuming task. Self-supervised representation learning can solve this problem by extracting robust and generalized features from unlabeled data. Yet, the features obtained by representation learning are often redundant, and current methods typically reduce redundancy by imposing linear-correlation constraints. In this paper, we introduce PointJEM, a self-supervised representation learning method for point clouds. It includes an embedding scheme that divides the embedding vector into parts, each of which learns a unique feature. To minimize redundancy, PointJEM maximizes the joint entropy between the parts, making the learned features pairwise independent. We tested PointJEM on various datasets and found that it significantly reduces redundancy beyond linear correlation. Additionally, PointJEM performs well in downstream tasks such as classification and segmentation.
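PointJEM's redundancy objective can be illustrated with a simple histogram estimate of the joint entropy between two embedding parts. The binning estimator below is an assumption for illustration; the paper's estimator may differ.

# Illustrative histogram estimate of the joint entropy H(A, B) between
# two 1-D feature batches, in the spirit of PointJEM's objective.
import numpy as np

def joint_entropy(part_a: np.ndarray, part_b: np.ndarray, bins: int = 16) -> float:
    """Histogram estimate of H(A, B) in nats."""
    hist, _, _ = np.histogram2d(part_a, part_b, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # avoid log(0)
    return float(-(p * np.log(p)).sum())

# Maximizing H(A, B) pushes the parts toward independence, since
# H(A, B) <= H(A) + H(B) with equality iff A and B are independent.
rng = np.random.default_rng(0)
a, b = rng.normal(size=1000), rng.normal(size=1000)
print(joint_entropy(a, b))             # higher for independent parts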
Item 3DStyleGLIP: Part-Tailored Text-Guided 3D Neural Stylization (The Eurographics Association, 2024) Chung, SeungJeh; Park, JooHyun; Kang, HyeongYeop; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
3D stylization, the application of specific styles to three-dimensional objects, offers substantial commercial potential by enabling the creation of uniquely styled 3D objects tailored to diverse scenes. Recent advancements in artificial intelligence and text-driven manipulation methods have made the stylization process increasingly intuitive and automated. While these methods reduce human costs by minimizing reliance on manual labor and expertise, they predominantly focus on holistic stylization, neglecting the application of desired styles to individual components of a 3D object, which restricts fine-grained controllability. To address this gap, we introduce 3DStyleGLIP, a novel framework specifically designed for text-driven, part-tailored 3D stylization. Given a 3D mesh and a text prompt, 3DStyleGLIP utilizes the vision-language embedding space of the Grounded Language-Image Pre-training (GLIP) model to localize individual parts of the 3D mesh and modify their appearance to match the styles specified in the text prompt. 3DStyleGLIP effectively integrates part localization and stylization guidance within GLIP's shared embedding space through an end-to-end process, enabled by a part-level style loss and two complementary learning techniques. This neural methodology meets the user's need for fine-grained style editing and delivers high-quality part-specific stylization results, opening new possibilities for customization and flexibility in 3D content creation. Our code and results are available at https://github.com/sj978/3DStyleGLIP.

Item Pacific Graphics 2024 - Conference Papers and Posters: Frontmatter (The Eurographics Association, 2024) Chen, Renjie; Ritschel, Tobias; Whiting, Emily

Item Single Image 3D Reconstruction of Creased Documents Using Shape-from-Shading with Template-Based Error Correction (The Eurographics Association, 2024) Wang, Linqin; Bo, Pengbo; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We present a method for reconstructing 3D models from single images of creased documents by enhancing the linear shape-from-shading (SFS) technique with a template-based error correction mechanism. This mechanism is based on a mapping function established using precise data from a spherical surface modeled with linearized Lambertian shading. The error-correction mapping is integrated into an algorithm that refines reconstructed depth values during the image scanning process. To resolve the inherent concave/convex ambiguities in SFS, we identify specific conditions based on the assumed lighting and the geometric characteristics of creased documents, effectively improving reconstruction even in less controlled lighting environments. Our approach captures intricate geometric details on non-smooth surfaces. Comparative results demonstrate that our method provides superior accuracy and efficiency in reconstructing complex features such as creases and wrinkles.

Item Fast Wavelet-domain Smoke Guiding (The Eurographics Association, 2024) Lyu, Luan; Ren, Xiaohua; Wu, Enhua; Yang, Zhi-Xin; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We propose a simple and efficient wavelet-based method to guide smoke simulation with specific velocity fields. The method uses wavelets to combine low-resolution velocities with high-resolution details for smoke guiding. Because wavelets naturally divide data into different frequency bands, we can merge low- and high-resolution velocities by replacing wavelet coefficients. Compared to Fourier methods, the wavelet transform can use wavelets with short, compact supports, making the transformation faster and more adaptable to various boundary conditions. The method has O(n) time complexity and O(n) memory complexity. Additionally, because wavelets are compactly supported, we can locally filter out or retain details by editing the wavelet coefficients, which enables local smoke editing. Moreover, to accelerate wavelet transforms on GPUs, we propose a CUDA technique called in-kernel warp-level wavelet transform computation, which uses warp-level CUDA intrinsic functions to reduce data reads during computation and thus improves the efficiency of the transform. Experiments demonstrate that our wavelet-based method achieves an approximately 5x speedup in 3D on GPUs compared to Fourier methods, resulting in an overall improvement of around 40% in smoke-guided simulation.
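The coefficient-replacement idea in the smoke-guiding entry above can be sketched with PyWavelets: decompose both velocity fields, swap the coarse approximation band, and reconstruct. The wavelet choice and decomposition level are illustrative assumptions.

# Sketch of merging a low-resolution guiding velocity with high-resolution
# detail by swapping wavelet coefficients. Wavelet and level are assumptions.
import numpy as np
import pywt

def guide_velocity(high_res: np.ndarray, guide: np.ndarray,
                   wavelet: str = "db2", level: int = 2) -> np.ndarray:
    """Replace the coarse band of `high_res` with that of `guide`.
    Both are 2-D arrays of one velocity component with the same shape."""
    hi = pywt.wavedec2(high_res, wavelet, level=level)
    lo = pywt.wavedec2(guide, wavelet, level=level)
    hi[0] = lo[0]                      # swap approximation coefficients only;
    return pywt.waverec2(hi, wavelet)  # detail bands keep the fine turbulence

# Usage: vx_guided = guide_velocity(vx_sim, vx_guide_upsampled)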
Item Enhancing Human Optical Flow via 3D Spectral Prior (The Eurographics Association, 2024) Mao, Shiwei; Sun, Mingze; Huang, Ruqi; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
In this paper, we consider the problem of human optical flow estimation, which is critical in a series of human-centric computer vision tasks. Recent deep learning-based optical flow models have achieved considerable accuracy and generalization by incorporating various kinds of priors. However, most either rely on large-scale 2D annotations or on rigid priors, overlooking the 3D non-rigid nature of human articulations. To this end, we advocate enhancing human optical flow estimation via 3D spectral prior-aware pretraining, which is based on the well-known functional maps formulation in 3D shape matching. Our pretraining can be performed with synthetic human shapes. More specifically, we first render the shapes to images and then leverage the natural inclusion maps from images to shapes to lift 2D optical flow into 3D correspondences, which are further encoded as functional maps. This lifting operation injects the intrinsic geometric features encoded in the spectral representations into optical flow learning, improving the latter especially in the presence of non-rigid deformations. In practice, we establish a pretraining pipeline tailored to triangle meshes that is general with respect to the target optical flow network. Notably, it introduces no additional learnable parameters and only requires some pre-computed eigendecompositions of the meshes. For RAFT and GMA, our pretraining task achieves improvements of 12.8% and 4.9% in AEPE on the SHOF benchmark, respectively.

Item TPAM: Transferable Perceptual-constrained Adversarial Meshes (The Eurographics Association, 2024) Kang, Tengjia; Li, Yuezun; Zhou, Jiaran; Xin, Shiqing; Dong, Junyu; Tu, Changhe; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Triangle meshes are widely used in 3D data representation due to their efficacy in capturing complex surfaces. With advancements in deep learning, mesh classification, crucial in various applications, has typically been tackled by Deep Neural Networks (DNNs). However, these mesh networks have been proven vulnerable to adversarial attacks, where slight distortions of a mesh can cause large prediction errors, posing significant security risks. Although several mesh attack methods have been proposed recently, two key aspects, stealthiness and transferability, remain underexplored. This paper introduces a new method called Transferable Perceptual-constrained Adversarial Meshes (TPAM) to investigate these aspects further. Specifically, we present a perceptual-constrained objective term to restrict the distortions and introduce an Adaptive Geometry-aware Attack Optimization strategy that adjusts the attack strength iteratively based on local geometric frequencies, striking a good balance between stealthiness and attack accuracy. Moreover, we propose a Bayesian Surrogate Network to enhance transferability and introduce a new metric, the Area Under Accuracy (AUACC), for comprehensive performance evaluation. Experiments on various mesh classifiers demonstrate the effectiveness of our method in both white-box and black-box settings, enhancing attack stealthiness and transferability across multiple networks. Our research can enhance the understanding of DNNs, thus improving the robustness of mesh classifiers. The code is available at https://github.com/Tengjia-Kang/TPAM.

Item GazeMoDiff: Gaze-guided Diffusion Model for Stochastic Human Motion Prediction (The Eurographics Association, 2024) Yan, Haodong; Hu, Zhiming; Schmitt, Syn; Bulling, Andreas; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Human motion prediction is important for many virtual and augmented reality (VR/AR) applications such as collision avoidance and realistic avatar generation. Existing methods have synthesised body motion only from observed past motion, despite the fact that human eye gaze is known to correlate strongly with body movements and is readily available in recent VR/AR headsets. We present GazeMoDiff, a novel gaze-guided denoising diffusion model for generating stochastic human motions. Our method first uses a gaze encoder and a motion encoder to extract the gaze and motion features respectively, then employs a graph attention network to fuse these features, and finally injects the gaze-motion features into a noise prediction network via a cross-attention mechanism to progressively generate multiple plausible future human motions. Extensive experiments on the MoGaze and GIMO datasets demonstrate that our method outperforms the state-of-the-art methods by a large margin in terms of multi-modal final displacement error (17.3% on MoGaze and 13.3% on GIMO). We further conducted a human study (N=21) which validated that the motions generated by our method were perceived as both more precise and more realistic than those of prior methods. Taken together, these results reveal the significant information content available in eye gaze for stochastic human motion prediction, as well as the effectiveness of our method in exploiting this information.
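The cross-attention injection named in the GazeMoDiff entry above can be sketched as follows; the feature dimensions, residual layout, and normalization are assumptions, not the authors' exact architecture.

# Sketch of injecting fused gaze-motion features into a noise prediction
# network via cross-attention. Dimensions and layout are assumptions.
import torch
import torch.nn as nn

class CrossAttentionInjection(nn.Module):
    def __init__(self, dim: int = 128, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, noise_feats, gaze_motion_feats):
        # Queries come from the denoiser; keys/values from the fused
        # gaze-motion features, so conditioning flows into denoising.
        out, _ = self.attn(query=noise_feats,
                           key=gaze_motion_feats,
                           value=gaze_motion_feats)
        return self.norm(noise_feats + out)     # residual connection

# Usage: feats = CrossAttentionInjection()(noise_feats, fused_feats)
# with noise_feats: (batch, T, 128) and fused_feats: (batch, S, 128).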
Item Convex Hull Computation in a Grid Space: A GPU Accelerated Parallel Filtering Approach (The Eurographics Association, 2024) Antony, Joms; Mukundan, Manoj Kumar; Thomas, Mathew; Muthuganapathy, Ramanathan; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Many real-world applications demand the computation of a convex hull (CH) when the input points originate from structured configurations such as two-dimensional (2D) or three-dimensional (3D) grids. Convex hulls in grid space have found applications in geographic information systems, medical data analysis, and path planning for robots and autonomous vehicles, among others. Conventional as well as existing GPU-accelerated algorithms for CH computation cannot operate directly on 2D or 3D grids represented in matrix format and do not exploit the inherent sequential ordering in such rasterized representations. This work introduces novel filtering algorithms, initially developed for a 2D grid space and subsequently extended to 3D, that speed up the hull computation. They are further extended to GPU-CPU hybrid algorithms, implemented and evaluated on a commercial NVIDIA GPU. For an (n×n) 2D grid, the number of contributing pixels is always restricted to at most 2n. Moreover, they are extracted in lexicographic order, ensuring an efficient O(n) computation of the CH. Similarly, in 3D, the number of contributing voxels is always limited to at most 2n² for an (n×n×n) voxel matrix. Additionally, 2D CH filtering is performed across all slices of the 3D grid in parallel, further reducing the number of contributing voxels fed to the 3D CH computation procedure. Comparison with the state of the art indicates that our method is superior, especially for large and sparse point clouds.
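The 2D filtering idea above is easy to make concrete: in each row of an occupancy grid, only the leftmost and rightmost filled pixels can be hull vertices, so at most 2n candidates survive, already ordered by row. The monotone-chain hull below is a standard stand-in, not the paper's implementation.

# Sketch of 2D grid filtering followed by a standard convex hull pass.
import numpy as np

def grid_hull_candidates(grid: np.ndarray) -> list:
    """Return <= 2n candidate points (row, col) from an n x n boolean
    grid, produced in lexicographic (row-major) order."""
    pts = []
    for r in range(grid.shape[0]):
        cols = np.flatnonzero(grid[r])
        if cols.size:
            pts.append((r, int(cols[0])))       # leftmost filled pixel
            if cols[-1] != cols[0]:
                pts.append((r, int(cols[-1])))  # rightmost filled pixel
    return pts

def cross(o, a, b):
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def convex_hull(pts):
    """Andrew's monotone chain on lexicographically sorted points."""
    pts = sorted(pts)
    lo, hi = [], []
    for p in pts:
        while len(lo) >= 2 and cross(lo[-2], lo[-1], p) <= 0:
            lo.pop()
        lo.append(p)
    for p in reversed(pts):
        while len(hi) >= 2 and cross(hi[-2], hi[-1], p) <= 0:
            hi.pop()
        hi.append(p)
    return lo[:-1] + hi[:-1]

# Usage: hull = convex_hull(grid_hull_candidates(occupancy))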
Item Label Name is Mantra: Unifying Point Cloud Segmentation across Heterogeneous Datasets (The Eurographics Association, 2024) Liang, Yixun; He, Hao; Xiao, Shishi; Lu, Hao; Chen, Yingcong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Point cloud segmentation is a fundamental task in 3D vision that serves a wide range of applications. Despite recent advancements, its practical usability is still limited by the availability of training data. The prevalent methodologies cannot optimally exploit multiple datasets because labels are inconsistent across datasets. In this work, we introduce a robust method that accommodates learning from diverse datasets with varying label sets. We leverage a pre-trained language model to map discrete labels into a continuous latent space using their semantic names. This harmonizes labels across datasets, facilitating concurrent training. In contrast to extant methods with fixed decoder structures, our model, which classifies points in the continuous 3D space via their linguistic tokens, exhibits superior generalizability. Further, our approach assimilates prompt learning to alleviate data shifts across sources. Comprehensive evaluations attest that our model markedly surpasses current benchmarks.

Item Learning-based Self-Collision Avoidance in Retargeting using Body Part-specific Signed Distance Fields (The Eurographics Association, 2024) Lee, Junwoo; Kim, Hoimin; Kwon, Taesoo; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Motion retargeting is a technique for applying the motion of one character to a new character. Differences in shape and proportion between characters can cause self-collisions during the retargeting process. To address this issue, we propose a new collision resolution strategy comprising three key components: a collision detection module, a self-collision resolution model, and a training strategy for the resolution model. The collision detection module generates collision information based on changes in posture. The self-collision resolution model, which is based on a neural network, uses this collision information to resolve self-collisions. The proposed training strategy enhances the performance of the self-collision resolution model. Compared to previous studies, our self-collision resolution process demonstrates superior accuracy and generalization. Our model reduces the average penetration depth across the entire body by 56%, which is 28% better than previous studies. Additionally, the minimum distance from the end-effectors to the skin averaged 2.65 cm, more than 0.8 cm smaller than in previous studies. Furthermore, it takes an average of 7.9 ms to solve one frame, enabling online real-time self-collision resolution.
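A rough sketch of how a body part-specific SDF can yield the collision information mentioned in the retargeting entry above: sample the SDF of one body part at the skin vertices of another and report penetration depths. The dense SDF grid and nearest-neighbor lookup are simplifying assumptions (a real implementation would interpolate).

# Sketch of SDF-based self-collision detection between body parts.
import numpy as np

def sample_sdf(sdf: np.ndarray, origin, spacing, points):
    """Nearest-neighbor SDF lookup for points (k, 3) in world space."""
    idx = np.round((points - origin) / spacing).astype(int)
    idx = np.clip(idx, 0, np.array(sdf.shape) - 1)
    return sdf[idx[:, 0], idx[:, 1], idx[:, 2]]

def penetration_info(sdf, origin, spacing, skin_verts):
    """Negative SDF values mean a vertex is inside the other body part;
    returns per-vertex penetration depth (0 where no collision)."""
    d = sample_sdf(sdf, origin, spacing, skin_verts)
    return np.where(d < 0.0, -d, 0.0)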
Item Colorectal Protrusions Detection based on Conformal Colon Flattening (The Eurographics Association, 2024) Ren, Yuxue; Hu, Wei; Li, Zhengbin; Chen, Wei; Lei, Na; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
We propose an approach to automatically detect colorectal protrusions, which include polyps, on the colon surface. The approach comprises two successive stages. In the first stage, we identify single protrusions and extract folds containing suspected protrusions in the flattened colon image by integrating shape analysis with curvature rendering and conformal colon flattening. This stage enables accurate and rapid detection of single protrusions, especially flat ones, since the 3D protrusion detection problem is converted into a 2D pattern recognition problem. In the second stage, to detect protrusions on folds, the folds containing suspected protrusions are inversely mapped back to the 3D colon surface. We detect protrusions in the 3D surface region by curvature-based analysis and reduce false positives by quadratic surface fitting. We evaluated our method on real colon data from the National CT Colonography Trial of the American College of Radiology Imaging Network (ACRIN, 6664). Experimental results show that our method can efficiently and accurately identify protrusion lesions, is robust to noise, and is suitable for implementation within CTC-CAD systems.

Item Physics-Informed Neural Fields with Neural Implicit Surface for Fluid Reconstruction (The Eurographics Association, 2024) Duan, Zheng; Ren, Zhong; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
Recovering fluid density and velocity from multi-view RGB videos poses a formidable challenge. Existing solutions typically assume knowledge of obstacles and lighting, or are designed for simple fluid scenes without obstacles or complex lighting. Addressing these challenges, our study presents a novel hybrid model named PINFS, which fuses the capabilities of Physics-Informed Neural Fields (PINF) and Neural Implicit Surfaces (NeuS) to accurately reconstruct scenes containing smoke. By combining the capabilities of SIREN-NeRFt in PINF for creating realistic smoke representations with the accuracy of NeuS in depicting solid obstacles, PINFS provides detailed reconstructions of smoke scenes with improved visual authenticity and physical precision. PINFS distinguishes itself by incorporating the solid's view-independent opaque density and by addressing Neumann boundary conditions through signed distances from NeuS, resulting in a more realistic and physically plausible depiction of smoke behavior in dynamic scenarios. Comprehensive evaluations on synthetic and real-world datasets confirm the model's superior performance in complex scenes with obstacles. PINFS introduces a novel framework for realistic and physically consistent rendering of complex fluid dynamics scenarios, pushing the boundaries of mixed physical and neural-based approaches. The code is available at https://github.com/zduan3/pinfs_code.

Item Editing Compact Voxel Representations on the GPU (The Eurographics Association, 2024) Molenaar, Mathijs; Eisemann, Elmar; Chen, Renjie; Ritschel, Tobias; Whiting, Emily
A Sparse Voxel Directed Acyclic Graph (SVDAG) is a compact data structure for displaying and storing highly detailed voxel scenes. Yet, editing such a high-resolution scene in real time is challenging. Existing solutions are hybrid, involving the CPU, and are restricted to small local modifications. In this work, we address this bottleneck and propose a solution that performs edits fully on the graphics card, enabled by dynamic GPU hash tables. Our framework makes large editing operations, such as 3D painting, possible at real-time frame rates.
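The compactness that makes SVDAG editing attractive comes from deduplicating identical subtrees. The sketch below builds a small DAG from a dense boolean grid using a Python dict as a stand-in for the paper's dynamic GPU hash tables; it illustrates the data structure, not the GPU editing pipeline.

# Sketch of SVDAG-style node deduplication via hashing of child tuples.
import numpy as np

def build_svdag(grid: np.ndarray):
    """Build a DAG from a cubic boolean grid with power-of-two size.
    Returns (root_id, nodes), where nodes maps id -> 8-tuple of child
    ids; leaf ids 0/1 encode empty/full voxels."""
    nodes, lookup = {}, {}

    def build(x, y, z, size):
        if size == 1:
            return int(grid[x, y, z])           # leaf: 0 = empty, 1 = full
        h = size // 2
        children = tuple(build(x + dx*h, y + dy*h, z + dz*h, h)
                         for dx in (0, 1) for dy in (0, 1) for dz in (0, 1))
        node_id = lookup.get(children)
        if node_id is None:                     # unseen subtree: intern it
            node_id = 2 + len(nodes)
            lookup[children] = node_id
            nodes[node_id] = children
        return node_id

    return build(0, 0, 0, grid.shape[0]), nodes

# Identical subtrees collapse to one node, so a mostly empty grid yields
# only a handful of nodes: try build_svdag(np.zeros((64, 64, 64), bool)).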