PG2023 Short Papers and Posters
Item Sketch-to-Architecture: Generative AI-aided Architectural Design (The Eurographics Association, 2023)
Li, Pengzhi; Li, Baijuan; Li, Zhiheng; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Recently, the development of large-scale models has paved the way for interdisciplinary research in various fields, including architecture. We present a novel workflow that uses generative AI models to generate conceptual floorplans and 3D models from simple sketches, enabling rapid ideation and controlled generation of architectural renderings based on textual descriptions. Our work demonstrates the potential of generative AI in the architectural design process, pointing towards a new direction for computer-aided architectural design.

Item Emotion-based Interaction Technique Using User's Voice and Facial Expressions in Virtual and Augmented Reality (The Eurographics Association, 2023)
Ko, Beom-Seok; Kang, Ho-San; Lee, Kyuhong; Braunschweiler, Manuel; Zünd, Fabio; Sumner, Robert W.; Choi, Soo-Mi; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
This paper presents a novel interaction approach based on a user's emotions within augmented reality (AR) and virtual reality (VR) environments to achieve immersive interaction with virtual intelligent characters. To identify the user's emotions through voice, the Google Speech-to-Text API is used to transcribe speech, and the RoBERTa language model is then used to classify emotions. In the AR environment, the intelligent character can change the styles and properties of objects based on the user's recognized emotions during a dialog. In the VR environment, by contrast, the movement of the user's eyes and lower face is tracked by the VIVE Pro Eye and Facial Tracker, and EmotionNet is used for emotion recognition. The virtual environment can then be changed based on the user's recognized emotions.
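The voice branch of the emotion-based interaction entry above is a two-stage pipeline: speech is transcribed (Google Speech-to-Text in the paper), then the transcript is classified into an emotion (RoBERTa in the paper), and the recognized emotion drives a scene or character reaction. The sketch below shows only that control flow; both stages are stubbed with placeholder functions (the lexicon and reaction table are illustrative assumptions, not the paper's models).

```python
def transcribe(audio_bytes: bytes) -> str:
    """Stand-in for the Google Speech-to-Text call used in the paper."""
    return "i am so happy to see this"  # placeholder transcript

def classify_emotion(text: str) -> str:
    """Stand-in for RoBERTa text classification; a real system would run
    a fine-tuned transformer and return the argmax emotion label."""
    lexicon = {"happy": "joy", "angry": "anger", "sad": "sadness"}
    for word, label in lexicon.items():
        if word in text:
            return label
    return "neutral"

def react_to_user(audio_bytes: bytes) -> str:
    """Map the recognized emotion to a scene/character reaction."""
    emotion = classify_emotion(transcribe(audio_bytes))
    reactions = {"joy": "brighten_scene", "anger": "calm_character",
                 "sadness": "comfort_dialog", "neutral": "idle"}
    return reactions[emotion]

print(react_to_user(b""))  # -> brighten_scene
```

In a real deployment the two stubs would be replaced by the cloud transcription call and a transformer inference call; the surrounding dispatch logic stays the same.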
Our findings present an interesting approach to integrating emotionally intelligent characters in AR/VR using generative AI and facial expression recognition.

Item Automatic Vector Caricature via Face Parametrization (The Eurographics Association, 2023)
Madono, Koki; Hold-Geoffroy, Yannick; Li, Yijun; Ito, Daichi; Echevarria, Jose; Smith, Cameron; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Automatic caricature generation is a challenging task that aims to emphasize the subject's facial characteristics while preserving their identity. Due to the complexity of the task, caricatures could previously be produced only by a trained artist. Recent developments in deep learning have achieved promising results in capturing artistic styles. Despite this success, current methods still struggle to accurately capture the whimsical aspect of caricatures while preserving identity. In this work, we propose Parametric Caricature, the first parameter-based caricature generation method that yields vectorized and animatable caricatures. We devise several hundred parameters to encode facial traits, which our method predicts directly instead of estimating a raster caricature like previous methods. To guide the attention of the method, we segment the different parts of the face and retrieve the most similar parts from an artist-made database of caricatures. Our method produces visually appealing caricatures better suited for use as avatars than existing methods, as demonstrated by our user study.

Item Progressive Graph Matching Network for Correspondences (The Eurographics Association, 2023)
Feng, Huihang; Liu, Lupeng; Xiao, Jun; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
This paper presents a progressive graph matching network, abbreviated as PGMNet. The method is more explainable and can match features from easy to hard. PGMNet contains two major blocks: a sinkformers module and a guided attention module.
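PGMNet's sinkformers block (sinkformers replace attention's softmax with Sinkhorn normalization) produces an assignment matrix between two keypoint sets, from which matches scoring highest in both their row and column are kept as pre-matched correspondences. A minimal numpy sketch of that pre-matching rule, an illustration rather than the paper's implementation:

```python
import numpy as np

def sinkhorn(scores, iters=20):
    """Approximately normalize a similarity matrix into a doubly-stochastic
    assignment matrix by alternating row/column normalization."""
    P = np.exp(scores)
    for _ in range(iters):
        P /= P.sum(axis=1, keepdims=True)  # row normalization
        P /= P.sum(axis=0, keepdims=True)  # column normalization
    return P

def mutual_max_matches(P):
    """Keep pairs (i, j) where j is the best match of i AND i is the best
    match of j -- the 'highest score in both row and column' rule."""
    row_best = P.argmax(axis=1)
    col_best = P.argmax(axis=0)
    return [(i, j) for i, j in enumerate(row_best) if col_best[j] == i]

scores = np.array([[5.0, 0.1, 0.2],
                   [0.3, 4.0, 0.1],
                   [0.2, 0.2, 3.0]])
print(mutual_max_matches(sinkhorn(scores)))  # [(0, 0), (1, 1), (2, 2)]
```

Ambiguous keypoints that fail the mutual-max test are left for the later, guided stages, which is what lets the network match "from easy to hard".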
First, we use sinkformers to obtain a similarity matrix, which can be seen as an assignment matrix between two sets of feature keypoints. Matches with the highest scores in both their rows and columns are selected as pre-matched correspondences. These pre-matched correspondences can be leveraged to guide the update and matching of ambiguous features. The matching quality is progressively improved as the transformer blocks go deeper, as visualized in Figure 1. Experiments show that our method achieves better results than typical attention-based methods.

Item WaveNet: Wave-Aware Image Enhancement (The Eurographics Association, 2023)
Dang, Jiachen; Li, Zehao; Zhong, Yong; Wang, Lishun; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
As a low-level vision task, image enhancement is widely used in various computer vision applications. Recently, methods combining CNNs, MLPs, Transformers, and the Fourier transform have achieved promising results on image enhancement tasks. However, these methods cannot achieve a balance between accuracy and computational cost. In this paper, we formulate enhancement as a signal modulation problem and propose the WaveNet architecture, which performs well at various parameter scales and improves feature expression using wave-like feature representations. Specifically, to better capture wave-like representations, we propose to represent a pixel as a sampled value of a signal function composed of three wave functions (Cosine Wave (CW), Sine Wave (SW), and Gating Wave (GW)) inspired by the Fourier transform. An amplitude and a phase are required to generate the wave-like features: the amplitude term carries the original content of the features, and the phase term modulates the relationship between the various inputs and the fixed weights. To obtain the phase and amplitude dynamically, we build the Wave Transform Block (WTB), which adaptively generates the waves and modulates their superposition.
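The amplitude/phase formulation described above can be sketched in a few lines: the cosine and sine waves follow directly from the Fourier-inspired description, while the abstract does not specify the gating wave's exact form, so the sigmoid-modulated variant below is an assumption for illustration.

```python
import numpy as np

def wave_features(amplitude, phase):
    """Represent features as sampled waves: the amplitude carries the
    original content, the phase modulates how inputs are aggregated."""
    cw = amplitude * np.cos(phase)           # Cosine Wave (CW)
    sw = amplitude * np.sin(phase)           # Sine Wave (SW)
    gw = amplitude / (1.0 + np.exp(-phase))  # Gating Wave (GW, illustrative)
    return cw, sw, gw

a = np.array([1.0, 2.0])        # amplitude: original feature content
p = np.array([0.0, np.pi / 2])  # phase: dynamically predicted modulation
cw, sw, gw = wave_features(a, p)
# the cos/sin components jointly preserve the amplitude: a^2 == cw^2 + sw^2
print(np.allclose(cw**2 + sw**2, a**2))  # True
```

The identity in the last line is why a phase term can re-weight how features combine without destroying their content.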
Based on the WTB, we establish WaveNet, an effective architecture for image enhancement. Extensive experiments on six real-world datasets show that our model achieves better quantitative and qualitative results than state-of-the-art methods. The source code and pretrained model are available at https://github.com/DeniJsonC/WaveNet.

Item Color3d: Photorealistic Texture Mapping for 3D Mesh (The Eurographics Association, 2023)
Zhao, Chenxi; Fan, Chuanmao; Mohadikar, Payal; Duan, Ye; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
3D reconstruction plays a significant role in various fields, including medical imaging, architecture, and forensic science, in both research and industry. Color quality is one of the criteria that determine reconstruction performance. However, colors predicted by deep learning often suffer from low quality and a lack of detail. While traditional texture mapping methods can provide superior color, they are restricted by mesh quality. In this study, we propose Color3D, a comprehensive procedure that applies photorealistic colors to a reconstructed mesh, accommodating both static objects and animations. The required inputs are multi-view RGB images, depth images, camera poses, and camera intrinsics. Compared to traditional methods, our approach replaces face colors taken directly from the texture map with vertex colors from the multi-view images. The color of each face is obtained by interpolating the vertex colors of its triangle. Our method can generate high-quality color for different objects, and performance remains strong even when the input mesh is imperfect.

Item Multi-scale Monocular Panorama Depth Estimation (The Eurographics Association, 2023)
Mohadikar, Payal; Fan, Chuanmao; Zhao, Chenxi; Duan, Ye; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Panorama images are widely used for scene depth estimation as they provide a comprehensive scene representation.
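The Color3D entry above colors each face by interpolating the vertex colors of its triangle. That interpolation is the standard barycentric blend; a minimal sketch (the barycentric formulation is the usual convention, not stated explicitly in the abstract):

```python
import numpy as np

def interpolate_face_color(vertex_colors, bary):
    """Color at a point inside a triangle: the barycentric-weighted blend
    of its three vertex RGB colors."""
    # bary: (3,) barycentric weights summing to 1; vertex_colors: (3, 3) RGB rows
    return bary @ vertex_colors

colors = np.array([[1.0, 0.0, 0.0],   # red vertex
                   [0.0, 1.0, 0.0],   # green vertex
                   [0.0, 0.0, 1.0]])  # blue vertex
center = np.array([1/3, 1/3, 1/3])    # triangle centroid
print(interpolate_face_color(colors, center))  # ~[0.333 0.333 0.333]
```

Rasterizers perform exactly this blend per pixel, which is why per-vertex colors from multi-view images yield smooth face colors without a texture atlas.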
Existing deep-learning monocular panorama depth estimation networks produce inconsistent, discontinuous, and poor-quality depth maps. To overcome this, we propose a novel multi-scale monocular panorama depth estimation framework. We use a coarse-to-fine depth estimation approach, in which multi-scale tangent perspective images, projected from 360° images, are fed to coarse and fine encoder-decoder networks to produce multi-scale perspective depth maps, which are merged to obtain low- and high-resolution 360° depth maps. The coarse branch extracts holistic features that guide the features extracted by the fine branch through a Multi-Scale Feature Fusion (MSFF) module at the network bottleneck. Experiments on the Stanford2D3D benchmark dataset show that our model outperforms existing methods, producing consistent, smooth, structurally detailed, and accurate depth maps.

Item Visualization System for Analyzing Congestion Pricing Policies (The Eurographics Association, 2023)
Choi, SeokHwan; Seo, Seongbum; Yoo, Sangbong; Jang, Yun; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Traffic congestion, which increases every year, has a negative impact on environmental pollution and productivity. Congestion pricing has proven effective in Singapore, London, and Stockholm as one way to alleviate traffic congestion. A pricing policy has different effects depending on the target area, pricing scheme, and toll. In general, congestion pricing researchers conduct statistical analyses of simulation model predictions within a fixed region and time range. However, existing research techniques make it difficult to analyze all traffic data characteristics with spatiotemporal dependency. In this paper, we propose a visualization system for analyzing the influence of congestion pricing policies using SUMO and TCI.
Our system provides a district-level analysis process to explore the influence of pricing policies over time and area.

Item Reconstructing Baseball Pitching Motions from Video (The Eurographics Association, 2023)
Kim, Jiwon; Kim, Dongkwon; Yu, Ri; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Baseball is one of the most loved sports in the world. In a baseball game, the pitcher's control is a key factor in determining the outcome. A large amount of video footage of baseball games exists, and learning baseball pitching motions from video is possible thanks to pose estimation techniques. However, reconstructing pitching motions using pose estimators is challenging: motion blur inevitably occurs because the pitcher throws the ball into the strike zone as fast as possible. To tackle this problem, we propose a framework using physics simulation and deep reinforcement learning to reconstruct baseball pitching motions from the imperfect poses estimated from video. We set a target point and design rewards that encourage the character to throw the ball to that point. Consequently, we can reconstruct plausible pitching motions.

Item SS-SfP: Neural Inverse Rendering for Self Supervised Shape from (Mixed) Polarization (The Eurographics Association, 2023)
Tiwari, Ashish; Raman, Shanmuganathan; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
We present a novel inverse rendering-based framework to estimate the 3D shape (per-pixel surface normals and depth) of objects and scenes from single-view polarization images, a problem popularly known as Shape from Polarization (SfP).
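The pitching-motion entry above designs rewards that encourage the simulated character to throw the ball to a target point. The abstract does not give the reward's exact shape, so the Gaussian distance-based form below is purely an illustrative assumption of one common choice for such target-reaching rewards:

```python
import numpy as np

def pitch_reward(ball_hit_pos, target_pos, scale=0.5):
    """Dense reward that peaks when the ball crosses the strike zone at the
    target point and decays smoothly with distance (illustrative shaping)."""
    dist = np.linalg.norm(np.asarray(ball_hit_pos) - np.asarray(target_pos))
    return float(np.exp(-(dist / scale) ** 2))

print(pitch_reward([0.0, 0.9], [0.0, 0.9]))         # 1.0 (perfect strike)
print(pitch_reward([0.4, 0.9], [0.0, 0.9]) < 1.0)   # True (off-target penalized)
```

A smooth, dense reward like this gives the policy a useful gradient signal long before it can actually hit the target, which is why it is preferred over a sparse hit/miss reward.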
Existing physics-based and learning-based methods for SfP operate under certain restrictions: (a) purely diffuse or purely specular reflections, which are seldom found on real surfaces, (b) availability of ground-truth surface normals for direct supervision, which are hard to acquire and are limited by the scanner's resolution, and (c) a known refractive index. To overcome these restrictions, we first learn to separate the partially polarized diffuse and specular reflection components, which we call reflectance cues, based on a modified polarization reflection model, and then estimate shape under mixed polarization through an inverse-rendering-based self-supervised deep learning framework called SS-SfP, guided by the polarization data and the estimated reflectance cues. Furthermore, we obtain the refractive index as a non-linear least squares solution. Through extensive quantitative and qualitative evaluation, we establish the efficacy of the proposed framework on simple single-object scenes from the DeepSfP dataset and complex in-the-wild scenes from the SPW dataset in an entirely self-supervised setting. To the best of our knowledge, this is the first learning-based approach to address SfP under mixed polarization in a completely self-supervised framework. Code will be made publicly available.

Item Combining Transformer and CNN for Super-Resolution of Animal Fiber Microscopy Images (The Eurographics Association, 2023)
Li, Jiagen; Ji, Yatu; Lu, Min; Wang, Li; Dai, Lingjie; Xu, Xuanxuan; Wu, Nier; Liu, Na; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
The images of cashmere and wool fibers used for scientific research in the textile field are mostly acquired manually under an optical microscope.
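The SS-SfP entry above recovers the refractive index as a non-linear least squares solution. Under the standard diffuse-polarization model relating degree of linear polarization (DoLP) to zenith angle and refractive index, such a fit can be sketched as below; the dense 1D search stands in for the paper's unspecified solver, and the synthetic data is for illustration only.

```python
import numpy as np

def diffuse_dolp(theta, n):
    """Degree of linear polarization for diffuse reflection at zenith angle
    theta (radians) and refractive index n (standard SfP diffuse model)."""
    s2 = np.sin(theta) ** 2
    num = (n - 1.0 / n) ** 2 * s2
    den = (2 + 2 * n**2 - (n + 1.0 / n) ** 2 * s2
           + 4 * np.cos(theta) * np.sqrt(n**2 - s2))
    return num / den

def fit_refractive_index(theta, rho, candidates=np.linspace(1.3, 1.8, 501)):
    """Non-linear least squares over n, solved by a dense 1D search."""
    errors = [np.sum((diffuse_dolp(theta, n) - rho) ** 2) for n in candidates]
    return candidates[int(np.argmin(errors))]

theta = np.linspace(0.1, 1.2, 50)        # zenith angles in radians
rho = diffuse_dolp(theta, 1.5)           # synthetic observations with n = 1.5
print(fit_refractive_index(theta, rho))  # ~1.5
```

With real data the residual never reaches zero, but the same least-squares objective applies; a gradient-based solver would replace the grid search.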
However, due to interference from microscope quality, the shooting environment, focal length selection, acquisition technique, and other factors, the photographs obtained tend to have low resolution, and it is difficult to discern the fine fiber texture and scale details. To address these problems, a lightweight super-resolution reconstruction algorithm with multi-scale hierarchical screening is proposed. Specifically, a hybrid module incorporating a Swin Transformer and enhanced channel attention is proposed to extract global features and localize the important ones. In addition, a multi-scale hierarchical screening and filtering module based on the residual model is proposed, which splits the channels so that the model can adaptively weight features and amplify the feature information in high-frequency regions. Finally, a global average pooling attention module integrates and re-weights the high-frequency features to enhance details such as edges and textures.
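The global average pooling attention step just described follows the familiar squeeze-and-excitation pattern: pool each channel to one value, turn the summary into per-channel weights, and re-scale the feature map. The sketch below is an illustrative stand-in (a real module would insert small learned layers before the sigmoid):

```python
import numpy as np

def gap_channel_attention(features):
    """Re-weight channels using a global-average-pooling summary.
    features: (channels, height, width)."""
    pooled = features.mean(axis=(1, 2))        # squeeze: one value per channel
    weights = 1.0 / (1.0 + np.exp(-pooled))    # excite: sigmoid gating in (0, 1)
    return features * weights[:, None, None]   # re-scale each channel

x = np.random.default_rng(0).normal(size=(4, 8, 8))
y = gap_channel_attention(x)
print(y.shape)  # (4, 8, 8) -- same shape, channels re-weighted
```

Because the gate stays in (0, 1), informative channels are passed through nearly unchanged while weak ones are attenuated, sharpening edge and texture responses at negligible cost.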
Extensive experiments show that, compared with other state-of-the-art algorithms, the proposed method significantly improves image quality on the fiber dataset. Its effectiveness is also demonstrated at all scales on five public datasets: it occupies less memory than SwinIR, and it not only improves PSNR and SSIM but also uses fewer parameters than the lightweight ESRT.

Item SketchBodyNet: A Sketch-Driven Multi-faceted Decoder Network for 3D Human Reconstruction (The Eurographics Association, 2023)
Wang, Fei; Tang, Kongzhang; Wu, Hefeng; Zhao, Baoquan; Cai, Hao; Zhou, Teng; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Reconstructing 3D human shapes from 2D images has received increasing attention recently due to its fundamental support for many high-level 3D applications. Compared with natural images, freehand sketches are much more flexible for depicting various shapes, providing a valuable and promising avenue for 3D human reconstruction. However, this task is highly challenging. The sparse, abstract characteristics of sketches add severe difficulties, such as arbitrariness, inaccuracy, and lack of image detail, to the already badly ill-posed problem of 2D-to-3D reconstruction. Although current methods have achieved great success in reconstructing 3D human bodies from a single-view image, they do not work well on freehand sketches. In this paper, we propose a novel sketch-driven multi-faceted decoder network termed SketchBodyNet to address this task. Specifically, the network consists of a backbone and three separate attention decoder branches, where a multi-head self-attention module is exploited in each decoder to obtain enhanced features, followed by a multi-layer perceptron. The multi-faceted decoders predict the camera, shape, and pose parameters, respectively, which are then fed to the SMPL model to reconstruct the corresponding 3D human mesh.
During training, existing 3D meshes are projected via the camera parameters into 2D synthetic sketches with joints, which are combined with the freehand sketches to optimize the model. To verify our method, we collect a large-scale dataset of about 26k freehand sketches and their corresponding 3D meshes, covering various human body poses from 14 different angles. Extensive experimental results demonstrate that SketchBodyNet achieves superior performance in reconstructing 3D human meshes from freehand sketches.

Item Revisiting Visualization Evaluation Using EEG and Visualization Literacy Assessment Test (The Eurographics Association, 2023)
Yim, Soobin; Jung, Chanyoung; Yoon, Chanyoung; Yoo, Sangbong; Choi, Seongwon; Jang, Yun; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
EEG (electroencephalogram) signals can provide a quantitative measure of human cognitive load, making them an effective tool for evaluating visualization. However, the suitability of EEG for visualization evaluation has not been verified in previous studies. This paper investigates the feasibility of using EEG data in visualization evaluation by comparison with previous experiments. We trained individual CNN models for each subject using their EEG data. Our study demonstrates that EEG-based visualization evaluation provides a more faithful estimate of the difficulties subjects experience during visualization tasks than the accuracy and response-time measures used in previous studies.

Item Feature-Sized Sampling for Vector Line Art (The Eurographics Association, 2023)
Ohrhallinger, Stefan; Parakkat, Amal Dev; Memari, Pooran; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
By introducing a first-of-its-kind quantifiable sampling algorithm based on feature size, we present a fresh perspective on the practical aspects of planar curve sampling.
Following in the footsteps of ε-sampling, which was originally proposed in the context of curve reconstruction to offer provable topological guarantees [ABE98] under quantifiable bounds, we propose an arbitrarily precise ε-sampling algorithm for smooth planar curves (with a prior bound on the minimum feature size of the curve). This paper not only introduces the first such algorithm providing user control and quantifiable precision, but also highlights the importance of such a sampling process in two key contexts: 1) a first study comparing theoretical sampling conditions with practical sampling requirements for reconstruction guarantees, which can further be used to analyze the upper bounds of ε for various reconstruction algorithms with or without proofs; and 2) feature-aware sampling of vector line art for applications such as coloring and meshing.

Item Text2Mat: Generating Materials from Text (The Eurographics Association, 2023)
He, Zhen; Guo, Jie; Zhang, Yan; Tu, Qinghao; Chen, Mufan; Guo, Yanwen; Wang, Pengyu; Dai, Wei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Specific materials are often associated with certain types of objects in the real world. They simulate the way an object's surface interacts with light and are named after that type of object. We observe that the text labels of materials contain high-level semantic information, which can be used as guidance to assist the generation of specific materials. Based on this, we propose Text2Mat, a text-guided material generation framework. To meet the demands of material generation from text descriptions, we construct a large set of PBR materials with specific text labels. Each material carries a detailed text description that matches its visual appearance.
Furthermore, to control the texture and spatial layout of generated materials through text, we introduce texture attribute labels and extra attributes describing regular materials. Using this dataset, we train a neural network adapted from Stable Diffusion to achieve text-based material generation. Extensive experiments and rendering results demonstrate that Text2Mat can generate materials whose spatial layout and texture styles correspond closely to the text descriptions.

Item Detection of Impurities in Wool Based on Improved YOLOv8 (The Eurographics Association, 2023)
Liu, Yang; Ji, Yatu; Ren, Qing Dao Er Ji; Shi, Bao; Zhuang, Xufei; Yao, Miaomiao; Li, Xiaomei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
In the current production of wool products, the cleaning of raw wool has been automated. However, detecting whether the washed and dried wool still contains excessive impurities still requires manual inspection, which greatly reduces production efficiency. To address wool impurity detection, we propose an improved model based on YOLOv8. Our work applies techniques for low-resource model training and incorporates a block for small-object detection into the network structure.
The proposed model achieves an accuracy of 84.3% on a self-built dataset and also achieves good results on the VisDrone2019 dataset.

Item A Simple Stochastic Regularization Technique for Avoiding Overfitting in Low Resource Image Classification (The Eurographics Association, 2023)
Ji, Ya Tu; Wang, Bai Lun; Ren, Qing Dao Er Ji; Shi, Bao; Wu, Nier E.; Lu, Min; Liu, Na; Zhuang, Xu Fei; Xu, Xuan Xuan; Wang, Li; Dai, Ling Jie; Yao, Miao Miao; Li, Xiao Mei; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Drop-type techniques, which effectively regulate the co-adaptation and prediction ability of neural network units, are widely used in model parameter optimization to reduce overfitting. However, low-resource image classification faces serious overfitting problems, and data sparsity weakens, or even nullifies, the effectiveness of most regularization methods. Inspired by the value iteration strategy, this paper proposes a drop-type method based on Metcalfe's law, named Metcalfe-Drop. The experimental results indicate that using Metcalfe-Drop to determine parameter sharing is more effective than randomly controlling neurons according to a fixed probability. Our code is available at https://gitee.com/giteetu/metcalfe-drop.git.

Item Generalizable Dynamic Radiance Fields For Talking Head Synthesis With Few-shot (The Eurographics Association, 2023)
Dang, Rujing; Wang, Shaohui; Wang, Haoqian; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
Audio-driven talking head generation has wide applications in virtual games, virtual hosts, online meetings, and more. Recently, great progress has been made in synthesizing talking heads based on neural radiance fields. However, existing few-shot talking head synthesis methods still suffer from inaccurate deformation and a lack of visual consistency. We therefore propose a Generalizable Dynamic Radiance Field (GDRF), which can rapidly generalize to unseen identities from only a few examples.
We introduce a warping module with 3D constraints that acts in feature volume space, is identity-adaptive, and exhibits excellent shape-shifting ability. Our method generates more accurately deformed and view-consistent target images than previous methods. Furthermore, we map the audio signal to 3DMM parameters with an LSTM network, which captures long-term context and generates more continuous and natural video. Extensive experiments demonstrate the superiority of the proposed method.

Item Multi-Stage Degradation and Content Embedding Fusion for Blind Super-Resolution (The Eurographics Association, 2023)
Zhang, Haiyang; Jiang, Mengyu; Liu, Liang; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
To achieve promising results on blind image super-resolution (SR), some Unsupervised Degradation Prediction (UDP) methods narrow the domain gap between the degradation embedding space and the SR feature space by fusing the degradation embedding with an additional content embedding before multi-stage SR. However, fusing these two embeddings before multi-stage SR is inflexible, because the domain gap varies at each SR stage. To address this issue, we propose Multi-Stage Degradation and Content Embedding Fusion (MDCF), which adaptively fuses the degradation embedding with the content embedding at each SR stage rather than beforehand. Based on the MDCF, we introduce a novel UDP method, called MDCFnet, which contains an additional Dual-Path Local and Global encoder (DPLG) to extract the degradation embedding and the content embedding separately. Specifically, DPLG diversifies receptive fields to enrich the degradation embedding and combines local and global features to optimize the content embedding.
Extensive experiments on real images and several benchmarks demonstrate that the proposed MDCFnet outperforms existing UDP methods and achieves competitive PSNR and SSIM even compared with state-of-the-art SKP methods.

Item Local Positional Encoding for Multi-Layer Perceptrons (The Eurographics Association, 2023)
Fujieda, Shin; Yoshimura, Atsushi; Harada, Takahiro; Chaine, Raphaëlle; Deng, Zhigang; Kim, Min H.
A multi-layer perceptron (MLP) is a type of neural network with a long history of research that has recently been studied actively in the computer vision and graphics fields. One well-known limitation of an MLP is its capacity for expressing high-frequency signals from low-dimensional inputs. Several studies have proposed input encodings that improve the reconstruction quality of an MLP by pre-processing the input data. This paper proposes a novel input encoding method, local positional encoding, which is an extension of positional and grid encodings. Our method combines these two encoding techniques so that a small MLP learns high-frequency signals by applying positional encoding, with fewer frequencies, to the local position and scale within each cell of a lower-resolution grid. We demonstrate the effectiveness of our method on common 2D and 3D regression tasks, where it shows higher-quality results than positional and grid encodings, and results comparable to hierarchical variants of grid encoding such as multi-resolution grid encoding with an equivalent memory footprint.
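The local positional encoding entry combines a coarse grid lookup with positional encoding of the coordinate *inside* the grid cell. A 1D sketch of that idea follows; the cell-index/local-coordinate split and the sin/cos frequency bands are the standard constructions, while the exact combination in the paper may differ.

```python
import numpy as np

def local_positional_encoding(x, grid_resolution=4, num_freqs=2):
    """Split inputs in [0, 1) into a coarse grid-cell index plus a
    positional encoding of the LOCAL coordinate within that cell, so a
    small MLP can express high frequencies with few encoding frequencies."""
    cell = np.minimum((x * grid_resolution).astype(int), grid_resolution - 1)
    local = x * grid_resolution - cell  # position within the cell, in [0, 1)
    bands = []
    for k in range(num_freqs):          # classic sin/cos frequency bands
        bands += [np.sin(2**k * np.pi * local), np.cos(2**k * np.pi * local)]
    return cell, np.stack(bands, axis=-1)

x = np.array([0.10, 0.60, 0.95])        # 1D inputs in [0, 1)
cell, enc = local_positional_encoding(x)
print(cell)       # [0 2 3]  -- which grid cell each input falls in
print(enc.shape)  # (3, 4)   -- num_freqs x (sin, cos) features per input
```

The MLP would consume per-cell learned features (from the grid) concatenated with `enc`; because `local` rescales each cell to [0, 1), the same few frequencies cover fine detail everywhere.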