41-Issue 7
Browsing 41-Issue 7 by Issue Date
Now showing 1 - 20 of 57
Item: NSTO: Neural Synthesizing Topology Optimization for Modulated Structure Generation
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Zhong, Shengze; Punpongsanon, Parinya; Iwai, Daisuke; Sato, Kosuke; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Nature evolves structures such as honeycombs that achieve optimized performance with limited material. Such efficient structures can be created artificially by combining structural topology optimization with additive manufacturing. However, the extensive computational cost of topology optimization leads to low mesh resolution, long solving times, and rough boundaries that fall short of growing personal-fabrication demands and printing capabilities. We therefore propose neural synthesizing topology optimization (NSTO), which leverages a self-supervised coordinate-based network to optimize structures in significantly shorter computation time; the network encodes the structural material layout as an implicit function of coordinates. A continuous solution space is further generated from optimization tasks under varying boundary conditions or constraints, allowing users to instantly infer novel solutions. We demonstrate the system's efficacy across a broad range of usage scenarios through numerical experiments and 3D printing.
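As a rough illustration of the coordinate-based representation this abstract describes, the sketch below maps 3D coordinates to a material density with a small MLP. The Fourier-feature encoding, layer sizes, and names are assumptions for illustration, not the authors' architecture.

```python
# Minimal sketch of a coordinate-based implicit density field
# (assumption: a small MLP with Fourier features; the paper's
# actual network and training loss may differ).
import torch
import torch.nn as nn

class ImplicitDensityField(nn.Module):
    def __init__(self, num_freqs: int = 6, hidden: int = 64):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 3 * 2 * num_freqs  # sin/cos per frequency per coordinate
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # material density in [0, 1]
        )

    def forward(self, xyz: torch.Tensor) -> torch.Tensor:
        # Encode coordinates with sinusoids so the MLP can express fine detail.
        freqs = 2.0 ** torch.arange(self.num_freqs, device=xyz.device)
        angles = xyz.unsqueeze(-1) * freqs                       # (N, 3, F)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1).flatten(1)
        return self.mlp(enc)                                     # (N, 1)

# Because the layout is an implicit function, it can be queried on an
# arbitrarily fine grid: resolution is decoupled from the optimization.
field = ImplicitDensityField()
coords = torch.rand(1024, 3)   # sample points in the design domain
rho = field(coords)            # densities fed into a compliance-style loss
```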
Item: Pixel Art Adaptation for Handicraft Fabrication
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Igarashi, Yuki; Igarashi, Takeo; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Knitting and weaving patterns can be visually represented as pixel art. With hand knitting and weaving, human error (shifting, duplicating, or skipping pixels) can occur during manual fabrication. Changing already-fabricated pixels is too costly, so experts often adapt pixels that have not yet been fabricated to make the errors less visible. This paper proposes an automatic adaptation process that minimizes such visual artifacts. The system presents multiple adaptation possibilities to the user, who can choose a proposed adaptation or untie and re-fabricate their work. In typical handicraft fabrication, the design is complete before fabrication starts and remains fixed throughout; our system instead keeps updating the design during fabrication to tolerate human errors in the process. We implemented the proposed algorithm in a system that visualizes the knitting, cross-stitching, and bead-weaving processes.
Item: Color-mapped Noise Vector Fields for Generating Procedural Micro-patterns
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Grenier, Charline; Sauvage, Basile; Dischler, Jean-Michel; Thery, Sylvain; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Stochastic micro-patterns successfully enhance the realism of virtual scenes, and procedural models using noise combined with transfer functions are extremely efficient. However, most patterns produced today employ 1D transfer functions, which assign color, transparency, or other material attributes based solely on a single scalar noise value. Multi-dimensional transfer functions have received widespread attention in other fields, such as scientific volume rendering, but their potential has not yet been well explored for modeling micro-patterns in procedural texturing. We propose a new procedural model for stochastic patterns, defined as the composition of a bi-dimensional transfer function (a.k.a. color-map) with a stochastic vector field. Our model is versatile: it encompasses several existing procedural noises, including Gaussian noise and phasor noise, and it generates a much larger gamut of patterns, including locally structured patterns that are notoriously difficult to reproduce. We leverage the Gaussian assumption and a tiling-and-blending algorithm to provide real-time generation and filtering. A key contribution is a real-time approximation of the second-order statistics over an arbitrary pixel footprint, which additionally enables the filtering of procedural normal maps. We exhibit a wide variety of results, including Gaussian patterns, profiled waves, and concentric and non-concentric patterns.
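To make the core construction concrete, the toy sketch below composes a 2D color-map with a two-channel noise field: each pixel's two noise values index a 2D lookup table instead of a 1D ramp. The band-limited Gaussian channels and the random color-map are stand-ins; the paper's noise model and filtering are more sophisticated.

```python
# Toy composition of a bi-dimensional transfer function with a stochastic
# vector field (assumption: two independent band-limited Gaussian channels).
import numpy as np

def bandlimited_gaussian(shape, sigma, rng):
    # Filter white noise in Fourier space to obtain a smooth Gaussian field.
    white = rng.standard_normal(shape)
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    kernel = np.exp(-(fx**2 + fy**2) / (2 * sigma**2))
    field = np.fft.ifft2(np.fft.fft2(white) * kernel).real
    return (field - field.mean()) / field.std()

rng = np.random.default_rng(0)
n1 = bandlimited_gaussian((256, 256), 0.05, rng)   # first noise channel
n2 = bandlimited_gaussian((256, 256), 0.05, rng)   # second noise channel

# A 2D transfer function is just an RGB image indexed by both noise values;
# a random table stands in for a designed color-map here.
cmap = rng.random((64, 64, 3))
u = np.clip(((n1 + 3) / 6 * 63).astype(int), 0, 63)
v = np.clip(((n2 + 3) / 6 * 63).astype(int), 0, 63)
pattern = cmap[u, v]   # (256, 256, 3) stochastic micro-pattern
```

Structuring the color-map (e.g., stripes or spots in the 2D table) is what yields the locally structured patterns a 1D transfer function cannot express.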
Item: Learning 3D Shape Aesthetics Globally and Locally
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Chen, Minchan; Lau, Manfred; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Previous work computes the visual aesthetics of 3D shapes "globally", meaning that shape aesthetics data are collected for whole 3D shapes and then used to compute the aesthetics of whole 3D shapes. In this paper, we introduce a novel method that takes such "global" shape aesthetics data and learns both a "global" shape aesthetics measure, which computes aesthetics scores for whole 3D shapes, and a "local" shape aesthetics measure, which computes how much a local region on the 3D shape surface contributes to the whole shape's aesthetics. These aesthetics measures are learned and hence do not rely on existing handcrafted notions of what makes a 3D shape aesthetic. We take a dataset of global pairwise shape aesthetics, where humans compare pairs of shapes and indicate which shape in each pair is more aesthetic. Our solution is a point-based neural network that takes a 3D shape represented by surface patches as input and jointly outputs its global aesthetics score and a local aesthetics map. To build connections between global and local aesthetics, we embed the global and local features into the same latent space and output scores with weight-shared aesthetics predictors. Furthermore, we design three loss functions to jointly supervise the training. We demonstrate shape aesthetics results globally and locally, showing that our framework makes good global aesthetics predictions while the predicted aesthetics maps are consistent with human perception. In addition, we present several applications enabled by our local aesthetics metric.
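The pairwise-comparison supervision described here is commonly trained with a ranking loss. The sketch below shows one standard choice, a Bradley-Terry-style logistic loss over score differences; it is an assumption for illustration, since the paper designs three joint losses and a point-based network rather than this toy scorer.

```python
# Sketch of learning a scalar aesthetics score from pairwise comparisons
# (assumption: a logistic ranking loss; not the paper's actual losses).
import torch
import torch.nn.functional as F

def pairwise_aesthetics_loss(score_a, score_b, a_preferred):
    # score_a, score_b: (B,) global scores for the two shapes in each pair.
    # a_preferred: (B,) 1.0 where annotators preferred shape A, else 0.0.
    # Logistic likelihood that the preferred shape receives the higher score.
    return F.binary_cross_entropy_with_logits(score_a - score_b, a_preferred)

scores_a = torch.randn(8, requires_grad=True)  # stand-ins for network outputs
scores_b = torch.randn(8, requires_grad=True)
labels = torch.randint(0, 2, (8,)).float()
loss = pairwise_aesthetics_loss(scores_a, scores_b, labels)
loss.backward()   # gradients flow back into the score networks
```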
Item: WTFM Layer: An Effective Map Extractor for Unsupervised Shape Correspondence
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Liu, Shengjun; Xu, Haojun; Yan, Dong-Ming; Hu, Ling; Liu, Xinru; Li, Qinsong; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
We propose a novel unsupervised learning approach for computing correspondences between non-rigid 3D shapes. The core idea is to integrate a novel structural constraint into the deep functional map pipeline, the recently dominant learning framework for shape correspondence, via a powerful spectral manifold wavelet transform (SMWT). Since the SMWT is an isometrically invariant operator that can analyze features across multiple frequency bands, we use the multiscale SMWT results of the learned features as function-preservation constraints to optimize the functional map, assuming that each frequency band of the descriptors should be correspondingly preserved by the functional map. This strategy extracts significantly more deep feature information than existing approaches, which use only the learned descriptors to estimate the functional map, and our formulation strongly enforces the isometric properties of the underlying map. We also prove that our computation of the functional map amounts to filtering processes involving only matrix multiplication. We then use the alignment errors of intrinsic embeddings between shapes as a loss function and minimize it in an unsupervised way using the Sinkhorn algorithm. Finally, we employ DiffusionNet as a feature extractor to produce discretization-resistant and directional shape features. Experiments on multiple challenging datasets show that our method achieves state-of-the-art correspondence quality. Furthermore, our method yields significant improvements in robustness to shape discretization and in generalization across datasets. The source code and trained models will be available at https://github.com/HJ-Xu/WTFM-Layer.
Item: User-Controllable Latent Transformer for StyleGAN Image Layout Editing
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Endo, Yuki; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Latent space exploration is a technique that discovers interpretable latent directions and manipulates latent codes to edit various attributes in images generated by generative adversarial networks (GANs). However, in previous work, spatial control is limited to simple transformations (e.g., translation and rotation), and it is laborious to identify appropriate latent directions and adjust their parameters. In this paper, we tackle the problem of editing StyleGAN image layout by annotating the image directly. To do so, we propose an interactive framework for manipulating latent codes in accordance with user inputs. In our framework, the user annotates a StyleGAN image with the locations they want to move or keep fixed and specifies a movement direction by mouse dragging. From these user inputs and the initial latent codes, our latent transformer, based on a transformer encoder-decoder architecture, estimates the output latent codes, which are fed to the StyleGAN generator to obtain the result image. To train our latent transformer, we utilize synthetic data and pseudo-user inputs generated by off-the-shelf StyleGAN and optical flow models, without manual supervision. Quantitative and qualitative evaluations demonstrate the effectiveness of our method over existing methods.
Item: Generative Deformable Radiance Fields for Disentangled Image Synthesis of Topology-Varying Objects
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Wang, Ziyu; Deng, Yu; Yang, Jiaolong; Yu, Jingyi; Tong, Xin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
3D-aware generative models have demonstrated superb performance in generating 3D neural radiance fields (NeRF) from collections of monocular 2D images, even for topology-varying object categories. However, these methods still lack the ability to control the shape and appearance of the generated objects separately. In this paper, we propose a generative model for synthesizing radiance fields of topology-varying objects with disentangled shape and appearance variations. Our method generates deformable radiance fields, which build dense correspondences between the density fields of the objects and encode their appearances in a shared template field. The disentanglement is achieved in an unsupervised manner, without introducing extra labels into previous 3D-aware GAN training. We also develop an effective image-inversion scheme for reconstructing the radiance field of an object in a real monocular image and manipulating its shape and appearance. Experiments show that our method can learn the generative model from unstructured monocular images and disentangle the shape and appearance of objects (e.g., chairs) with large topological variations. A model trained on synthetic data can faithfully reconstruct a real object from a single image and achieve high-quality texture and shape editing results.
Item: Exploring Contextual Relationships in 3D Cloud Points by Semantic Knowledge Mining
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Chen, Lianggangxu; Lu, Jiale; Cai, Yiqing; Wang, Changbo; He, Gaoqi; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
3D scene graph generation (SGG) aims to predict the classes of objects and predicates simultaneously in a 3D point cloud scene with instance segmentation. Since the underlying semantics of 3D point clouds are spatial, recent approaches to the 3D SGG task usually have difficulty understanding global contextual semantic relationships and neglect the intrinsic 3D visual structure. To build a global scope of semantic relationships, we first propose two types of Semantic Clue (SC), at the entity level and the path level. SCs can be extracted from the training set and modeled as co-occurrence probabilities between entities. A novel Semantic Clue aware Graph Convolution Network (SC-GCN) is then designed to explicitly model each SC, whose messages are passed within its specific neighbor pattern. To construct interactions between the 3D visual and semantic modalities, a visual-language transformer (VLT) module is proposed to jointly learn the correlation between 3D visual features and class-label embeddings. Systematic experiments on the 3D semantic scene graph (3DSSG) dataset show that our full method achieves state-of-the-art performance.
Item: Learning Dynamic 3D Geometry and Texture for Video Face Swapping
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Otto, Christopher; Naruniec, Jacek; Helminger, Leonhard; Etterlin, Thomas; Mignone, Graziana; Chandran, Prashanth; Zoss, Gaspard; Schroers, Christopher; Gross, Markus; Gotardo, Paulo; Bradley, Derek; Weber, Romann; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Face swapping is the process of applying a source actor's appearance to a target actor's performance in a video. It is a challenging visual effect in increasing demand in film and television production. Recent work has shown that data-driven methods based on deep learning can produce compelling effects at production quality in a fraction of the time required by a traditional 3D pipeline. However, the dominant approach operates only on 2D imagery, without reference to the underlying facial geometry or texture, resulting in poor generalization under novel viewpoints and little artistic control. Methods that do incorporate geometry rely on pre-learned facial priors that do not adapt well to the particular geometric features of the source and target faces. We approach face swapping by learning simultaneous convolutional facial autoencoders for the source and target identities, using a shared encoder network with identity-specific decoders. The key novelty of our approach is that each decoder first lifts the latent code into a 3D representation, comprising a dynamic face texture and a deformable 3D face shape, before projecting this 3D face back onto the input image using a differentiable renderer. The coupled autoencoders are trained only on videos of the source and target identities, without requiring 3D supervision. By leveraging the learned 3D geometry and texture, our method achieves face swapping with higher quality than off-the-shelf monocular 3D face reconstruction and an overall lower FID score than state-of-the-art 2D methods. Furthermore, our 3D representation allows for efficient artistic control over the result, which is hard to achieve with existing 2D approaches.
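The shared-encoder, identity-specific-decoder layout described in the last abstract above can be sketched in a few lines. In the sketch below the 3D lifting and differentiable rendering are abstracted into a plain decoder; all layer shapes and names are illustrative assumptions, not the authors' model.

```python
# Architectural sketch of coupled autoencoders: one shared encoder, one
# decoder per identity (assumption: the paper's decoders output dynamic
# texture + deformable 3D shape and render them; here a flat image stands in).
import torch
import torch.nn as nn

class SwapAutoencoder(nn.Module):
    def __init__(self, identities=("source", "target"), latent=256):
        super().__init__()
        self.encoder = nn.Sequential(            # shared across identities
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent),
        )
        # One decoder per identity; swapping = decoding a target-actor frame
        # with the source identity's decoder.
        self.decoders = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(latent, 64 * 64 * 3), nn.Sigmoid())
            for name in identities
        })

    def forward(self, img, identity):
        z = self.encoder(img)                    # identity-agnostic performance code
        out = self.decoders[identity](z)
        return out.view(-1, 3, 64, 64)

model = SwapAutoencoder()
frame = torch.rand(1, 3, 64, 64)                 # a target-actor video frame
swapped = model(frame, "source")                 # decode with the source decoder
```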
Item: Point-augmented Bi-cubic Subdivision Surfaces
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Karciauskas, Kestutis; Peters, Jorg; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Point-Augmented Subdivision (PAS) replaces complex geometry-dependent guided subdivision, known to yield high-quality surfaces, with explicit subdivision formulas that yield similarly good limit surfaces and are easy to implement using any subdivision infrastructure: map the control net d, augmented by a fixed central limit point C, to a finer net (d̃, C) = M(d, C), where the subdivision matrix M is assembled from the provided stencil tables. Point-augmented bi-cubic subdivision improves the state of the art so that bi-cubic subdivision surfaces can be used in high-end geometric design: the highlight-line distribution for challenging configurations lacks the shape artifacts usually associated with explicit iterative generalized subdivision operators near extraordinary points. Five explicit formulas define point-augmented bi-cubic subdivision in addition to uniform B-spline knot insertion. Point-augmented bi-cubic subdivision comes in two flavors, generating either a sequence of C^2-joined surface rings (PAS2) or C^1-joined rings (PAS1) that have fewer pieces.
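The refinement step (d̃, C) = M(d, C) is a single matrix multiply over the point-augmented control net. The sketch below shows only that mechanical step; the random matrix M is a stand-in for the paper's published stencil tables, and the sizes are illustrative.

```python
# Sketch of one point-augmented refinement step: the control net, augmented
# with a fixed central limit point C, maps to a finer net by one matrix
# multiply (assumption: M is a random stand-in, not the real stencil tables).
import numpy as np

rng = np.random.default_rng(0)
n_coarse, n_fine = 16, 49              # sizes are illustrative only
d = rng.random((n_coarse, 3))          # coarse control net near an extraordinary point
C = d.mean(axis=0)                     # fixed central limit point (stand-in)

d_aug = np.vstack([d, C])              # (n_coarse + 1, 3) augmented net
M = rng.random((n_fine + 1, n_coarse + 1))
M /= M.sum(axis=1, keepdims=True)      # affine-invariant rows, as stencils are
M[-1] = 0.0
M[-1, -1] = 1.0                        # the limit point is carried along unchanged

refined = M @ d_aug                    # (d~, C) = M(d, C)
d_fine, C_out = refined[:-1], refined[-1]
assert np.allclose(C_out, C)           # C stays fixed across refinement levels
```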
Item: Fine-Grained Memory Profiling of GPGPU Kernels
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Buelow, Max von; Guthe, Stefan; Fellner, Dieter W.; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Memory performance is a crucial bottleneck in many GPGPU applications, making optimizations in both hardware and software mandatory. While hardware vendors already use highly efficient caching architectures, software engineers usually have to organize their data accordingly to make efficient use of them, requiring deep knowledge of the actual hardware. In this paper we present a novel technique for fine-grained memory profiling that simulates the whole pipeline of memory flow and accumulates profiling values in a way that lets the user trace them back to the relevant region of the GPU program, showing the values separately for each allocation. Our memory simulator outperforms state-of-the-art memory models of NVIDIA architectures by a factor of 2.4 for the L1 cache and 1.3 for the L2 cache in terms of accuracy. Additionally, fine-grained memory profiling proves to be a useful tool for memory optimizations, which we demonstrate on ray tracing and machine learning applications.
Item: UTOPIC: Uncertainty-aware Overlap Prediction Network for Partial Point Cloud Registration
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Chen, Zhilei; Chen, Honghua; Gong, Lina; Yan, Xuefeng; Wang, Jun; Guo, Yanwen; Qin, Jing; Wei, Mingqiang; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
High-confidence overlap prediction and accurate correspondences are critical for cutting-edge models that align paired point clouds in a partial-to-partial manner. However, there is inherent uncertainty between the overlapping and non-overlapping regions, which has long been neglected and significantly affects registration performance. Going beyond the current wisdom, we propose a novel uncertainty-aware overlap prediction network, dubbed UTOPIC, to tackle the ambiguous overlap-prediction problem; to our knowledge, this is the first method to explicitly introduce overlap uncertainty into point cloud registration. Moreover, we induce the feature extractor to implicitly perceive shape knowledge through a completion decoder, and present a geometric relation embedding for the Transformer to obtain transformation-invariant, geometry-aware feature representations. With the merits of more reliable overlap scores and more precise dense correspondences, UTOPIC achieves stable and accurate registration results, even for inputs with limited overlapping areas. Extensive quantitative and qualitative experiments on synthetic and real benchmarks demonstrate the superiority of our approach over state-of-the-art methods.
Item: Joint Hand and Object Pose Estimation from a Single RGB Image using High-level 2D Constraints
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Song, Hao-Xuan; Mu, Tai-Jiang; Martin, Ralph R.; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Joint pose estimation of human hands and objects from a single RGB image is an important topic for AR/VR, robot manipulation, and other applications. It is common practice to determine both poses directly from the image; some recent methods attempt to improve the initial poses using a variety of contact-based approaches. However, few methods take the real physical constraints conveyed by the image into consideration, leading to less realistic results than the initial estimates. To overcome this problem, we make use of a set of high-level 2D features that can be extracted directly from the image, in a new pipeline that combines contact approaches with these constraints during optimization. Our pipeline achieves better results than direct regression or contact-based optimization: the estimates are closer to the ground truth and provide high-quality contact.
Item: TogetherNet: Bridging Image Restoration and Object Detection Together via Dynamic Enhancement Learning
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Wang, Yongzhen; Yan, Xuefeng; Zhang, Kaiwen; Gong, Lina; Xie, Haoran; Wang, Fu Lee; Wei, Mingqiang; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Adverse weather conditions such as haze, rain, and snow often impair the quality of captured images, causing detection networks trained on normal images to generalize poorly in these scenarios. In this paper, we raise an intriguing question: can the combination of image restoration and object detection boost the performance of cutting-edge detectors in adverse weather? To answer it, we propose an effective yet unified detection paradigm, called TogetherNet, that bridges the two subtasks via dynamic enhancement learning to discern objects in adverse weather conditions. Unlike existing efforts that apply image dehazing or deraining as a pre-processing step, TogetherNet treats the problem as multi-task joint learning: the clean features produced by the restoration network are shared with the detection network to learn better object detection, helping TogetherNet enhance its detection capacity in adverse weather conditions. Besides the joint learning architecture, we design a new Dynamic Transformer Feature Enhancement module to improve the feature-extraction and representation capabilities of TogetherNet. Extensive experiments on both synthetic and real-world datasets demonstrate that TogetherNet outperforms state-of-the-art detection approaches by a large margin, both quantitatively and qualitatively. Source code is available at https://github.com/yz-wang/TogetherNet.
Item: ShadowPatch: Shadow Based Segmentation for Reliable Depth Discontinuities in Photometric Stereo
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Heep, Moritz; Zell, Eduard; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Photometric stereo is a well-established method with outstanding ability to recover surface details and material properties such as surface albedo or even specularity. However, while the surface is locally well defined, computing absolute depth by integrating surface normals is notoriously difficult. Integration errors can be introduced and propagated by numerical inaccuracies arising from inter-reflection of light or non-Lambertian surfaces, and ignoring the depth discontinuities of overlapping or disconnected objects in particular introduces strong distortion artefacts. During acquisition the object is lit from different positions, and the resulting self-shadowing is generally considered an unavoidable drawback that complicates the numerical estimation of normals. We observe, however, that shadow boundaries correlate strongly with depth discontinuities, and we exploit the visual structure introduced by self-shadowing to create a consistent image segmentation of continuous surfaces. To make depth estimation more robust, we deeply integrate photometric stereo with depth-from-stereo. The shadow-based segmentation of continuous surfaces allows us to reduce the computational cost of correspondence search in depth-from-stereo, and to speed up computation further we merge segments into larger meta-segments during an iterative depth optimization. The reconstruction error of our method is equal to or smaller than that of previous work, and the reconstruction results are characterized by robust handling of depth discontinuities, without any smearing artifacts.
Item: Depth-Aware Shadow Removal
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Fu, Yanping; Gai, Zhenyu; Zhao, Haifeng; Zhang, Shaojie; Shan, Ying; Wu, Yang; Tang, Jin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Shadow removal from a single image is an ill-posed problem because shadow generation is affected by complex interactions of geometry, albedo, and illumination. Most recent deep learning-based methods try to directly estimate the mapping between non-shadow and shadow image pairs to predict the shadow-free image; however, they are not very effective for shadow images with complex shadows or messy backgrounds. In this paper, we propose a novel end-to-end depth-aware shadow removal method that requires no depth images: it estimates depth information from RGB images and leverages the depth feature as guidance to enhance shadow removal and refinement. The proposed framework consists of three components: depth prediction, shadow removal, and boundary refinement. First, the depth-prediction module predicts the depth map corresponding to the input shadow image. Then, we propose a new generative adversarial network (GAN) method, integrated with depth information, to remove shadows in the RGB image. Finally, we propose an effective boundary-refinement framework that uses depth cues to alleviate artifacts around boundaries after shadow removal. We conduct experiments on several public datasets and real-world shadow images. The experimental results demonstrate the effectiveness of the proposed method and its superior performance against state-of-the-art methods.
Item: EL-GAN: Edge-Enhanced Generative Adversarial Network for Layout-to-Image Generation
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Gao, Lin; Wu, Lei; Meng, Xiangxu; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Although some progress has been made in layout-to-image generation of complex scenes with multiple objects, object-level generation still suffers from distortion and poor recognizability. We argue that this is caused by the lack of feature encodings for edge information during image generation. To address these limitations, we propose a novel edge-enhanced generative adversarial network for layout-to-image generation (termed EL-GAN). The feature encodings of edge information are learned from the multi-level features output by the generator and are iteratively optimized along the generator's pipeline. Two new components are included at each generator level to enable multi-scale learning. The first is the edge generation module (EGM), which converts the generator's multi-level feature outputs into images of different scales and extracts their edge maps. The second is the edge fusion module (EFM), which integrates the feature encodings refined from the edge maps into the subsequent image generation process by modulating the parameters in the normalization layers. Meanwhile, the discriminator is fed frequency-sensitive image features, which greatly enhances the generation quality of the image's high-frequency edge contours and low-frequency regions. Extensive experiments show that EL-GAN outperforms state-of-the-art methods on the COCO-Stuff and Visual Genome datasets. Our source code is available at https://github.com/Azure616/EL-GAN.
Item: Resolution-switchable 3D Semantic Scene Completion
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Luo, Shoutong; Sun, Zhengxing; Sun, Yunhan; Wang, Yi; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Semantic scene completion (SSC) aims to recover the complete geometric structure as well as the semantic segmentation results from partial observations. Previous works could only perform this task at a fixed resolution. To handle this problem, we propose a new method that can generate results at different resolutions without redesigning and retraining. The basic idea is to decouple the direct connection between resolution and network structure. To achieve this, we convert the feature volume generated by an SSC encoder into a resolution-adaptive feature and decode this feature per point. We also design a resolution-adapted point-sampling strategy for testing and a category-based point-sampling strategy for training. The encoder of our method can be replaced by existing SSC encoders. We achieve better results at other resolutions while maintaining the same accuracy as at the original resolution. Code and data are available at https://github.com/lstcutong/ReS-SSC.
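The resolution-decoupling idea in the last abstract above can be illustrated by sampling a fixed encoder feature volume at an arbitrary query grid and decoding per point. In the sketch below, trilinear grid sampling and a small linear decoder stand in for the paper's components; the class count and shapes are assumptions.

```python
# Sketch of resolution-switchable decoding: a fixed feature volume is sampled
# at query points of any resolution, then decoded per point (assumption:
# grid_sample + a linear layer stand in for the paper's decoder).
import torch
import torch.nn as nn
import torch.nn.functional as F

feat = torch.rand(1, 32, 16, 16, 16)   # encoder output at a fixed resolution
decoder = nn.Linear(32, 12)            # per-point semantic logits (12 classes, assumed)

def decode_at_resolution(res: int) -> torch.Tensor:
    # Build a normalized query grid at the requested resolution and sample
    # the feature volume trilinearly at those points.
    lin = torch.linspace(-1, 1, res)
    zz, yy, xx = torch.meshgrid(lin, lin, lin, indexing="ij")
    grid = torch.stack([xx, yy, zz], dim=-1).view(1, res, res, res, 3)
    sampled = F.grid_sample(feat, grid, align_corners=True)   # (1, 32, r, r, r)
    pts = sampled.flatten(2).transpose(1, 2)                  # (1, r^3, 32)
    return decoder(pts).view(res, res, res, 12)               # per-voxel logits

low = decode_at_resolution(32)    # same network, two output resolutions
high = decode_at_resolution(64)   # no redesign or retraining needed
```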
Item: Efficient and Stable Simulation of Inextensible Cosserat Rods by a Compact Representation
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Zhao, Chongyao; Lin, Jinkeng; Wang, Tianyu; Bao, Hujun; Huang, Jin; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
Piecewise-linear inextensible Cosserat rods are usually represented by the Cartesian coordinates of their vertices and quaternions on the segments. Such representations use excessive degrees of freedom (DOFs) and need many additional constraints, causing unnecessary numerical difficulty and computational burden in simulation. We propose a simple yet compact representation that exactly matches the intrinsic DOFs and naturally satisfies all such constraints. Specifically, viewing a rod as a chain of rigid segments, we encode its shape as the Cartesian coordinates of its root vertex together with an axis-angle representation of the material frame on each segment. Under our representation, the Hessian of the implicit time-stepping has a special non-zero pattern; exploiting this, we can solve the associated linear equations in nearly linear complexity. Furthermore, we carefully design a preconditioner, proven to be always symmetric positive-definite, that accelerates the PCG solver by one to two orders of magnitude compared with the widely used block-diagonal one. Compared with other technical choices, including Super-Helices, a specially designed compact representation for inextensible Cosserat rods, our method achieves better performance and stability, and can simulate an inextensible Cosserat rod with hundreds of vertices and tens of collisions in real time under relatively large time steps.
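To see why this representation needs no extra constraints, the sketch below reconstructs all rod vertices from a root vertex plus one axis-angle rotation per rigid segment: every segment has fixed length by construction, so inextensibility is automatic. The use of scipy's Rotation and a unit reference direction are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the compact rod representation: root vertex + one axis-angle
# material frame per segment determines every vertex; segment lengths are
# fixed, so the rod is inextensible by construction (assumption: scipy's
# Rotation stands in for the paper's frame handling).
import numpy as np
from scipy.spatial.transform import Rotation as R

def rod_vertices(root, axis_angles, seg_len=1.0):
    # axis_angles: (n_segments, 3) rotation vectors, one material frame each.
    verts = [np.asarray(root, dtype=float)]
    for aa in axis_angles:
        frame = R.from_rotvec(aa)               # segment's material frame
        d3 = frame.apply([0.0, 0.0, 1.0])       # segment direction (unit vector)
        verts.append(verts[-1] + seg_len * d3)  # rigid segment of fixed length
    return np.array(verts)

rng = np.random.default_rng(1)
angles = 0.2 * rng.standard_normal((10, 3))     # 10 segments, mild bending
pts = rod_vertices(root=[0, 0, 0], axis_angles=angles)

# Inextensibility holds exactly, with no constraint equations:
assert np.allclose(np.linalg.norm(np.diff(pts, axis=0), axis=1), 1.0)
```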
Item: MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis
(The Eurographics Association and John Wiley & Sons Ltd., 2022) Ren, Haocheng; Zhang, Hao; Zheng, Jia; Zheng, Jiaxiang; Tang, Rui; Huo, Yuchi; Bao, Hujun; Wang, Rui; Umetani, Nobuyuki; Wojtan, Chris; Vouga, Etienne
With the rapid development of data-driven techniques, data has played an essential role in various computer vision tasks. Many realistic and synthetic datasets have been proposed to address different problems, yet several challenges remain unresolved: (1) creating a dataset is usually a tedious process requiring manual annotation, (2) most datasets are designed for a single specific task, (3) modifying or randomizing a 3D scene is difficult, and (4) releasing commercial 3D data may raise copyright issues. This paper presents MINERVAS, a Massive INterior EnviRonments VirtuAl Synthesis system, to facilitate 3D scene modification and 2D image synthesis for various vision tasks. In particular, we design a programmable pipeline with a domain-specific language, allowing users to select scenes from a commercial indoor-scene database, synthesize scenes for different tasks with customized rules, and render various types of imagery data, such as color images, geometric structures, and semantic labels. Our system eases the difficulty of customizing massive scenes for different tasks and relieves users from manipulating fine-grained scene configurations by providing user-controllable randomness through multi-level samplers. Most importantly, it enables users to access commercial scene databases with millions of indoor scenes while protecting the copyright of core data assets, e.g., 3D CAD models. We demonstrate the validity and flexibility of our system by using the synthesized data to improve performance on several computer vision tasks. The project page is at https://coohom.github.io/MINERVAS.