Italian Chapter Conference 2024 - Smart Tools and Apps in Graphics

Permanent URI for this collection

https://diglib.eg.org/handle/10.2312/3607055

Now showing 1 - 20 of 24

Mesh Comparison Using Regular Grids
(The Eurographics Association, 2024) Kaye, Patrizia; Ivrissimtzis, Ioannis; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
A symmetric grid-based approach to mesh comparison is proposed, providing intuitive visual results alongside an objective measure of the local differences between meshes. The difference function is defined on the nodes of a regular 3D lattice, making it suitable as input for a variety of analysis algorithms. The visual results are compared with, and found comparable to, those of the Metro tool.
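A minimal sketch of a symmetric, lattice-sampled difference function. It assumes each mesh is represented only by points sampled on its surface, and that the local difference at a lattice node is the absolute difference between the two unsigned distances. The helper names `nearest_distance` and `grid_difference` are hypothetical, and this is not necessarily the definition used in the paper.

```python
import math
from itertools import product

def nearest_distance(point, samples):
    """Unsigned distance from `point` to the closest surface sample (brute force)."""
    return min(math.dist(point, s) for s in samples)

def grid_difference(samples_a, samples_b, lo, hi, n):
    """Evaluate |d_A(x) - d_B(x)| at every node x of an n x n x n regular lattice
    spanning the cube [lo, hi]^3. Swapping A and B yields identical values."""
    step = (hi - lo) / (n - 1)
    values = {}
    for i, j, k in product(range(n), repeat=3):
        node = (lo + i * step, lo + j * step, lo + k * step)
        values[(i, j, k)] = abs(nearest_distance(node, samples_a)
                                - nearest_distance(node, samples_b))
    return values
```

Because `abs(d_A - d_B)` does not change when the meshes are swapped, the measure is symmetric by construction, and the dictionary keyed by lattice indices can be passed to any grid-based analysis. The brute-force nearest-sample search is only suitable for small examples. Real meshes would need a spatial index.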
TACO: a Benchmark for Connectivity-invariance in Shape Correspondence
(The Eurographics Association, 2024) Pedico, Simone; Melzi, Simone; Maggioli, Filippo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
In real-world scenarios, a major limitation of shape-matching datasets is that all meshes of the same subject share their connectivity across different poses. Similar connectivity can introduce a significant bias into shape-matching algorithms, simplifying the matching process and potentially leading to correspondences based on recurring triangle patterns rather than on geometric correspondences between mesh parts. As a consequence, the resulting correspondences may be meaningless, and the evaluation of the algorithm may be misleading. To overcome this limitation, we introduce TACO, a new dataset in which meshes representing the same subject in different poses do not share the same connectivity, and we compute new ground-truth correspondences between shapes. We extensively evaluate our dataset to ensure that ground-truth isometries are properly preserved. We also use our dataset to validate state-of-the-art shape-matching algorithms, confirming that their performance degrades when the connectivity is altered.
S4A: Scalable Spectral Statistical Shape Analysis
(The Eurographics Association, 2024) Maccarone, Francesca; Longari, Giorgio; Viganò, Giulio; Peruzzo, Denis; Maggioli, Filippo; Melzi, Simone; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Statistical shape analysis is a crucial technique for studying deformations within collections of shapes, particularly in medical imaging. However, the high density of the meshes typically used to represent medical data challenges standard geometry processing tools, whose efficiency is limited. While spectral approaches offer a promising solution by effectively handling the high-frequency variations inherent in such data, their scalability is called into question by the need to solve eigendecompositions of large sparse matrices. In this paper, we introduce S4A, a novel and efficient method based on spectral geometry processing that addresses these issues at a low computational cost. It operates in four stages: (i) establishing correspondences between each pair of shapes in the collection, (ii) defining a common latent space to encode deformations across the entire collection, (iii) computing statistical quantities to identify, highlight, and measure the most representative variations within the collection, and (iv) transferring information from labeled data to large collections of shapes. Unlike previous methods, S4A provides a highly efficient solution across all stages of the process. We demonstrate the advantages of our approach by comparing its accuracy and computational efficiency to those of existing pipelines, and by showcasing the comprehensive statistical insights that can be derived by applying our method to a collection of medical data.
The use of Virtual Reality in preserving and reactivating immersive audio art installations: the case of Dissonanze Circolari by Roberto Taroni
(The Eurographics Association, 2024) Russo, Alessandro; Fayyaz, Nikoo; Franceschini, Andrea; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Interactive multimedia artworks pose unique preservation challenges, such as the obsolescence of original components, software, and playback devices, along with other issues related to their interactive and time-based nature. The Centro di Sonologia Computazionale (CSC) of the University of Padova developed the Multilevel Dynamic Preservation (MDP) model, which aims to ensure the long-term preservation of multimedia artworks by treating them as dynamic objects. Reactivation is a fundamental step in their preservation, and, among various reactivation strategies, Virtual Reality (VR) provides a unique opportunity to recreate the immersive experience while maintaining the concept of the original artwork. The CSC began collaborating with the Italian artist Roberto Taroni, a central figure in the experimental scene, who often combined music and visual arts in his works. This contribution concerns the reactivation in VR of Roberto Taroni's 1999 artwork "Dissonanze Circolari". The installation featured a room with 16 speakers, each playing a fragment of Beethoven's Piano Sonata Op. 111 in performances by different musicians, creating a dissonance-based immersive experience. The reactivation was carried out using the documentation provided by the artist and the audio samples from the original installation. The VR environment was created with the game engine Unreal Engine 5. This reactivation approach maximizes access to the artwork, providing new information for curators, scholars, and art enthusiasts.
To What Extent Are Existing Volume Mapping Algorithms Practically Useful?
(The Eurographics Association, 2024) Meloni, Federico; Cherchi, Gianmarco; Scateni, Riccardo; Livesu, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Mappings between geometric domains play a crucial role in many geometry processing algorithms and are heavily used in various applications. Despite significant progress in recent years, reliably mapping two volumes has yet to be solved to an extent that is satisfactory for practical applications. This paper reviews provably robust volume mapping algorithms, evaluating their performance in terms of time, memory, and ability to generate a correct result with both exact and inexact numerical models. We chose and evaluated the two most advanced methods currently available, using a state-of-the-art benchmark designed specifically for this type of analysis. We share both the statistical results and the specific volume mappings with the community, so that future algorithms can use them for direct comparative analysis. We also provide utilities for reading, writing, and validating volume maps encoded with exact rational coordinates, which are the natural form of output for robust algorithms in this class. All in all, this benchmark offers a clear overview of where we stand in terms of the ability to reliably solve the volume mapping problem, while providing practical data and tools that enable the community to compare future algorithmic developments without re-running existing methods.
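Validating maps stored with exact rational coordinates typically relies on exact orientation predicates. The sketch below assumes a volume map is given as a list of tetrahedra (quadruples of vertex indices, positively oriented in the source) plus image coordinates stored as rational strings. `orient3d` and `map_is_locally_valid` are hypothetical helpers, not the utilities released with the paper. The check only ensures that no image tetrahedron is inverted or degenerate, which is a necessary but not sufficient condition for a bijective map.

```python
from fractions import Fraction

def orient3d(a, b, c, d):
    """Exact sign of det[b - a, c - a, d - a] for rational coordinates:
    +1 positively oriented, -1 inverted, 0 degenerate."""
    u, v, w = ([Fraction(r[i]) - Fraction(a[i]) for i in range(3)] for r in (b, c, d))
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return (det > 0) - (det < 0)

def map_is_locally_valid(tets, image):
    """True if every tetrahedron, mapped through `image`, is strictly positively oriented."""
    return all(orient3d(*(image[v] for v in tet)) > 0 for tet in tets)
```

Using `Fraction` instead of floats means the sign test never suffers from round-off, which is why exact rational output pairs naturally with exact validation.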
A Mixed Reality Application for Multi-Floor Building Evacuation Drills using Real-Time Pathfinding and Dynamic 3D Modeling
(The Eurographics Association, 2024) Manfredi, Gilda; Capece, Nicola; Carlo, Rosario Pio Di; Erra, Ugo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
In modern high-rise buildings, complex layouts and frequent structural changes often hinder emergency evacuation. Traditional evacuation plans, usually 2D diagrams, do not provide real-time guidance and are difficult for occupants to interpret. We propose a Mixed Reality (MR) application that addresses these challenges by providing real-time evacuation guidance in multi-floor buildings. The application was developed for the Meta Quest 3, chosen as one of the best low-cost eXtended Reality (XR) headsets and a popular standalone Head-Mounted Display (HMD). Our system allows users to rapidly rescan and update building models, ensuring that evacuation guidance is always up to date. The proposed approach overcomes the Meta Quest 3 API's limit of 15 scanned rooms by saving room data externally and using spatial anchors to maintain accurate alignment with the physical environment. Additionally, the application integrates Dijkstra's algorithm to dynamically calculate optimal escape routes based on the user's real-time location. A preliminary evaluation study demonstrates the application's effectiveness in enhancing situational awareness and helping users stay mentally sharp, highlighting its potential to significantly improve decision-making and emergency response in dynamic building environments.
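A sketch of the routing step. It assumes the scanned floors are abstracted as a weighted graph whose nodes are rooms, corridors, stairs, and exits, and whose edge weights are walking distances. The graph layout and node names are hypothetical. Dijkstra's algorithm runs from the user's current node and stops at the first exit it settles, which is therefore the nearest reachable exit.

```python
import heapq

def shortest_escape_route(graph, start, exits):
    """Dijkstra from `start`; returns (cost, path) to the nearest node in `exits`,
    or (inf, []) if none is reachable. `graph` maps node -> [(neighbor, weight)]."""
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    settled = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # stale heap entry
        settled.add(node)
        if node in exits:
            path = [node]
            while path[-1] != start:  # walk predecessors back to the user
                path.append(prev[path[-1]])
            return d, path[::-1]
        for nbr, w in graph.get(node, ()):
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return float("inf"), []
```

After a rescan (for example, a staircase found blocked), the corresponding edges are removed from the graph and the route is simply recomputed from the user's current location.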
Persistent Homology vs. Learning Methods: A Comparative Study in Limited Data Scenarios
(The Eurographics Association, 2024) Di Via, Andrea; Di Via, Roberto; Fugacci, Ulderico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
This exploratory study compares persistent homology methods with traditional machine learning and deep learning techniques for label-efficient classification. We propose purely topological approaches, including persistence thresholding and Bottleneck distance classification, and explore hybrid methods combining persistent homology with machine learning. These are evaluated against conventional machine learning algorithms and deep neural networks on two binary classification tasks: surface crack detection and malaria cell identification. We assess performance across numbers of samples per class ranging from 1 to 500. Our study highlights the efficacy of persistent homology-based methods in low-data scenarios. Using the Bottleneck distance approach, we achieve 95.95% accuracy in crack detection and 93.11% in malaria diagnosis with only one labeled sample per class. These results outperform the best machine learning models, which achieve 69.40% and 39.75% accuracy, respectively, and deep learning models, which attain up to 95.96% in crack detection and 62.72% in malaria diagnosis. This demonstrates the superior performance of topological methods in classification tasks with few labeled data. Hybrid approaches show enhanced performance as the number of labeled samples increases, effectively leveraging topological features to boost classification accuracy. This study highlights the robustness of topological methods in extracting meaningful features from limited data, offering promising directions for efficient, label-conserving classification strategies. The results underscore the value of persistent homology, both as a standalone tool and in combination with machine learning, particularly in domains where the scarcity of labeled data challenges traditional deep learning approaches.
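A toy sketch of Bottleneck distance classification. It assumes persistence diagrams are already computed (lists of finite `(birth, death)` pairs) and that a query is assigned the label of the nearest labeled diagram. With one labeled sample per class this is 1-nearest-neighbour. The brute-force matching over permutations is exponential and only meant for tiny diagrams. Libraries such as GUDHI provide efficient bottleneck-distance implementations, and computing diagrams from images is not shown.

```python
from itertools import permutations

def bottleneck(d1, d2):
    """Brute-force bottleneck distance between two small persistence diagrams.
    Each side is padded with diagonal slots (None) so points may match the diagonal."""
    left = list(d1) + [None] * len(d2)
    right = list(d2) + [None] * len(d1)

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None:
            return (q[1] - q[0]) / 2  # L-infinity distance from q to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(max((cost(p, q) for p, q in zip(left, perm)), default=0.0)
               for perm in permutations(right))

def classify(diagram, labeled):
    """Label of the labeled (diagram, label) pair closest in bottleneck distance."""
    return min(labeled, key=lambda item: bottleneck(diagram, item[0]))[1]
```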
Localized Gaussians as Self-Attention Weights for Point Clouds Correspondence
(The Eurographics Association, 2024) Riva, Alessandro; Raganato, Alessandro; Melzi, Simone; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Current data-driven methodologies for point cloud matching demand extensive training time and computational resources, presenting significant challenges for model deployment and application. In the point cloud matching task, recent work with an encoder-only Transformer architecture has revealed the emergence of semantically meaningful patterns in the attention heads, in particular patterns resembling Gaussian functions centered on each point of the input shape. In this work, we further investigate this phenomenon by integrating these patterns as fixed attention weights within the attention heads of the Transformer architecture. We evaluate two variants: one using predetermined variance values for the Gaussians, and another in which the variance values are treated as learnable parameters. Additionally, we analyze performance on noisy data and explore a possible way to improve robustness to noise. Our findings demonstrate that fixing the attention weights not only accelerates training but also improves the stability of the optimization. Furthermore, we conduct an ablation study to identify the layers in which the injected information is most impactful and to understand how much the network relies on this information.
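A minimal sketch of Gaussian-shaped fixed attention, assuming that each attention row is a Gaussian of the Euclidean distance to the query point, normalized to sum to one, and then used to average the value vectors. It corresponds to the predetermined-variance variant. The exact normalization and parameterization in the paper may differ, and the learnable-variance variant is not shown.

```python
import math

def gaussian_attention(points, values, sigma):
    """Fixed (non-learned) attention: the weights in row i are a Gaussian of the
    distance to point i, normalized to sum to 1, and are used to average `values`."""
    dim = len(values[0])
    out = []
    for p in points:
        w = [math.exp(-math.dist(p, q) ** 2 / (2.0 * sigma ** 2)) for q in points]
        total = sum(w)  # at least 1, since the self-weight is exp(0) = 1
        out.append([sum(wj * v[k] for wj, v in zip(w, values)) / total
                    for k in range(dim)])
    return out
```

A small `sigma` makes each point attend almost only to itself, while a large `sigma` averages over the whole shape, so the variance directly controls how local the injected pattern is.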
Surface Reconstruction from Silhouette and Laser Scanners as a Positive-Unlabeled Learning Problem
(The Eurographics Association, 2024) Gottardo, Mario; Pistellato, Mara; Bergamasco, Filippo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Typical 3D reconstruction pipelines combine line-laser scanners and robotic actuators to produce a point cloud and then perform surface reconstruction. In this work, we propose a new technique to learn an Implicit Neural Representation (INR) of a 3D shape S without directly observing points on its surface. We only assume that we can determine whether a 3D point is exterior to S (e.g., by observing whether its projection falls outside the silhouette or by detecting on which side of the laser line the point lies). In this setting, we cast reconstruction as a Positive-Unlabeled learning problem in which sparse 3D points, sampled according to a distribution that depends on the INR's local gradient, have to be classified as interior or exterior to S. These points are used to train the INR iteratively so that its zero-crossing converges to the boundary of the shape. Preliminary experiments performed on a synthetic dataset demonstrate the advantages of the approach.
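A sketch of the silhouette-based labeling that the Positive-Unlabeled setting relies on. It assumes calibrated pinhole cameras (rotation `R`, translation `t`, intrinsics `fx`, `fy`, `cx`, `cy`) and binary silhouette masks indexed as `mask[row][col]`. A point that projects outside the silhouette in any view is a certain exterior (positive) sample. All other points stay unlabeled, because passing every silhouette test does not prove a point is inside S. The dictionary layout and function names are hypothetical, and the laser-line test, gradient-based sampling, and INR training are not shown.

```python
import math

def project(point, cam):
    """Pinhole projection of a 3D point; returns None if it lies behind the camera."""
    R, t = cam["R"], cam["t"]
    x, y, z = (sum(R[r][c] * point[c] for c in range(3)) + t[r] for r in range(3))
    if z <= 0:
        return None
    return cam["fx"] * x / z + cam["cx"], cam["fy"] * y / z + cam["cy"]

def label_point(point, views):
    """'exterior' (positive) if any silhouette excludes the projected point,
    otherwise 'unlabeled'. `views` is a list of (camera, mask) pairs."""
    for cam, mask in views:
        uv = project(point, cam)
        if uv is None:
            continue  # behind this camera: no evidence either way
        u, v = math.floor(uv[0]), math.floor(uv[1])
        inside_image = 0 <= v < len(mask) and 0 <= u < len(mask[0])
        if inside_image and not mask[v][u]:
            return "exterior"
    return "unlabeled"
```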
A Study on the Use of High Dynamic Range Imaging for Gaussian Splatting Methods: Are 8 bits Enough?
(The Eurographics Association, 2024) Piras, Valentina; Bonatti, Amedeo Franco; Maria, Carmelo De; Cignoni, Paolo; Banterle, Francesco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
The recent rise of methods in the style of Neural Radiance Fields (NeRFs) has revolutionized high-fidelity scene reconstruction, with 3D Gaussian Splatting (3DGS) standing out for its ability to generate photorealistic images while maintaining fast, efficient rendering. 3DGS delivers high-fidelity representations of complex scenes at any scale (from very small objects to entire cities), accurately capturing geometry, materials, and lighting, while meeting the need for fast and efficient rendering, which is crucial for applications requiring real-time performance. Although High Dynamic Range (HDR) technology, which enables the capture of comprehensive real-world lighting information, has been used in novel view synthesis, several questions remain unanswered. For example, does HDR improve the overall quality of reconstruction? Are 8 bits enough? Can tone-mapped images be a balanced compromise between quality and detail? To answer such questions, in this work we study the application of HDR technology to the 3DGS method for acquiring real-world scenes.
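A small sketch of why "Are 8 bits enough?" is a real question. Linear HDR radiance values are quantized to 8 bits either directly (clip, then quantize) or after a global tone-mapping curve. The Reinhard operator `x / (1 + x)` is used here only as one common example, and the radiance samples are hypothetical. The sketch does not reproduce the paper's capture or training pipeline.

```python
def quantize_8bit(x):
    """Clip a linear value to [0, 1] and map it to one of 256 integer codes."""
    return round(min(max(x, 0.0), 1.0) * 255)

def reinhard(x):
    """Global Reinhard operator x / (1 + x): one common tone-mapping choice."""
    return x / (1.0 + x)

# Hypothetical linear HDR radiance samples; 1.0 is taken as display white.
radiance = [0.01, 0.5, 1.0, 4.0, 16.0]
direct = [quantize_8bit(v) for v in radiance]                # clip, then quantize
tonemapped = [quantize_8bit(reinhard(v)) for v in radiance]  # compress, then quantize
```

With direct quantization, every value at or above display white collapses to code 255, whereas the tone-mapped codes remain distinct. Tone mapping trades precision in the midtones for preserved highlight detail.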
DDD: Deep indoor panoramic Depth estimation with Density maps consistency
(The Eurographics Association, 2024) Pintore, Giovanni; Agus, Marco; Signoroni, Alberto; Gobbetti, Enrico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
We introduce a novel deep neural network for rapid and structurally consistent monocular 360° depth estimation in indoor environments. The network infers a depth map from a single gravity-aligned or gravity-rectified equirectangular image of the environment, ensuring that the predicted depth matches the typical depth distribution and features of cluttered interior spaces, which are usually enclosed by walls, ceilings, and floors. By leveraging the distinct characteristics of vertical and horizontal features in man-made indoor environments, we introduce a lean network architecture that employs gravity-aligned feature flattening and specialized vision transformers that exploit the omnidirectional nature of the input without patch segmentation or positional encoding. To enhance the structural consistency of the predicted depth, we introduce a new loss function that evaluates the consistency of density maps obtained by projecting points derived from the inferred depth map onto horizontal and vertical planes. This lightweight architecture has very low computational demands, provides greater structural consistency than competing methods, and does not require the explicit imposition of strong structural priors.
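A rough sketch of the density-map idea, assuming a gravity-aligned equirectangular depth map with y pointing up. Pixels are back-projected to 3D points, the points are binned on the horizontal (x, z) plane, and two depth maps are compared by the L1 distance between their normalized histograms. Only the horizontal projection is shown (vertical planes would be analogous), and the grid extent and bin count are arbitrary. Hard binning is not differentiable, so a trainable loss would need soft binning. This is not the paper's loss.

```python
import math

def equirect_to_points(depth):
    """Back-project an H x W equirectangular depth map (gravity-aligned, y up)."""
    H, W = len(depth), len(depth[0])
    pts = []
    for v in range(H):
        lat = math.pi / 2 - (v + 0.5) / H * math.pi
        for u in range(W):
            lon = (u + 0.5) / W * 2 * math.pi - math.pi
            r = depth[v][u]
            pts.append((r * math.cos(lat) * math.sin(lon),
                        r * math.sin(lat),
                        r * math.cos(lat) * math.cos(lon)))
    return pts

def floor_density(points, extent, bins):
    """Normalized 2D histogram of points projected onto the horizontal (x, z) plane."""
    grid = [[0.0] * bins for _ in range(bins)]
    for x, _, z in points:
        i = math.floor((x + extent) / (2 * extent) * bins)
        j = math.floor((z + extent) / (2 * extent) * bins)
        if 0 <= i < bins and 0 <= j < bins:
            grid[j][i] += 1.0
    total = sum(map(sum, grid)) or 1.0
    return [[c / total for c in row] for row in grid]

def density_l1(depth_pred, depth_gt, extent=5.0, bins=32):
    """L1 discrepancy between horizontal density maps (non-differentiable sketch)."""
    a = floor_density(equirect_to_points(depth_pred), extent, bins)
    b = floor_density(equirect_to_points(depth_gt), extent, bins)
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
```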
Meshtrics: Objective Quality Assessment of Textured 3D Meshes for 3D Reconstruction
(The Eurographics Association, 2024) Madeira, Tiago; Oliveira, Miguel; Dias, Paulo; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
In the context of 3D reconstruction, the pursuit of photorealistic models requires precise, objective quality evaluation methods. In this work, we investigate several potential objective metrics for the quality assessment of textured 3D meshes by evaluating their correlation with human perception of visual quality. We conduct experiments using a publicly available, subjectively rated database of textured 3D meshes containing various types of geometry and texture distortions. Based on these experiments, we discuss the characteristics and limitations of the evaluated metrics. Notably, image-based metrics demonstrated the strongest correlation with subjective scores in most tested scenarios, suggesting that 2D image metrics are reliable predictors of the visual quality of 3D models. We then introduce a framework designed to facilitate the analysis of various characteristics of 3D models and their fidelity, with a particular focus on image-based metrics that use photographs of real-world environments as references. Our toolkit streamlines the generation of renders and the application of quality metrics, enables manual annotation in 2D and 3D spaces, and incorporates an automatic alignment refinement step for precise registration of reference photographs. We evaluate the proposed approach using a dataset generated through the 3D reconstruction of a complex indoor environment. Our experiments support the efficacy of the solution in benchmarking 3D reconstruction results, enabling timely, informed adjustments to the reconstruction methodology. Source code is available at https://github.com/tiagomfmadeira/Meshtrics.
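A minimal sketch of the two ingredients behind this kind of evaluation. The first is a classic full-reference image metric (PSNR), which could compare a render of the reconstructed mesh against a registered reference photograph. The second is the Pearson correlation between metric values and subjective scores. Which metrics Meshtrics actually includes, and how it aggregates them, is not stated here. Grayscale images as nested lists and the function names are assumptions.

```python
import math

def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio between two equally sized grayscale images."""
    sq = [(a - b) ** 2 for ra, rb in zip(reference, rendered) for a, b in zip(ra, rb)]
    mse = sum(sq) / len(sq)
    return math.inf if mse == 0 else 10.0 * math.log10(max_value ** 2 / mse)

def pearson(xs, ys):
    """Pearson correlation, e.g. between metric values and subjective scores."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```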
A Simple Improvement to PIP-Net for Medical Image Anomaly Detection
(The Eurographics Association, 2024) Kobayashi, Yuki; Yamaguchi, Yasushi; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
The application of AI technology in domains requiring decision accountability, such as healthcare, has increased the demand for model interpretability. The part-prototype model is a well-established interpretable approach for image recognition, with PIP-Net demonstrating strong classification performance and high interpretability in multiclass classification tasks. However, PIP-Net assumes the presence of class-specific prototypes. This assumption does not hold for tasks like anomaly detection, where no local features are exclusive to the normal class. To address this, we propose an architecture that learns only the scores corresponding to the anomaly class for each prototype. This approach is based on more reasonable assumptions for anomaly detection than PIP-Net and enables concise inference using fewer prototypes. Evaluation of this approach using the MURA dataset, a large dataset of bone X-rays, revealed that the proposed architecture achieved better anomaly detection performance than the original PIP-Net with fewer prototypes.
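A sketch of a single-output prototype head in the spirit described above. Each prototype has a presence score in [0, 1] and a learned weight that can only add evidence for the anomaly class. The non-negativity clamp, the bias, the sigmoid output, and the prototype names are assumptions for illustration, not the paper's exact architecture. The `explain` helper shows how such a head stays interpretable by ranking prototypes by their contribution.

```python
import math

def anomaly_probability(presence, weights, bias=0.0):
    """Hypothetical single-output head: prototypes contribute only anomaly
    evidence, because their weights are clamped to be non-negative."""
    logit = bias + sum(max(w, 0.0) * p for w, p in zip(weights, presence))
    return 1.0 / (1.0 + math.exp(-logit))

def explain(presence, weights, names):
    """Rank prototypes by their contribution to the anomaly logit."""
    contrib = [(n, max(w, 0.0) * p) for n, w, p in zip(names, weights, presence)]
    return sorted(contrib, key=lambda item: item[1], reverse=True)
```

When no prototype fires, the score falls back to the bias alone. The normal class therefore needs no prototypes of its own, which matches the observation that no local features are exclusive to normal images.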
Evaluating AI-based static stereoscopic rendering of indoor panoramic scenes
(The Eurographics Association, 2024) Jashari, Sara; Tukur, Muhammad; Boraey, Yehia; Alzubaidi, Mahmood; Pintore, Giovanni; Gobbetti, Enrico; Villanueva, Alberto Jaspe; Schneider, Jens; Fetais, Noora; Agus, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Panoramic imaging has recently become an extensively used technology for the representation and exploration of indoor environments. Panoramic cameras generate omnidirectional images that provide a comprehensive 360-degree view, making them a valuable tool for applications such as virtual tours in real estate, architecture, and cultural heritage. However, constructing truly immersive experiences from panoramic images presents challenges, particularly in generating panoramic stereo pairs that offer consistent depth cues and visual comfort across all viewing directions. Traditional stereo-imaging techniques do not directly apply to spherical panoramic images, requiring complex processing to avoid artifacts that can disrupt immersion. To address these challenges, various imaging and processing technologies have been developed, including multi-camera systems and computational methods that generate stereo images from a single panoramic input. Although effective, these solutions often involve complicated hardware and processing pipelines. Recently, deep learning approaches have emerged, enabling novel view generation from single panoramic images. While these methods show promise, they have not yet been thoroughly evaluated in practical scenarios. This paper presents a series of evaluation experiments aimed at assessing different technologies for creating static stereoscopic environments from omnidirectional imagery, with a focus on 3DOF immersive exploration. A user study was conducted using a WebXR prototype and a Meta Quest 3 headset to quantitatively and qualitatively compare traditional image composition techniques with AI-based methods. Our results indicate that while traditional methods provide a satisfactory level of immersion, AI-based generation is nearing a quality level suitable for deployment in web-based environments.
Advancing Environmental Modeling with Unstructured Meshes: Current Research and Development
(The Eurographics Association, 2024) Miola, Marianna; Cabiddu, Daniela; Mortara, Michela; Pittaluga, Simone; Sorgente, Tommaso; Zuccolini, Marino Vetuschi; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Modeling the distribution of environmental variables across spatial domains presents significant challenges. Geostatistics offers a robust set of tools for accurately predicting values and associated uncertainties at unsampled locations, accounting for spatial correlations. However, these tools are often constrained by their reliance on structured domain representations, limiting their flexibility in modeling complex or irregular structures. By exploring the use of unstructured meshes, we can achieve a more efficient and accurate representation of localized phenomena, thereby enhancing our ability to model spatial patterns. Our current efforts focus on integrating unstructured meshes into the geostatistical modeling pipeline, encompassing everything from mesh generation (and possibly refinement) to their application in stochastic simulation and the segmentation of the domain into regions where the distribution of variables is homogeneous. Preliminary results are promising, demonstrating the potential of this innovative approach.
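Ordinary kriging is a standard geostatistical predictor that returns both an estimate and an uncertainty (the kriging variance) at an unsampled location. The sketch below evaluates it at an arbitrary target point, which could be a node or cell centroid of an unstructured mesh. The exponential variogram, its `sill` and `corr_range` parameters, and the helper names are assumptions for illustration, not the pipeline described in the abstract.

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ordinary_kriging(samples, target, sill=1.0, corr_range=1.0):
    """Estimate, variance, and weights at `target` from [(location, value), ...]."""
    def gamma(h):  # exponential variogram (assumed model)
        return sill * (1.0 - math.exp(-h / corr_range))
    locs = [loc for loc, _ in samples]
    n = len(locs)
    A = [[gamma(math.dist(p, q)) for q in locs] + [1.0] for p in locs]
    A.append([1.0] * n + [0.0])  # unbiasedness constraint: weights sum to 1
    b = [gamma(math.dist(p, target)) for p in locs] + [1.0]
    sol = solve(A, b)
    weights, mu = sol[:n], sol[n]
    estimate = sum(w * z for w, (_, z) in zip(weights, samples))
    variance = sum(w * g for w, g in zip(weights, b[:n])) + mu
    return estimate, variance, weights
```

With no nugget term, kriging reproduces the sampled value exactly at a sample location (with zero variance), and the variance grows as the target moves away from the data.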
Semantic Stylization and Shading via Segmentation Atlas utilizing Deep Learning Approaches
(The Eurographics Association, 2024) Sinha, Saptarshi Neil; Kühn, Paul Julius; Rojtberg, Pavel; Graf, Holger; Kuijper, Arjan; Weinmann, Michael; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
We present a novel hybrid approach for the semantic stylization of the surface materials of 3D models while preserving shading. Our approach applies style transfer directly on the object surface, which can be obtained by learning-based methods or by traditional methods such as 3D scanners or structured-light systems, thereby avoiding artifacts such as halos, ghosting, or the poor geometric quality produced by other 3D stylization methods. To this end, our method involves (i) generating a segmentation map parameterized over the object surface, inferred with a deep-learning-based foundation model, to guide the stylization and shading of different regions of the 3D model, and (ii) a subsequent 2D style transfer that allows surface materials to be exchanged or stylized in high quality. By delivering high-quality semantic reconstructions in a shorter time than current approaches that rely on manual 3D segmentation and stylization, our approach holds significant potential for application scenarios including creative design, architecture, and cultural heritage.
VISPI: Virtual Staging Pipeline for Single Indoor Panoramic Images
(The Eurographics Association, 2024) Shah, Uzair; Jashari, Sara; Tukur, Muhammad; Pintore, Giovanni; Gobbetti, Enrico; Schneider, Jens; Agus, Marco; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Taking a 360° image is the quickest and most cost-effective way to capture the entire environment around the viewer in a form that can be directly exploited for creating immersive content [PBAG23]. In this work, we introduce novel solutions for the virtual staging of indoor environments, supporting automatic emptying, object insertion, and relighting. Our solution, dubbed VISPI (Virtual Staging Pipeline for Single Indoor Panoramic Images), integrates data-driven processing components, which exploit knowledge learned from massive data collections, within a real-time rendering and editing system, allowing interactive restaging of indoor scenes. Key components of VISPI include: (i) a holistic architecture based on a multi-task vision transformer for extracting geometry, semantic, and material information from a single panoramic image, (ii) a lighting model based on spherical Gaussians, (iii) a method for estimating lighting from the geometric, semantic, and material signals, and (iv) a real-time editing and rendering component. The proposed framework provides an interactive and user-friendly solution for creating immersive visualizations of indoor spaces. We present a preliminary assessment of VISPI using a synthetic dataset, Structured3D, and demonstrate its application in creating restaged indoor scenes.
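Spherical Gaussians (SGs) are a standard compact representation for environment lighting. A single lobe has the form G(v) = a · exp(λ(μ · v − 1)), with unit axis μ, sharpness λ, and RGB amplitude a, and it integrates over the sphere to 2πa/λ · (1 − e^(−2λ)). The sketch evaluates a mixture of lobes in a given direction. How VISPI parameterizes and estimates its lobes is not described here, so the lobe values and function names are hypothetical.

```python
import math

def sg_eval(direction, axis, sharpness, amplitude):
    """RGB value of one lobe a * exp(lambda * (dot(mu, v) - 1)); unit vectors assumed."""
    cos = sum(d * m for d, m in zip(direction, axis))
    return [amp * math.exp(sharpness * (cos - 1.0)) for amp in amplitude]

def environment_radiance(direction, lobes):
    """Radiance from a mixture of lobes, each given as (axis, sharpness, rgb_amplitude)."""
    total = [0.0, 0.0, 0.0]
    for axis, sharpness, amp in lobes:
        for c, val in enumerate(sg_eval(direction, axis, sharpness, amp)):
            total[c] += val
    return total

def sg_integral(sharpness, amplitude):
    """Closed-form integral of one lobe over the sphere, per color channel."""
    return [2 * math.pi * a / sharpness * (1 - math.exp(-2 * sharpness)) for a in amplitude]
```

The closed-form integral is one reason SGs suit real-time relighting: total light energy and many shading integrals can be evaluated analytically.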
Disk-NeuralRTI: Optimized NeuralRTI Relighting through Knowledge Distillation
(The Eurographics Association, 2024) Dulecha, Tinsae Gebrechristos; Righetto, Leonardo; Pintus, Ruggero; Gobbetti, Enrico; Giachetti, Andrea; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Relightable images created from Multi-Light Image Collections (MLICs) are among the most employed models for interactive object exploration in cultural heritage (CH). In recent years, neural representations have been shown to produce higher-quality images at similar storage costs to the more classic analytical models such as Polynomial Texture Maps (PTM) or Hemispherical Harmonics (HSH). However, the Neural RTI models proposed in the literature perform the image relighting with decoder networks with a high number of parameters, making decoding slower than for classical methods. Despite recent efforts targeting model reduction and multi-resolution adaptive rendering, exploring high-resolution images, especially on high-pixel-count displays, still requires significant resources and is only achievable through progressive rendering in typical setups. In this work, we show how, by using knowledge distillation from an original (teacher) Neural RTI network, it is possible to create a more efficient RTI decoder (student network). We evaluated the performance of the network compression approach on existing RTI relighting benchmarks, including both synthetic and real datasets, and on novel acquisitions of high-resolution images. Experimental results show that we can keep the student prediction close to the teacher with up to 80% parameter reduction and almost ten times faster rendering when embedded in an online viewer.
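A toy sketch of knowledge distillation. It uses a stand-in "teacher" function in place of a trained Neural RTI decoder and a smaller "student" that is linear in the light direction. Both models are hypothetical. The point illustrated is that the student's training target is the teacher's prediction for randomly sampled light directions, not a captured image.

```python
import random

def teacher(light):
    """Hypothetical stand-in for a trained decoder: one pixel's intensity
    as a function of the light direction (lx, ly)."""
    lx, ly = light
    return 0.6 + 0.3 * lx - 0.2 * ly + 0.05 * lx * ly

def distill(num_steps=5000, lr=0.05, seed=0):
    """Fit a smaller student (linear in lx, ly) to teacher outputs with SGD on
    the squared error between student and teacher predictions."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]  # bias, lx weight, ly weight
    for _ in range(num_steps):
        light = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        feats = (1.0, light[0], light[1])
        err = sum(wi * f for wi, f in zip(w, feats)) - teacher(light)
        w = [wi - lr * 2 * err * f for wi, f in zip(w, feats)]
    return w
```

Under uniform light sampling, the teacher's `lx * ly` term is uncorrelated with the student's features, so the student converges to roughly `[0.6, 0.3, -0.2]`, the best approximation it can represent.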
Disambiguating Flat Spots in Digital Elevation Models
(The Eurographics Association, 2024) Rocca, Luigi; Puppo, Enrico; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
We consider Digital Elevation Models (DEMs) encoded as regular grids of discrete elevation data samples. When the terrain's slope is low relative to the dataset's vertical resolution, the DEM may contain flat spots: connected areas where all points share the same elevation. Flat spots can hinder certain analyses, such as topological characterization or drainage network computations. We discuss the application of Morse-Smale theory to grids and the disambiguation of flat spots. Specifically, we show how to characterize the topology of flat spots and symbolically perturb their elevation data to make the DEM compatible with Morse-Smale theory while preserving its topological properties. Our approach applies equivalently to three different surface models derived from the DEM grid: the step model, the bilinear model, and a piecewise-linear model based on the quincunx lattice.
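A sketch of the two basic steps involved. The first finds flat spots, here taken as 4-connected regions of at least two cells with equal elevation. The second applies the simplest symbolic perturbation, lexicographic tie-breaking on (elevation, row, column), in the spirit of simulation of simplicity. This generic tie-breaking is not the paper's topology-aware scheme. It makes every pair of cells strictly comparable, but it can introduce spurious critical points inside a flat spot, which a topology-preserving disambiguation is designed to avoid. The function names and the 4-connectivity choice are assumptions.

```python
from collections import deque

def flat_spots(dem):
    """4-connected regions of at least two cells sharing the same elevation."""
    H, W = len(dem), len(dem[0])
    seen, spots = set(), []
    for r in range(H):
        for c in range(W):
            if (r, c) in seen:
                continue
            comp, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:  # breadth-first flood fill over equal elevations
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < H and 0 <= nx < W and (ny, nx) not in seen
                            and dem[ny][nx] == dem[y][x]):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(comp) > 1:
                spots.append(sorted(comp))
    return spots

def perturbed_key(dem, cell):
    """Symbolic perturbation by lexicographic tie-breaking: (elevation, row, col)."""
    r, c = cell
    return (dem[r][c], r, c)
```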
Peek-a-bot: learning through vision in Unreal Engine
(The Eurographics Association, 2024) Pietra, Daniele Della; Garau, Nicola; Conci, Nicola; Granelli, Fabrizio; Caputo, Ariel; Garro, Valeria; Giachetti, Andrea; Castellani, Umberto; Dulecha, Tinsae Gebrechristos
Humans learn to navigate and interact with their surroundings through their senses, particularly vision. Ego-vision has recently become a significant focus in computer vision, enabling neural networks to learn effectively from first-person data, as humans do. Supervised or self-supervised learning of depth, object location, and segmentation maps through deep networks has shown considerable success in recent years. Reinforcement learning (RL), on the other hand, has focused on learning from other kinds of sensing data, such as rays, collisions, distances, and other types of observations. In this paper, we merge the two approaches, providing a complete pipeline to train reinforcement learning agents inside virtual environments relying only on vision, eliminating the need for traditional RL observations. We demonstrate that visual stimuli, if encoded by a carefully designed vision encoder, can provide informative observations, thus replacing ray-based approaches and drastically simplifying the reward shaping typical of classical RL. Our method is fully implemented inside Unreal Engine 5, from the real-time inference of visual features to the online training of the agents' behaviour using the Proximal Policy Optimization (PPO) algorithm. To the best of our knowledge, this is the first in-engine solution targeting video games and simulation, enabling game developers to easily train vision-based RL agents without writing a single line of code. All the code, complete experiments and analysis will be available at https://mmlab-cv.github.io/Peek-a-bot/.
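The abstract names Proximal Policy Optimization as the training algorithm. A minimal sketch of its standard clipped surrogate objective follows. It is computed from per-sample log-probabilities under the new and old policies together with advantage estimates. The vision encoder, rewards, and Unreal Engine integration are not shown, and `eps=0.2` is only a commonly used default, not a value taken from the paper.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) with r = pi_new / pi_old.
    PPO maximizes this quantity (or minimizes its negative as a loss)."""
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)
        clipped = min(max(ratio, 1 - eps), 1 + eps)
        total += min(ratio * a, clipped * a)
    return total / len(advantages)
```

Taking the minimum means that a policy update gains nothing from pushing the probability ratio beyond the clip range when the advantage is positive, yet is still fully penalized when the advantage is negative. This is what keeps PPO updates conservative.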
