Machine Learning Methods in Visualisation for Big Data
Showing items 1-20 of 28, sorted by title.
Item: Controllably Sparse Perturbations of Robust Classifiers for Explaining Predictions and Probing Learned Concepts (The Eurographics Association, 2021)
Authors: Roberts, Jay; Tsiligkaridis, Theodoros
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Explaining the predictions of a deep neural network (DNN) in image classification is an active area of research. Many methods focus on localizing pixels, or groups of pixels, that maximize a relevance metric for the prediction. Others create local "proxy" explainers that account for an individual prediction of a model. We explore "why" a model made a prediction by perturbing inputs to robust classifiers and interpreting the semantically meaningful results. For such an explanation to be useful for humans, it is desirable for it to be sparse; however, generating sparse perturbations can be computationally expensive and infeasible on high-resolution data. Here we introduce controllably sparse explanations that can be efficiently generated on higher-resolution data to provide improved counterfactual explanations. Further, we use these controllably sparse explanations to probe what the robust classifier has learned. These explanations could provide insight for model developers as well as assist in detecting dataset bias.

Item: DimVis: Interpreting Visual Clusters in Dimensionality Reduction With Explainable Boosting Machine (The Eurographics Association, 2024)
Authors: Salmanian, Parisa; Chatzimparmpas, Angelos; Karaca, Ali Can; Martins, Rafael M.
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Dimensionality Reduction (DR) techniques such as t-SNE and UMAP are popular for transforming complex datasets into simpler visual representations. However, while effective in uncovering general dataset patterns, these methods may introduce artifacts and suffer from interpretability issues.
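The core idea of the sparse-perturbation entry above — constraining each perturbation step to a controllable number of coordinates — can be sketched in a few lines. This is our own illustration with hypothetical names (`sparse_perturbation_step`), not the paper's method, which perturbs inputs to robust classifiers using backpropagated gradients:

```python
def sparse_perturbation_step(x, grad, k, step_size=0.1):
    """One step of a controllably sparse perturbation (hypothetical sketch):
    move the input along the class-score gradient, but only in the k
    coordinates with the largest gradient magnitude, so the accumulated
    perturbation stays sparse."""
    # Rank coordinates by gradient magnitude and keep the top k.
    top_k = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    x_new = list(x)
    for i in top_k:
        x_new[i] += step_size * grad[i]
    return x_new

x = [0.0, 0.0, 0.0, 0.0]       # flattened toy input "image"
g = [0.9, -0.1, 0.05, -0.7]    # gradient of the target class score w.r.t. x
x_pert = sparse_perturbation_step(x, g, k=2)
# Only the two largest-magnitude gradient coordinates (0 and 3) change.
```

Sparsity here is controlled directly through `k`; the paper's contribution is making such control efficient at high resolution.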
This paper presents DimVis, a visualization tool that employs supervised Explainable Boosting Machine (EBM) models, trained on user-selected data of interest, as an interpretation assistant for DR projections. The tool facilitates high-dimensional data analysis by providing an interpretation of feature relevance in visual clusters through interactive exploration of UMAP projections. Specifically, DimVis uses a contrastive EBM model trained in real time to differentiate between the data inside and outside a cluster of interest. Taking advantage of the EBM's inherently explainable nature, we then use this model to interpret the cluster itself via single and pairwise feature comparisons, ranked by the EBM model's feature importance. The applicability and effectiveness of DimVis are demonstrated via a use case and a usage scenario with real-world data. We also discuss the limitations and potential directions for future research.

Item: Exploration of Preference Models using Visual Analytics (The Eurographics Association, 2024)
Authors: Buchmüller, Raphael; Zymla, Mark-Matthias; Keim, Daniel; Butt, Miriam; Sevastjanova, Rita
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
The identification and integration of diverse viewpoints are key to sound decision-making. This paper introduces a novel Visual Analytics technique aimed at summarizing and comparing perspectives derived from established preference models. We use 2D projection and interactive visualization to explore user models based on subjective preference labels and extracted linguistic features. We then employ a pie-chart-like exploration design to enable the aggregation and simultaneous exploration of diverse preference groupings. The approach supports rotation and slicing interactions in the visual space. We demonstrate the technique's applicability and effectiveness through a use case exploring the complex landscape of argument preferences.
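The contrastive setup described in the DimVis entry — label points inside a selected cluster versus outside, then rank features by how well they discriminate — can be sketched as follows. The mean-gap score and the name `contrastive_feature_ranking` are our illustrative stand-ins for the EBM feature importances the tool actually computes:

```python
def contrastive_feature_ranking(points, in_cluster):
    """Rank features by how strongly they separate a user-selected cluster
    from the rest of the data (simple stand-in for EBM importances)."""
    n_features = len(points[0])
    scores = []
    for f in range(n_features):
        inside = [p[f] for p, flag in zip(points, in_cluster) if flag]
        outside = [p[f] for p, flag in zip(points, in_cluster) if not flag]
        # Gap between the cluster's mean and the rest's mean for feature f.
        gap = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
        scores.append((gap, f))
    return [f for _, f in sorted(scores, reverse=True)]

pts = [(0.1, 5.0), (0.2, 5.1), (0.9, 5.0), (1.0, 4.9)]
mask = [True, True, False, False]   # points inside the brushed cluster
ranking = contrastive_feature_ranking(pts, mask)
# Feature 0 separates the cluster; feature 1 barely varies.
```

A real EBM would also surface pairwise interactions, which this mean-gap score cannot capture.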
We highlight our design's potential to enhance decision-making processes under diverging preferences through Visual Analytics.

Item: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plots (The Eurographics Association, 2020)
Authors: Thrun, Michael C.
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
For many applications, it is crucial to decide whether a dataset possesses cluster structures. This property is called clusterability and is usually investigated with statistical testing. Here, we propose to extend statistical testing with the Mirrored-Density plot (MDplot). The MDplot allows investigating the distributions of many variables, with automatic sampling in the case of large datasets. Statistical testing of clusterability is compared with MDplots of the first principal component and of the distance distribution of the data. Contradicting results are evaluated with topographic maps of cluster structures derived from planar projections using the generalized U-Matrix technique. A collection of artificial and natural datasets is used for the comparison. This collection is specifically designed to cover a variety of clustering problems that any algorithm should be able to handle. The results demonstrate that the MDplot improves statistical testing, but even then, almost-touching cluster structures with low intercluster distances and no predominant direction of variance remain challenging.

Item: Interactive Dense Pixel Visualizations for Time Series and Model Attribution Explanations (The Eurographics Association, 2023)
Authors: Schlegel, Udo; Keim, Daniel
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
The field of Explainable Artificial Intelligence (XAI) for deep neural network models is developing rapidly, offering numerous techniques to extract explanations from models. However, evaluating explanations is often non-trivial, and differences between applied metrics can be subtle, especially with non-intelligible data.
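The distance-distribution view that the clusterability-testing entry above compares against MDplots can be illustrated with a toy example (our own sketch, not the MDplot implementation; `distance_distribution` is a hypothetical helper):

```python
import itertools
import math

def distance_distribution(points):
    """All pairwise distances; multimodality in this distribution is one of
    the cues inspected when judging clusterability."""
    return [math.dist(a, b) for a, b in itertools.combinations(points, 2)]

# Two well-separated blobs yield a bimodal distance distribution:
blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
blob_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
dists = distance_distribution(blob_a + blob_b)
within = [d for d in dists if d < 1.0]    # intra-blob distances
between = [d for d in dists if d > 4.0]   # inter-blob distances
# A clear gap between the two modes suggests the data is clusterable.
```

The hard cases named in the abstract are exactly those where this gap shrinks until the two modes merge.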
Thus, there is a need for visualizations tailored to exploring explanations in domains with such data, e.g., time series. We propose DAVOTS, an interactive visual analytics approach to explore raw time series data, neural network activations, and attributions in a dense-pixel visualization, to gain insights into the data, the model's decisions, and the explanations. To further support users in exploring large datasets, we apply clustering approaches to the visualized data domains to highlight groups, and we present ordering strategies for individual and combined data exploration to facilitate finding patterns. We visualize a CNN trained on the FordA dataset to demonstrate the approach.

Item: Interpreting Black-Box Semantic Segmentation Models in Remote Sensing Applications (The Eurographics Association, 2019)
Authors: Janik, Adrianna; Sankaran, Kris; Ortiz, Anthony
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
In the interpretability literature, attention is focused on understanding black-box classifiers, but many problems ranging from medicine through agriculture to crisis response in humanitarian aid are tackled by semantic segmentation models. The absence of interpretability work for these canonical problems in computer vision motivates this study. We present a user-centric approach that blends techniques from interpretability, representation learning, and interactive visualization. It allows users to visualize and link latent representations to real data instances, as well as qualitatively assess the strength of predictions. We have applied our method to a deep learning model for semantic segmentation, U-Net, in a remote sensing application of building detection. This application is of high interest for humanitarian crisis response teams that rely on satellite image analysis.
Preliminary results show the approach's utility in understanding semantic segmentation models; a demo presenting the idea is available online.

Item: Introducing Fairness in Graph Visualization via Gradient Descent (The Eurographics Association, 2024)
Authors: Hong, Seok-Hee; Liotta, Giuseppe; Montecchiani, Fabrizio; Nöllenburg, Martin; Piselli, Tommaso
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Motivated by the need for decision-making systems that avoid bias and discrimination, the concept of fairness has recently gained traction in the broad field of artificial intelligence, stimulating new research within the information visualization community as well. In this paper, we introduce a notion of fairness in network visualization, specifically for straight-line drawings of graphs, a foundational paradigm in the field. We empirically investigate the following research questions: (i) What is the price of incorporating fairness constraints in straight-line drawings? (ii) How unfair is a straight-line drawing that does not optimize fairness as a primary objective? To tackle these questions, we implement a gradient-descent-based algorithm that computes straight-line drawings of graphs by optimizing multi-objective functions.
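A minimal sketch of such a multi-objective gradient-descent step, assuming a toy stress function, a simplified fairness term, and finite-difference gradients in place of the paper's actual objectives (all function names here are our own):

```python
import math

def stress(pos, edges):
    """Layout stress: squared deviation of each edge length from 1."""
    return sum((math.dist(pos[u], pos[v]) - 1.0) ** 2 for u, v in edges)

def unfairness(pos, edges, groups):
    """Gap between the average stress experienced by two node groups
    (a simplified stand-in for the paper's fairness objective)."""
    def group_stress(g):
        touching = [(u, v) for u, v in edges if u in g or v in g]
        return stress(pos, touching) / len(touching)
    return abs(group_stress(groups[0]) - group_stress(groups[1]))

def fair_step(pos, edges, groups, lam=0.5, lr=0.05, eps=1e-4):
    """One gradient-descent step on stress + lam * unfairness,
    with finite-difference gradients per coordinate."""
    def loss(p):
        return stress(p, edges) + lam * unfairness(p, edges, groups)
    base = loss(pos)
    stepped = {}
    for n, (x, y) in pos.items():
        gx = (loss({**pos, n: (x + eps, y)}) - base) / eps
        gy = (loss({**pos, n: (x, y + eps)}) - base) / eps
        stepped[n] = (x - lr * gx, y - lr * gy)
    return stepped

pos = {0: (0.0, 0.0), 1: (0.5, 0.0), 2: (2.0, 0.0)}   # path graph 0-1-2
edges = [(0, 1), (1, 2)]
groups = ([0], [2])   # two hypothetical protected groups
before = stress(pos, edges) + 0.5 * unfairness(pos, edges, groups)
pos2 = fair_step(pos, edges, groups)
after = stress(pos2, edges) + 0.5 * unfairness(pos2, edges, groups)
# One step reduces the combined objective.
```

The weight `lam` is where the "price of fairness" trade-off from the abstract's first research question becomes a tunable knob.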
We experimentally show that one can significantly increase the fairness of a drawing at a relatively small cost in readability.

Item: Machine Learning Methods in Visualisation for Big Data 2018: Frontmatter (The Eurographics Association, 2018)
Authors/Editors: Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel

Item: MLVis 2019: Frontmatter (The Eurographics Association, 2019)
Authors/Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: MLVis 2020: Frontmatter (The Eurographics Association, 2020)
Authors/Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: MLVis 2021: Frontmatter (The Eurographics Association, 2021)
Authors/Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: MLVis 2022: Frontmatter (The Eurographics Association, 2022)
Authors/Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: MLVis 2023: Frontmatter (The Eurographics Association, 2023)
Authors/Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: MLVis 2024: Frontmatter (The Eurographics Association, 2024)
Authors/Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: ModelSpeX: Model Specification Using Explainable Artificial Intelligence Methods (The Eurographics Association, 2020)
Authors: Schlegel, Udo; Cakmak, Eren; Keim, Daniel A.
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Explainable artificial intelligence (XAI) methods aim to reveal the non-transparent decision-making mechanisms of black-box models. Evaluating the insight generated by such XAI methods remains challenging, as the applied techniques depend on many factors (e.g., parameters and human interpretation).
We propose ModelSpeX, a visual analytics workflow for interactively extracting human-centered rule sets that generate model specifications from black-box models (e.g., neural networks). The workflow enables analysts to reason about the underlying problem, to extract decision rule sets, and to evaluate the model's suitability for a particular task. An exemplary usage scenario walks an analyst through the steps of the workflow to show its applicability.

Item: On KDE-based Brushing in Scatterplots and how it Compares to CNN-based Brushing (The Eurographics Association, 2019)
Authors: Fan, Chaoran; Hauser, Helwig
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
In this paper, we investigate to what degree the human should be involved in the model design, and how good an empirical model can become with more careful design. To find out, we extended our previously published Mahalanobis brush (the most accurate current empirical model for brushing points in a scatterplot) by further incorporating the data distribution information captured by kernel density estimation (KDE). Based on this work, we then include a short comparison between the empirical model, designed in detail by an expert, and the deep learning-based model that is learned directly from user data.

Item: Panning for Insight: Amplifying Insight through Tight Integration of Machine Learning, Data Mining, and Visualization (The Eurographics Association, 2018)
Authors: Karer, Benjamin; Scheler, Inga; Hagen, Hans
Editors: Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel
With the rapid progress made in Data Mining, Visualization, and Machine Learning in recent years, combinations of these methods have gained increasing interest. This paper summarizes the ideas behind ongoing work on combining methods from these three domains into an insight-driven interactive data analysis workflow. Based on their interpretation of data visualizations, users generate metadata that is fed back into the analysis.
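The density-weighted brushing idea in the KDE-based brushing entry above can be sketched roughly as follows. The scoring rule and the names (`brush`, `mahalanobis_sq`, `kde_density`) are our assumptions for illustration, not the published Mahalanobis-brush implementation:

```python
import math

def kde_density(p, points, h=1.0):
    """Gaussian kernel density estimate at p (unnormalized)."""
    return sum(math.exp(-math.dist(p, q) ** 2 / (2 * h * h)) for q in points) / len(points)

def mahalanobis_sq(p, click, points):
    """Squared Mahalanobis distance from the clicked point, using the
    2x2 sample covariance of the whole scatterplot."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - click[0], p[1] - click[1]
    # Sigma^-1 = 1/det * [[syy, -sxy], [-sxy, sxx]]
    return (dx * (syy * dx - sxy * dy) + dy * (sxx * dy - sxy * dx)) / det

def brush(points, click, radius=5.0, h=1.0):
    """Brush points whose density-weighted Mahalanobis score is small:
    dense regions near the click are selected, sparse outliers are not."""
    return [p for p in points
            if mahalanobis_sq(p, click, points) / (kde_density(p, points, h) + 1e-9) < radius]

pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (8.0, 1.0)]
selected = brush(pts, click=(0.5, 0.5))
# The four cluster points are brushed; the outlier at (8, 1) is not.
```

Dividing the distance by the local density is one simple way to make the brush follow the data distribution, which is the extension the abstract describes.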
The resulting resonance effect improves the performance of subsequent analyses. The paper outlines the ideas behind the workflow, indicates its benefits, and discusses how to avoid potential pitfalls.

Item: Progressive Multidimensional Projections: A Process Model based on Vector Quantization (The Eurographics Association, 2020)
Authors: Ventocilla, Elio Alejandro; Martins, Rafael M.; Paulovich, Fernando V.; Riveiro, Maria
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
As large datasets become more common, so does the need for exploratory approaches that allow iterative, trial-and-error analysis. Without such solutions, hypothesis testing and exploratory data analysis may become cumbersome due to long waits for feedback from computationally intensive algorithms. This work presents a process model for progressive multidimensional projections (P-MDPs) that enables early feedback and user involvement in the process. It complements previous work by providing a lower level of abstraction, describing the specific elements that can provide early system feedback and those that can be enabled for user interaction. Additionally, we outline a set of design constraints that must be taken into account to ensure the usability of a solution with respect to feedback time, visual clutter, and the interactivity of the view. To address these constraints, we propose the use of incremental vector quantization (iVQ) as a core step within the process.
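A minimal sketch of such an incremental vector-quantization step, assuming an online nearest-centroid update (the exact iVQ of the paper may differ; `ivq_update` is our hypothetical name):

```python
import math

def ivq_update(centroids, counts, point, max_centroids=10, threshold=1.0):
    """One incremental vector-quantization step: assign the incoming point
    to its nearest centroid and nudge that centroid toward it, or spawn a
    new centroid when every existing one is too far away."""
    if centroids:
        j = min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))
        if math.dist(centroids[j], point) < threshold or len(centroids) >= max_centroids:
            counts[j] += 1
            lr = 1.0 / counts[j]   # running-mean learning rate
            centroids[j] = tuple(c + lr * (q - c) for c, q in zip(centroids[j], point))
            return
    centroids.append(tuple(point))
    counts.append(1)

centroids, counts = [], []
stream = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (0.05, 0.0), (4.95, 5.05)]
for p in stream:   # points arrive one at a time, as in a progressive setting
    ivq_update(centroids, counts, p)
# Two centroids remain, one per blob, ready to be projected in place of raw data.
```

Because the centroid set stays small and is updated per point, a projection of the centroids can be refreshed continuously, which is what enables the early feedback the abstract emphasizes.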
To illustrate the feasibility of the model, and the usefulness of the proposed iVQ-based solution, we present a prototype that demonstrates how the different usability constraints can be accounted for, regardless of the size of the dataset.

Item: Revealing Multimodality in Ensemble Weather Prediction (The Eurographics Association, 2021)
Authors: Galmiche, Natacha; Hauser, Helwig; Spengler, Thomas; Spensberger, Clemens; Brun, Morten; Blaser, Nello
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Ensemble methods are widely used to simulate complex non-linear systems and to estimate forecast uncertainty. However, visualizing and analyzing ensemble data is challenging, in particular when multimodality arises, i.e., distinct likely outcomes. We propose a graph-based approach that explores multimodality in univariate ensemble data from weather prediction. Our solution utilizes clustering and a novel concept of life span associated with each cluster. We applied our method to historical predictions of extreme weather events and illustrate that it aids the understanding of the respective ensemble forecasts.

Item: Saliency Clouds: Visual Analysis of Point Cloud-oriented Deep Neural Networks in DeepRL for Particle Physics (The Eurographics Association, 2022)
Authors: Mulawade, Raju Ningappa; Garth, Christoph; Wiebel, Alexander
Editors: Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
We develop and describe saliency clouds, that is, visualization methods that employ explainable AI techniques to analyze and interpret deep reinforcement learning (DeepRL) agents working on point cloud-based data. The agent in our application case, which is still under development, is tasked with tracking particles in high-energy physics. The point clouds contain properties of particle hits on detector layers, which serve as input for reconstructing the particles' trajectories.
Through visualizing how different points, their possible connections in an implicit graph, and other features influence the decisions of the DeepRL agent's policy network, we aim to explain the agent's decision-making in particle tracking and thus support its development. In particular, we adapt gradient-based saliency mapping methods to work on these point clouds. We show how the properties of these methods, originally developed for image data, translate to the structurally different point cloud data. Finally, we present visual representations of saliency clouds that support visual analysis and interpretation of the RL agent's policy network.
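The gradient-based saliency mapping described above can be sketched for point clouds as follows. The finite-difference gradient and the toy score function are our assumptions, standing in for backpropagation through the actual policy network:

```python
import math

def point_saliency(score_fn, cloud, eps=1e-4):
    """Per-point saliency: gradient magnitude of the network's score with
    respect to each point's coordinates, approximated here with finite
    differences instead of backpropagation."""
    base = score_fn(cloud)
    saliency = []
    for i, pt in enumerate(cloud):
        g_sq = 0.0
        for d in range(len(pt)):
            bumped = [list(q) for q in cloud]
            bumped[i][d] += eps            # perturb one coordinate of one point
            g_sq += ((score_fn(bumped) - base) / eps) ** 2
        saliency.append(math.sqrt(g_sq))   # per-point gradient magnitude
    return saliency

# Toy "policy score" that depends strongly on point 0 and weakly on all y-values.
score = lambda cloud: 3.0 * cloud[0][0] + 0.1 * sum(q[1] for q in cloud)
cloud = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0)]
sal = point_saliency(score, cloud)
# Point 0 dominates the saliency ranking.
```

Coloring each point of the cloud by its saliency value yields the kind of "saliency cloud" view the entry describes.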