Machine Learning Methods in Visualisation for Big Data

Permanent URI for this community

https://diglib.eg.org/handle/10.2312/2632402

Browse

Now showing 1 - 20 of 31

Controllably Sparse Perturbations of Robust Classifiers for Explaining Predictions and Probing Learned Concepts
(The Eurographics Association, 2021) Roberts, Jay; Tsiligkaridis, Theodoros; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
Explaining the predictions of a deep neural network (DNN) in image classification is an active area of research. Many methods focus on localizing pixels, or groups of pixels, which maximize a relevance metric for the prediction. Others aim at creating local "proxy" explainers which aim to account for an individual prediction of a model. We aim to explore "why" a model made a prediction by perturbing inputs to robust classifiers and interpreting the semantically meaningful results. For such an explanation to be useful for humans it is desirable for it to be sparse; however, generating sparse perturbations can computationally expensive and infeasible on high resolution data. Here we introduce controllably sparse explanations that can be efficiently generated on higher resolution data to provide improved counter-factual explanations. Further we use these controllably sparse explanations to probe what the robust classifier has learned. These explanations could provide insight for model developers as well as assist in detecting dataset bias.
DimVis: Interpreting Visual Clusters in Dimensionality Reduction With Explainable Boosting Machine
(The Eurographics Association, 2024) SALMANIAN, PARISA; Chatzimparmpas, Angelos; Karaca, Ali Can; Martins, Rafael M.; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Dimensionality Reduction (DR) techniques such as t-SNE and UMAP are popular for transforming complex datasets into simpler visual representations. However, while effective in uncovering general dataset patterns, these methods may introduce artifacts and suffer from interpretability issues. This paper presents DimVis, a visualization tool that employs supervised Explainable Boosting Machine (EBM) models (trained on user-selected data of interest) as an interpretation assistant for DR projections. Our tool facilitates high-dimensional data analysis by providing an interpretation of feature relevance in visual clusters through interactive exploration of UMAP projections. Specifically, DimVis uses a contrastive EBM model that is trained in real time to differentiate between the data inside and outside a cluster of interest. Taking advantage of the inherent explainable nature of the EBM, we then use this model to interpret the cluster itself via single and pairwise feature comparisons in a ranking based on the EBM model's feature importance. The applicability and effectiveness of DimVis are demonstrated via a use case and a usage scenario with real-world data. We also discuss the limitations and potential directions for future research.
Exploration of Preference Models using Visual Analytics
(The Eurographics Association, 2024) Buchmüller, Raphael; Zymla, Mark-Matthias; Keim, Daniel; Butt, Miriam; Sevastjanova, Rita; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
The identification and integration of diverse viewpoints are key to sound decision-making. This paper introduces a novel Visual Analytics technique aimed at summarizing and comparing perspectives derived from established preference models. We use 2D projection and interactive visualization to explore user models based on subjective preference labels and extracted linguistic features. We then employ a pie-chart-like exploration design to enable the aggregation and simultaneous exploration of diverse preference groupings. The approach allows rotation and slicing interactions of the visual space. We demonstrate the technique's applicability and effectiveness through a use case in exploring the complex landscape of argument preferences. We highlight our designs potential to enhance decision-making processes within diverging preferences through Visual Analytics.
Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plots
(The Eurographics Association, 2020) Thrun, Michael C.; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
For many applications, it is crucial to decide if a dataset possesses cluster structures. This property is called clusterability and is usually investigated with the usage of statistical testing. Here, it is proposed to extend statistical testing with the Mirrored- Density plot (MDplot). The MDplot allows investigating the distributions of many variables with automatic sampling in case of large datasets. Statistical testing of clusterability is compared with MDplots of the 1st principal component and the distance distribution of data. Contradicting results are evaluated with topographic maps of cluster structures derived from planar projections using the generalized U-Matrix technique. A collection of artificial and natural datasets is used for the comparison. This collection is specially designed to have a variety of clustering problems that any algorithm should be able to handle. The results demonstrate that the MDplot improves statistical testing but, even then, almost touching cluster structures of low intercluster distances without a predominant direction of variance remain challenging.
Interactive Dense Pixel Visualizations for Time Series and Model Attribution Explanations
(The Eurographics Association, 2023) Schlegel, Udo; Keim, Daniel; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
The field of Explainable Artificial Intelligence (XAI) for Deep Neural Network models develops significantly, offering numerous techniques to extract explanations from models. However, evaluating explanations is often not trivial, and differences in applied metrics can be subtle, especially with non-intelligible data. Thus, there is a need for visualizations tailored to explore explanations for domains with such data, e.g., time series. We propose DAVOTS, an interactive visual analytics approach to explore raw time series data, activations of neural networks, and attributions in a dense-pixel visualization to gain insights into the data, models' decisions, and explanations. To further support users in exploring large datasets, we apply clustering approaches to the visualized data domains to highlight groups and present ordering strategies for individual and combined data exploration to facilitate finding patterns. We visualize a CNN trained on the FordA dataset to demonstrate the approach.
Interpreting Black-Box Semantic Segmentation Models in Remote Sensing Applications
(The Eurographics Association, 2019) Janik, Adrianna; Sankaran, Kris; Ortiz, Anthony; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
In the interpretability literature, attention is focused on understanding black-box classifiers, but many problems ranging from medicine through agriculture and crisis response in humanitarian aid are tackled by semantic segmentation models. The absence of interpretability for these canonical problems in computer vision motivates this study. In this study we present a usercentric approach that blends techniques from interpretability, representation learning, and interactive visualization. It allows to visualize and link latent representation to real data instances as well as qualitatively assess strength of predictions. We have applied our method to a deep learning model for semantic segmentation, U-Net, in a remote sensing application of building detection. This application is of high interest for humanitarian crisis response teams that rely on satellite images analysis. Preliminary results shows utility in understanding semantic segmentation models, demo presenting the idea is available online.
Introducing Fairness in Graph Visualization via Gradient Descent
(The Eurographics Association, 2024) Hong, Seok-Hee; Liotta, Giuseppe; Montecchiani, Fabrizio; Nöllenburg, Martin; Piselli, Tommaso; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Motivated by the need for decision-making systems that avoid bias and discrimination, the concept of fairness recently gained traction in the broad field of artificial intelligence, stimulating new research also within the information visualization community. In this paper, we introduce a notion of fairness in network visualization, specifically for straight-line drawings of graphs, a foundational paradigm in the field. We empirically investigate the following research questions: (i) What is the price of incorporating fairness constraints in straight-line drawings? (ii) How unfair is a straight-line drawing that does not optimize fairness as a primary objective? To tackle these questions, we implement an algorithm based on gradient-descent that can compute straight-line drawings of graphs by optimizing multi-objective functions. We experimentally show that one can significantly increase the fairness of a drawing by paying a relatively small amount in terms of reduced readability.
Machine Learning Methods in Visualisation for Big Data 2018: Frontmatter
(The Eurographics Association, 2018) Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel; Ian Nabney and Jaakko Peltonen and Daniel Archambault
MLVis 2019: Frontmatter
(The Eurographics Association, 2019) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
MLVis 2020: Frontmatter
(The Eurographics Association, 2020) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
MLVis 2021: Frontmatter
(The Eurographics Association, 2021) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
MLVis 2022: Frontmatter
(The Eurographics Association, 2022) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
MLVis 2023: Frontmatter
(The Eurographics Association, 2023) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
MLVis 2024: Frontmatter
(The Eurographics Association, 2024) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
MLVis 2025: Frontmatter
(The Eurographics Association, 2025) Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
ModelSpeX: Model Specification Using Explainable Artificial Intelligence Methods
(The Eurographics Association, 2020) Schlegel, Udo; Cakmak, Eren; Keim, Daniel A.; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
Explainable artificial intelligence (XAI) methods aim to reveal the non-transparent decision-making mechanisms of black-box models. The evaluation of insight generated by such XAI methods remains challenging as the applied techniques depend on many factors (e.g., parameters and human interpretation). We propose ModelSpeX, a visual analytics workflow to interactively extract human-centered rule-sets to generate model specifications from black-box models (e.g., neural networks). The workflow enables to reason about the underlying problem, to extract decision rule sets, and to evaluate the suitability of the model for a particular task. An exemplary usage scenario walks an analyst trough the steps of the workflow to show the applicability.
Neighbour Embeddings: Beyond Visualisation
(The Eurographics Association, 2025) Lambert, Pierre; Couplet, Edouard; Verleysen, Michel; Lee, John Aldo; Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
Machine learning (ML) has brought powerful tools to the visualisation community, particularly through neighbour embeddings (NE). This family of algorithms enables the intuitive visualisation of high dimensional datasets, by representing these in 2- or 3-dimensional spaces. This paper argues that as NE algorithms have progressed and diversified within the visualisation domain, they have matured into powerful yet often simple methods whose potential remains largely underutilised in broader machine learning contexts. This argument is illustrated by showing through two use cases, clustering and data preprocessing before a supervised task, how NE can contribute meaningfully with little additional algorithmic effort.
On KDE-based Brushing in Scatterplots and how it Compares to CNN-based Brushing
(The Eurographics Association, 2019) Fan, Chaoran; Hauser, Helwig; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
In this paper, we investigate to which degree the human should be involved into the model design and how good the empirical model can be with more careful design. To find out, we extended our previously published Mahalanobis brush (the best current empirical model in terms of accuracy for brushing points in a scatterplot) by further incorporating the data distribution information that is captured by the kernel density estimation (KDE). Based on this work, we then include a short discussion between the empirical model, designed in detail by an expert and the deep learning-based model that is learned from user data directly.
Panning for Insight: Amplifying Insight through Tight Integration of Machine Learning, Data Mining, and Visualization
(The Eurographics Association, 2018) Karer, Benjamin; Scheler, Inga; Hagen, Hans; Ian Nabney and Jaakko Peltonen and Daniel Archambault
With the rapid progress made in Data Mining, Visualization, and Machine Learning during the last years, combinations of these methods have gained increasing interest. This paper summarizes ideas behind ongoing work on combining methods of these three domains into an insight-driven interactive data analysis workflow. Based on their interpretation of data visualizations, users generate metadata to be fed back into the analysis. The resulting resonance effect improves the performance of subsequent analysis. The paper outlines the ideas behind the workflow, indicates the benefits and discusses how to avoid potential pitfalls.
Progressive Multidimensional Projections: A Process Model based on Vector Quantization
(The Eurographics Association, 2020) Ventocilla, Elio Alejandro; Martins, Rafael M.; Paulovich, Fernando V.; Riveiro, Maria; Archambault, Daniel and Nabney, Ian and Peltonen, Jaakko
As large datasets become more common, so becomes the necessity for exploratory approaches that allow iterative, trial-anderror analysis. Without such solutions, hypothesis testing and exploratory data analysis may become cumbersome due to long waiting times for feedback from computationally-intensive algorithms. This work presents a process model for progressive multidimensional projections (P-MDPs) that enables early feedback and user involvement in the process, complementing previous work by providing a lower level of abstraction and describing the specific elements that can be used to provide early system feedback, and those which can be enabled for user interaction. Additionally, we outline a set of design constraints that must be taken into account to ensure the usability of a solution regarding feedback time, visual cluttering, and the interactivity of the view. To address these constraints, we propose the use of incremental vector quantization (iVQ) as a core step within the process. To illustrate the feasibility of the model, and the usefulness of the proposed iVQ-based solution, we present a prototype that demonstrates how the different usability constraints can be accounted for, regardless of the size of a dataset.

Browse

Browsing Machine Learning Methods in Visualisation for Big Data by Title

Results Per Page

Sort Options