Machine Learning Methods in Visualisation for Big Data
Browsing Machine Learning Methods in Visualisation for Big Data by Issue Date
Now showing 1 - 20 of 28
Item: Machine Learning Methods in Visualisation for Big Data 2018: Frontmatter (The Eurographics Association, 2018)
Nabney, Ian; Peltonen, Jaakko; Archambault, Daniel

Item: Panning for Insight: Amplifying Insight through Tight Integration of Machine Learning, Data Mining, and Visualization (The Eurographics Association, 2018)
Karer, Benjamin; Scheler, Inga; Hagen, Hans
With the rapid progress made in Data Mining, Visualization, and Machine Learning in recent years, combinations of these methods have gained increasing interest. This paper summarizes the ideas behind ongoing work on combining methods from these three domains into an insight-driven interactive data analysis workflow. Based on their interpretation of data visualizations, users generate metadata that is fed back into the analysis. The resulting resonance effect improves the performance of subsequent analysis. The paper outlines the ideas behind the workflow, indicates its benefits, and discusses how to avoid potential pitfalls.

Item: MLVis 2019: Frontmatter (The Eurographics Association, 2019)
Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: Interpreting Black-Box Semantic Segmentation Models in Remote Sensing Applications (The Eurographics Association, 2019)
Janik, Adrianna; Sankaran, Kris; Ortiz, Anthony
In the interpretability literature, attention is focused on understanding black-box classifiers, but many problems ranging from medicine through agriculture to crisis response in humanitarian aid are tackled by semantic segmentation models. The absence of interpretability for these canonical problems in computer vision motivates this study.
In this study, we present a user-centric approach that blends techniques from interpretability, representation learning, and interactive visualization. It allows users to visualize and link the latent representation to real data instances, as well as to qualitatively assess the strength of predictions. We have applied our method to a deep learning model for semantic segmentation, U-Net, in a remote sensing application of building detection. This application is of high interest for humanitarian crisis response teams that rely on satellite image analysis. Preliminary results show utility in understanding semantic segmentation models; a demo presenting the idea is available online.

Item: On KDE-based Brushing in Scatterplots and how it Compares to CNN-based Brushing (The Eurographics Association, 2019)
Fan, Chaoran; Hauser, Helwig
In this paper, we investigate to which degree the human should be involved in the model design and how good an empirical model can become with more careful design. To find out, we extended our previously published Mahalanobis brush (the most accurate current empirical model for brushing points in a scatterplot) by further incorporating the data distribution information captured by kernel density estimation (KDE). Based on this work, we then include a short discussion comparing the empirical model, designed in detail by an expert, with the deep learning-based model that is learned directly from user data.

Item: Visual Analysis of Multivariate Urban Traffic Data Resorting to Local Principal Curves (The Eurographics Association, 2019)
Silva, Carla; d'Orey, Pedro; Aguiar, Ana
Traffic congestion causes major economic, environmental and social problems in modern cities.
We present an interactive visualization tool to assist domain experts in the identification and analysis of traffic patterns at city scale, making use of multivariate empirical urban data and fundamental diagrams. The proposed method combines visualization techniques with an improved local principal curves method to model traffic dynamics and facilitate the comparison of traffic patterns - resorting to the fitted curve with a confidence interval - between different road segments and under different external conditions. We demonstrate the proposed technique in an illustrative real-world case study in the city of Porto, Portugal.

Item: Visual Ensemble Analysis to Study the Influence of Hyper-parameters on Training Deep Neural Networks (The Eurographics Association, 2019)
Hamid, Sagad; Derstroff, Adrian; Klemm, Sören; Ngo, Quynh Quang; Jiang, Xiaoyi; Linsen, Lars
A good deep neural network design allows for efficient training and high accuracy. The training step requires a suitable choice of several hyper-parameters. Limited knowledge exists on how the hyper-parameters impact the training process, how multiple hyper-parameters interact, and how hyper-parameters relate to the network topology. In this paper, we present a structured analysis towards these goals by investigating an ensemble of training runs. We propose a visual ensemble analysis based on hyper-parameter space visualizations, performance visualizations, and visualizations of correlations of topological structures.
As a proof of concept, we apply our approach to deep convolutional neural networks.

Item: MLVis 2020: Frontmatter (The Eurographics Association, 2020)
Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: Improving the Sensitivity of Statistical Testing for Clusterability with Mirrored-Density Plots (The Eurographics Association, 2020)
Thrun, Michael C.
For many applications, it is crucial to decide if a dataset possesses cluster structures. This property is called clusterability and is usually investigated with statistical testing. Here, it is proposed to extend statistical testing with the Mirrored-Density plot (MDplot). The MDplot allows investigating the distributions of many variables, with automatic sampling in the case of large datasets. Statistical testing of clusterability is compared with MDplots of the first principal component and the distance distribution of the data. Contradicting results are evaluated with topographic maps of cluster structures derived from planar projections using the generalized U-Matrix technique. A collection of artificial and natural datasets is used for the comparison. This collection is specially designed to cover a variety of clustering problems that any algorithm should be able to handle. The results demonstrate that the MDplot improves statistical testing but, even then, almost touching cluster structures with low intercluster distances and without a predominant direction of variance remain challenging.

Item: Progressive Multidimensional Projections: A Process Model based on Vector Quantization (The Eurographics Association, 2020)
Ventocilla, Elio Alejandro; Martins, Rafael M.; Paulovich, Fernando V.; Riveiro, Maria
As large datasets become more common, so does the necessity for exploratory approaches that allow iterative, trial-and-error analysis.
Without such solutions, hypothesis testing and exploratory data analysis may become cumbersome due to long waiting times for feedback from computationally intensive algorithms. This work presents a process model for progressive multidimensional projections (P-MDPs) that enables early feedback and user involvement in the process, complementing previous work by providing a lower level of abstraction and describing the specific elements that can be used to provide early system feedback and those that can be enabled for user interaction. Additionally, we outline a set of design constraints that must be taken into account to ensure the usability of a solution with regard to feedback time, visual clutter, and the interactivity of the view. To address these constraints, we propose the use of incremental vector quantization (iVQ) as a core step within the process. To illustrate the feasibility of the model, and the usefulness of the proposed iVQ-based solution, we present a prototype that demonstrates how the different usability constraints can be accounted for, regardless of the size of a dataset.

Item: Visual Interpretation of DNN-based Acoustic Models using Deep Autoencoders (The Eurographics Association, 2020)
Grósz, Tamás; Kurimo, Mikko
In the past few years, Deep Neural Networks (DNNs) have become the state-of-the-art solution in several areas, including automatic speech recognition (ASR); unfortunately, they are generally viewed as black boxes. Recently, this has started to change, as researchers have dedicated much effort to interpreting their behavior. In this work, we concentrate on visual interpretation by depicting the hidden activation vectors of the DNN, and we propose the use of deep Autoencoders (DAEs) to transform these hidden representations for inspection. We use multiple metrics to compare our approach with other widely used algorithms, and the results show that our approach is quite competitive.
The main advantage of using Autoencoders over the existing methods is that, after the training phase, the Autoencoder applies a fixed transformation that can be used to visualize any hidden activation vector without further optimization, which is not true for the other methods.

Item: ModelSpeX: Model Specification Using Explainable Artificial Intelligence Methods (The Eurographics Association, 2020)
Schlegel, Udo; Cakmak, Eren; Keim, Daniel A.
Explainable artificial intelligence (XAI) methods aim to reveal the non-transparent decision-making mechanisms of black-box models. The evaluation of insight generated by such XAI methods remains challenging, as the applied techniques depend on many factors (e.g., parameters and human interpretation). We propose ModelSpeX, a visual analytics workflow to interactively extract human-centered rule sets and generate model specifications from black-box models (e.g., neural networks). The workflow enables analysts to reason about the underlying problem, to extract decision rule sets, and to evaluate the suitability of the model for a particular task. An exemplary usage scenario walks an analyst through the steps of the workflow to show its applicability.

Item: Visual Analysis of the Impact of Neural Network Hyper-Parameters (The Eurographics Association, 2020)
Jönsson, Daniel; Eilertsen, Gabriel; Shi, Hezi; Zheng, Jianmin; Ynnerman, Anders; Unger, Jonas
We present an analysis of the impact of hyper-parameters for an ensemble of neural networks, using tailored visualization techniques to understand the complicated relationship between hyper-parameters and model performance. The high-dimensional error surface spanned by the wide range of hyper-parameters used to specify and optimize neural networks is difficult to characterize: it is non-convex and discontinuous, and there can be complex local dependencies between hyper-parameters.
To explore these dependencies, we make use of a large number of sampled relations between hyper-parameters and end performance, retrieved from thousands of individually trained convolutional neural network classifiers. We use a structured selection of visualization techniques to analyze the impact of different combinations of hyper-parameters. The results reveal how complicated dependencies between hyper-parameters influence the end performance, and demonstrate how the complete picture painted by considering a large number of trainings simultaneously can aid in understanding the impact of hyper-parameter combinations.

Item: Revealing Multimodality in Ensemble Weather Prediction (The Eurographics Association, 2021)
Galmiche, Natacha; Hauser, Helwig; Spengler, Thomas; Spensberger, Clemens; Brun, Morten; Blaser, Nello
Ensemble methods are widely used to simulate complex non-linear systems and to estimate forecast uncertainty. However, visualizing and analyzing ensemble data is challenging, in particular when multimodality arises, i.e., distinct likely outcomes. We propose a graph-based approach that explores multimodality in univariate ensemble data from weather prediction. Our solution utilizes clustering and a novel concept of life span associated with each cluster.
We applied our method to historical predictions of extreme weather events and illustrate that it aids the understanding of the respective ensemble forecasts.

Item: MLVis 2021: Frontmatter (The Eurographics Association, 2021)
Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko

Item: Controllably Sparse Perturbations of Robust Classifiers for Explaining Predictions and Probing Learned Concepts (The Eurographics Association, 2021)
Roberts, Jay; Tsiligkaridis, Theodoros
Explaining the predictions of a deep neural network (DNN) in image classification is an active area of research. Many methods focus on localizing pixels, or groups of pixels, that maximize a relevance metric for the prediction. Others aim at creating local "proxy" explainers that account for an individual prediction of a model. We aim to explore "why" a model made a prediction by perturbing inputs to robust classifiers and interpreting the semantically meaningful results. For such an explanation to be useful for humans, it is desirable for it to be sparse; however, generating sparse perturbations can be computationally expensive and infeasible on high-resolution data. Here we introduce controllably sparse explanations that can be efficiently generated on higher-resolution data to provide improved counter-factual explanations. Further, we use these controllably sparse explanations to probe what the robust classifier has learned. These explanations could provide insight for model developers as well as assist in detecting dataset bias.

Item: Visual Exploration of Neural Network Projection Stability (The Eurographics Association, 2022)
Bredius, Carlo; Tian, Zonglin; Telea, Alexandru
We present a method to visually assess the stability of deep learned projections.
For this, we perturb the high-dimensional data with controlled sequences and visualize the resulting changes in the 2D projection. We apply our method to a recent deep learned projection framework on several training configurations (learned projections and real-world datasets). Our method, which is simple to implement and runs at interactive rates, sheds several novel insights on the stability of the explored method.

Item: ViNNPruner: Visual Interactive Pruning for Deep Learning (The Eurographics Association, 2022)
Schlegel, Udo; Schiegg, Samuel; Keim, Daniel A.
Neural networks grow vastly in size to tackle ever more sophisticated tasks. In many cases, such large networks are not deployable on particular hardware and need to be reduced in size. Pruning techniques help to shrink deep neural networks to smaller sizes while decreasing their performance as little as possible. However, such pruning algorithms are often hard to understand in application and do not incorporate domain knowledge, which can work against user goals. We propose ViNNPruner, a visual interactive pruning application that implements state-of-the-art pruning algorithms and offers users the option to prune manually based on their own knowledge. We show how the application facilitates gaining insights into automatic pruning algorithms and semi-automatically pruning oversized networks to make them more efficient using interactive visualizations.

Item: Saliency Clouds: Visual Analysis of Point Cloud-oriented Deep Neural Networks in DeepRL for Particle Physics (The Eurographics Association, 2022)
Mulawade, Raju Ningappa; Garth, Christoph; Wiebel, Alexander
We develop and describe saliency clouds, that is, visualization methods employing explainable AI techniques to analyze and interpret deep reinforcement learning (DeepRL) agents working on point cloud-based data.
The agent in our application case is tasked with tracking particles in high-energy physics and is still under development. The point clouds contain properties of particle hits on layers of a detector as the input for reconstructing the trajectories of the particles. By visualizing the influence of different points, their possible connections in an implicit graph, and other features on the decisions of the policy network of the DeepRL agent, we aim to explain the agent's decision making in tracking particles and thus support its development. In particular, we adapt gradient-based saliency mapping methods to work on these point clouds. We show how the properties of the methods, which were developed for image data, translate to the structurally different point cloud data. Finally, we present visual representations of saliency clouds supporting visual analysis and interpretation of the RL agent's policy network.

Item: MLVis 2022: Frontmatter (The Eurographics Association, 2022)
Archambault, Daniel; Nabney, Ian; Peltonen, Jaakko
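Several items in this listing (e.g., the Saliency Clouds entry) build on gradient-based saliency: ranking input features by the magnitude of the gradient of a model's output with respect to its input. As a minimal sketch of that core idea, the toy two-layer ReLU network below (all weights and data invented for illustration, not taken from any of the papers above) computes the input gradient analytically and uses its absolute value as a per-feature saliency score:

```python
import numpy as np

def saliency(x, W1, b1, W2, b2):
    """Vanilla gradient saliency for a tiny two-layer ReLU network.

    Returns (|d score / d x|, score), where
    score = W2 @ relu(W1 @ x + b1) + b2.
    """
    h_pre = W1 @ x + b1            # hidden pre-activations
    h = np.maximum(h_pre, 0.0)     # ReLU
    score = W2 @ h + b2            # scalar output score
    # Backpropagate: d score / d h = W2, gated by the ReLU mask,
    # then chained through W1 back to the input.
    grad_x = W1.T @ (W2 * (h_pre > 0))
    return np.abs(grad_x), score

# Hypothetical example data: 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
b1 = rng.normal(size=4)
W2 = rng.normal(size=4)
b2 = 0.1
x = rng.normal(size=3)

sal, score = saliency(x, W1, b1, W2, b2)
# The most "salient" feature is the one the score is most sensitive to.
print(sal, sal.argmax())
```

In an image setting the same gradient is reshaped into a heatmap over pixels; the point-cloud adaptation described above instead attaches the per-input magnitudes to the individual points of the cloud.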