
    La VALSE: Scalable Log Visualization for Fault Characterization in Supercomputers

    View/Open
    091-100.pdf (2.767Mb)
    1040-file1.mov (23.93Mb)
    Date
    2018
    Author
    Guo, Hanqi
    Di, Sheng
    Gupta, Rinku
    Peterka, Tom
    Cappello, Franck
    Abstract
    We design and implement La VALSE, a scalable visualization tool for IBM Blue Gene/Q systems that explores tens of millions of records of reliability, availability, and serviceability (RAS) logs. Our tool is designed to meet various analysis requirements, including tracing the causes of failure events and investigating correlations among redundant and noisy RAS messages. La VALSE consists of multiple linked views to visualize RAS logs; each log message has a time stamp, physical location, network address, and multiple categorical dimensions such as severity and category. The timeline view features scalable ThemeRiver and arc diagrams that enable interactive exploration of tens of millions of log messages. The spatial view visualizes the occurrences of RAS messages on hundreds of thousands of elements of Mira (compute cards, node boards, midplanes, and racks) with view-dependent level-of-detail rendering. The multidimensional view enables interactive filtering of different categorical dimensions of RAS messages. To achieve interactivity, we develop an efficient and scalable online data cube engine that can query 55 million RAS logs in less than one second. We present several case studies on Mira, a top supercomputer at Argonne National Laboratory. The case studies demonstrate that La VALSE can help users quickly identify the sources of failure events and analyze spatiotemporal correlations of RAS messages at different scales.
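    The data cube idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' engine: it is a toy, in-memory cube that precomputes counts for every combination of three dimensions (a time bin, severity, and category) so that filtered counts become dictionary lookups. The record values, the field layout, and the names RECORDS, DIMS, build_cube, and query are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical RAS-like records: (timestamp_seconds, location, severity, category).
# All values are invented for illustration; they are not taken from Mira logs.
RECORDS = [
    (0, "R00-M0-N00", "INFO", "Software"),
    (30, "R00-M0-N01", "WARN", "Card"),
    (65, "R01-M1-N03", "FATAL", "Card"),
    (70, "R01-M1-N03", "FATAL", "Software"),
    (130, "R00-M0-N00", "WARN", "Card"),
]

# Dimensions the toy cube aggregates over (an assumption of this sketch).
DIMS = ("time_bin", "severity", "category")


def build_cube(records, bin_seconds=60):
    """Precompute record counts for every subset of DIMS."""
    cube = {}
    for k in range(len(DIMS) + 1):
        for dims in combinations(DIMS, k):
            cube[dims] = Counter()
    for ts, _loc, sev, cat in records:
        row = {"time_bin": ts // bin_seconds, "severity": sev, "category": cat}
        # Add this record to the matching cell of every cuboid.
        for dims, counts in cube.items():
            counts[tuple(row[d] for d in dims)] += 1
    return cube


def query(cube, **filters):
    """Count records matching equality filters on any subset of DIMS."""
    dims = tuple(d for d in DIMS if d in filters)
    return cube[dims][tuple(filters[d] for d in dims)]


cube = build_cube(RECORDS)
fatal_total = query(cube, severity="FATAL")
warn_card = query(cube, severity="WARN", category="Card")
```

    In this sketch each query is a single lookup regardless of the number of records, at the cost of storing one counter per combination of dimensions. A production engine for tens of millions of messages would need a more compact and parallel design than this.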
    BibTeX
    @inproceedings {10.2312:pgv.20181099,
    booktitle = {Eurographics Symposium on Parallel Graphics and Visualization},
    editor = {Hank Childs and Fernando Cucchietti},
    title = {{La VALSE: Scalable Log Visualization for Fault Characterization in Supercomputers}},
    author = {Guo, Hanqi and Di, Sheng and Gupta, Rinku and Peterka, Tom and Cappello, Franck},
    year = {2018},
    publisher = {The Eurographics Association},
    ISSN = {1727-348X},
    ISBN = {978-3-03868-054-3},
    DOI = {10.2312/pgv.20181099}
    }
    URI
    http://dx.doi.org/10.2312/pgv.20181099
    https://diglib.eg.org:443/handle/10.2312/pgv20181099
    Collections
    • EGPGV18: Eurographics Symposium on Parallel Graphics and Visualization

    Eurographics Association copyright © 2013 - 2023 
    Send Feedback | Contact - Imprint | Data Privacy Policy | Disable Google Analytics
    Theme by @mire NV
    System hosted at  Graz University of Technology.