Tailorable Sampling for Progressive Visual Analytics

IEEE Transactions on Visualization and Computer Graphics (2024)

Authors

Marius Hogräfer Hans-Jörg Schulz

Screenshot of ProSample. Enabling tailorable PVA-sampling using a pipeline that structures the sampling process into three steps (linearization, subdivision, and selection). The steps are depicted here alongside the data format they operate on: the linearization step takes the input data structure and transforms it into a linear format, which the subdivision step then divides into bins. The selection step finally produces the chunks forwarded into the PVA process by progressively selecting appropriate items from each bin.

Abstract

Progressive visual analytics (PVA) allows analysts to maintain their flow during otherwise long-running computations by producing early, incomplete results that refine over time, for example, by running the computation over smaller partitions of the data. These partitions are created using sampling, whose goal is to draw samples of the dataset such that the progressive visualization becomes as useful as possible as soon as possible. What makes the visualization useful depends on the analysis task and, accordingly, some task-specific sampling methods have been proposed for PVA to address this need. However, as analysts see more and more of their data during the progression, the analysis task at hand often changes, which means that analysts need to restart the computation to switch the sampling method, causing them to lose their analysis flow. This poses a clear limitation to the proposed benefits of PVA. Hence, we propose a pipeline for PVA-sampling that allows tailoring the data partitioning to analysis scenarios by switching out modules in a way that does not require restarting the analysis. To that end, we characterize the problem of PVA-sampling, formalize the pipeline in terms of data structures, discuss on-the-fly tailoring, and present additional examples demonstrating its usefulness.
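To make the three steps concrete, the following is a minimal Python sketch of such a pipeline. It is not the ProSample implementation: the class `SamplingPipeline`, the helper functions, and the toy data are all hypothetical names chosen for illustration. The sketch assumes that linearization sorts items by one attribute, that subdivision cuts the linear order into equally sized bins, and that selection removes one item per bin for each chunk. Only the selection module is swapped while the progression is running.

```python
import random
from typing import Callable, Dict, List, Sequence

Item = Dict[str, int]


def linearize_by(key: str) -> Callable[[Sequence[Item]], List[Item]]:
    """Linearization (assumed variant): order the items along one attribute."""
    def linearize(data: Sequence[Item]) -> List[Item]:
        return sorted(data, key=lambda item: item[key])
    return linearize


def subdivide_equal(n_bins: int) -> Callable[[List[Item]], List[List[Item]]]:
    """Subdivision (assumed variant): cut the linear order into equal-size bins."""
    def subdivide(linear: List[Item]) -> List[List[Item]]:
        size = max(1, -(-len(linear) // n_bins))  # ceiling division, at least 1
        return [list(linear[i:i + size]) for i in range(0, len(linear), size)]
    return subdivide


def select_first(bin_: List[Item]) -> Item:
    """Selection module: remove and return the first remaining item of a bin."""
    return bin_.pop(0)


def select_random(rng: random.Random) -> Callable[[List[Item]], Item]:
    """Selection module: remove and return a random remaining item of a bin."""
    def select(bin_: List[Item]) -> Item:
        return bin_.pop(rng.randrange(len(bin_)))
    return select


class SamplingPipeline:
    """Hypothetical pipeline: linearize once, subdivide once, select per chunk."""

    def __init__(self, data, linearize, subdivide, select):
        self.bins = subdivide(linearize(data))
        self.select = select  # may be reassigned between chunks

    def next_chunk(self) -> List[Item]:
        # One item from every non-empty bin forms the next chunk.
        return [self.select(b) for b in self.bins if b]


# Toy data: 12 items with a made-up "value" attribute.
data = [{"id": i, "value": (i * 37) % 100} for i in range(12)]
pipeline = SamplingPipeline(
    data, linearize_by("value"), subdivide_equal(3), select_first
)

first_chunk = pipeline.next_chunk()

# Swap the selection module mid-progression; the bins keep their state,
# so nothing already delivered is recomputed or sent again.
pipeline.select = select_random(random.Random(0))

delivered = list(first_chunk)
while True:
    chunk = pipeline.next_chunk()
    if not chunk:
        break
    delivered.extend(chunk)
```

Because each selection removes the chosen item from its bin, every item reaches the progression exactly once, regardless of when the module is exchanged. Swapping the linearization or subdivision module would additionally require re-binning the items that have not been delivered yet, which this sketch does not cover.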

Resources

Paper (preprint)

DOI

BibTeX Citation

Notebook for recreating figures

ProSample demo

Code on GitHub

Citation in BibTeX

To cite this article, we encourage you to use the following BibTeX entry in your citation manager:

@article{pva_tailorable_sampling2023,
  title     = {Tailorable Sampling for Progressive Visual Analytics},
  author    = {Hogr\"afer, Marius and Schulz, Hans-J\"org},
  year      = {2023},
  journal   = {IEEE Transactions on Visualization and Computer Graphics},
  volume    = {},
  number    = {},
  pages     = {},
  doi       = {10.1109/TVCG.2023.3278084},
  url       = {https://vis-au.github.io/prosample}
}

A Pipeline for Tailored Sampling for Progressive Visual Analytics

Proceedings of the 2022 International EuroVis Workshop on Visual Analytics (EuroVA)

Authors

Marius Hogräfer Jakob Burkhardt Hans-Jörg Schulz

Abstract

Progressive Visual Analytics enables analysts to interactively work with partial results from long-running computations early on instead of forcing them to wait. For very large datasets, the first step is to divide the input data into smaller chunks using sampling. These chunks are then passed down the progressive analysis pipeline until they reach the progressive visualization at its end. The quality of the partial results produced by the progression depends heavily on the quality of these chunks, that is, chunks need to be representative of the dataset. Whether a sampling approach produces representative chunks depends, however, on the particular analysis scenario.
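The effect of chunk quality on partial results can be illustrated with a small Python sketch. The scenario is an assumption made for illustration and is not taken from the paper: the task is to estimate the mean of a column that happens to be stored in sorted order, and the helper names `chunks` and `progressive_means` are hypothetical. Chunks drawn in storage order are compared with chunks drawn from a random permutation.

```python
import random
from statistics import mean
from typing import Iterator, List


def chunks(items: List[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def progressive_means(items: List[int], size: int) -> List[float]:
    """Return the running mean estimate after each processed chunk."""
    seen: List[int] = []
    estimates: List[float] = []
    for chunk in chunks(items, size):
        seen.extend(chunk)
        estimates.append(mean(seen))
    return estimates


# Assumed scenario: 1000 values stored in ascending order.
data = list(range(1000))
true_mean = mean(data)

# Chunks in storage order: early estimates only reflect the smallest values.
in_order = progressive_means(data, 100)

# Chunks from a random permutation: each chunk covers the whole value range.
shuffled = data[:]
random.Random(0).shuffle(shuffled)
randomized = progressive_means(shuffled, 100)

print("true mean:", true_mean)
print("first estimate, storage order:", in_order[0])
print("first estimate, random order: ", randomized[0])
```

Both progressions converge to the same final value once all chunks have been processed, but only the randomized chunks give a usable estimate early on. For a different scenario, for instance one that needs the smallest values first, the storage-order chunks would be the more suitable choice, which is the dependency on the analysis scenario that the abstract describes.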

Citation in BibTeX

To cite this article, we encourage you to use the following BibTeX entry in your citation manager:

@inproceedings{Hograefer2022_sampling,
  title = {A Pipeline for Tailored Sampling for Progressive Visual Analytics},
  author = {Hogr\"afer, Marius and Burkhardt, Jakob and Schulz, Hans-J\"org},
  booktitle = {Proc. of the 13th International {EuroVis} Workshop on Visual Analytics ({EuroVA}'22)},
  pages = {49--53},
  editor = {Bernard, J\"urgen and Angelini, Marco},
  publisher = {Eurographics Association},
  isbn = {978-3-03868-183-0},
  doi = {10.2312/eurova.20221079},
  year = {2022}
}