Tailorable Sampling for Progressive Visual Analytics

(under review)

Authors

Marius Hogräfer, Hans-Jörg Schulz
Enabling tailorable PVA-sampling with a pipeline that structures the sampling process into three steps: linearization, subdivision, and selection. The steps are depicted alongside the data format they operate on: linearization takes the input data structure and transforms it into a linear format, which the subdivision step then divides into bins. The selection step then produces the chunks that are forwarded into the PVA process by progressively selecting appropriate items from each bin.
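The three steps named in the caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `linearize`, `subdivide`, and `select`, the choice of sorting as linearization, equal-sized contiguous bins, and round-robin selection are all assumptions made only to show how data could flow through such a pipeline.

```python
import random
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def linearize(items: Sequence[T], key: Callable[[T], float]) -> List[T]:
    """Step 1 (assumed): impose a linear order on the input by sorting on a key."""
    return sorted(items, key=key)


def subdivide(linear: List[T], n_bins: int) -> List[List[T]]:
    """Step 2 (assumed): cut the linear sequence into up to n_bins contiguous bins."""
    size = max(1, -(-len(linear) // n_bins))  # ceiling division, at least 1
    return [linear[i:i + size] for i in range(0, len(linear), size)]


def select(bins: List[List[T]], chunk_size: int,
           rng: random.Random) -> Iterator[List[T]]:
    """Step 3 (assumed): draw items round-robin from the bins and emit chunks."""
    pools = [list(b) for b in bins]  # copy so the caller's bins stay intact
    for pool in pools:
        rng.shuffle(pool)            # random order within each bin
    chunk: List[T] = []
    while any(pools):                # continue until every bin is empty
        for pool in pools:
            if pool:
                chunk.append(pool.pop())
                if len(chunk) == chunk_size:
                    yield chunk
                    chunk = []
    if chunk:                        # emit the final, possibly smaller chunk
        yield chunk


# Hypothetical usage: 100 numeric items, 4 bins, chunks of 8 items.
data = list(range(100))
bins = subdivide(linearize(data, key=lambda x: x), n_bins=4)
chunks = list(select(bins, chunk_size=8, rng=random.Random(0)))
```

In this sketch, each full chunk contains the same number of items from every bin, so early chunks cover the whole value range rather than only its beginning. Other linearization, subdivision, or selection functions could be substituted without changing the overall structure.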

Abstract

Progressive visual analytics (PVA) allows analysts to maintain their flow during otherwise long-running computations by producing early, incomplete results that refine over time, for example, by running the computation over smaller partitions of the data. Generally, these partitions are created using sampling. The goal for sampling in PVA is, therefore, to draw samples of the dataset such that the progressive visualization becomes as useful as possible as soon as possible. What makes the visualization useful depends on the analysis task and, accordingly, some task-specific sampling methods have been proposed for PVA to address this need. However, as analysts see more and more of their data during the progression, the task in PVA often changes, which means that analysts need to restart the computation to switch the sampling method, causing them to lose their analysis flow. This poses a clear limitation to the proposed benefits of PVA. To address this limitation, we propose a pipeline for PVA-sampling that allows tailoring the data partitioning to analysis scenarios by swapping out modules, without requiring a restart of the analysis. For the first time, we characterize the problem of PVA-sampling, formalize the pipeline in terms of the data format, discuss on-the-fly tailoring, and present additional examples demonstrating its usefulness.
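The idea of switching a sampling module without restarting can be sketched as follows. This is an illustrative assumption, not the pipeline from the paper: the class `ProgressiveSampler`, its `set_selector` method, and the two selection functions are hypothetical names. The point shown is only that the remaining, not-yet-delivered items stay in place while the selection strategy is replaced between chunks.

```python
from typing import Callable, List

# Hypothetical module interface: a selector removes and returns the items
# for the next chunk from the remaining bins.
Selector = Callable[[List[List[int]], int], List[int]]


def round_robin(bins: List[List[int]], chunk_size: int) -> List[int]:
    """Take items evenly across all bins."""
    chunk: List[int] = []
    while len(chunk) < chunk_size and any(bins):
        for b in bins:
            if b and len(chunk) < chunk_size:
                chunk.append(b.pop())
    return chunk


def first_bin_first(bins: List[List[int]], chunk_size: int) -> List[int]:
    """Drain bins in order, e.g. to prioritize one region of the data."""
    chunk: List[int] = []
    for b in bins:
        while b and len(chunk) < chunk_size:
            chunk.append(b.pop())
    return chunk


class ProgressiveSampler:
    """Keeps the remaining items and lets the selector be swapped mid-run."""

    def __init__(self, bins: List[List[int]], selector: Selector) -> None:
        self.bins = [list(b) for b in bins]  # remaining, undelivered items
        self.selector = selector

    def set_selector(self, selector: Selector) -> None:
        # Only the strategy changes; delivered chunks and remaining items persist.
        self.selector = selector

    def next_chunk(self, chunk_size: int) -> List[int]:
        return self.selector(self.bins, chunk_size)


# Hypothetical usage: start with even coverage, then switch focus to the first bin.
sampler = ProgressiveSampler(
    [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]], round_robin
)
first = sampler.next_chunk(3)
sampler.set_selector(first_bin_first)
second = sampler.next_chunk(3)
```

Because the selector only ever removes items from the shared bins, no item is delivered twice after a switch, and chunks produced before the switch remain valid.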

Citation in BibTeX

To cite this article, we encourage you to use the following BibTeX entry in your citation manager:

@article{pva_tailorable_sampling2022,
  title     = {Tailorable Sampling for Progressive Visual Analytics},
  author    = {Hogr\"afer, Marius and Schulz, Hans-J\"org},
  year      = {2022},
  journal   = {TBA},
  pages     = {TBA},
  publisher = {TBA},
  doi       = {TBA},
}
        

A Pipeline for Tailored Sampling for Progressive Visual Analytics

Proceedings of the 2022 International EuroVis Workshop on Visual Analytics (EuroVA)

Authors

Marius Hogräfer, Jakob Burkhardt, Hans-Jörg Schulz
Screenshot from ProSample showing a side-by-side view of two pipelines (left and right views) sampling the Spotify dataset, shown as binned scatterplots, as well as the delta between their bins (center view).

Abstract

Progressive Visual Analytics enables analysts to interactively work with partial results from long-running computations early on instead of forcing them to wait. For very large datasets, the first step is to divide the input data into smaller chunks using sampling. These chunks are then passed down the progressive analysis pipeline all the way to their progressive visualization at the end. The quality of the partial results produced by the progression depends heavily on the quality of these chunks, that is, chunks need to be representative of the dataset. Whether a sampling approach produces representative chunks depends, however, on the particular analysis scenario.

Citation in BibTeX

To cite this article, we encourage you to use the following BibTeX entry in your citation manager:

@inproceedings{Hograefer2022_sampling,
  title = {A Pipeline for Tailored Sampling for Progressive Visual Analytics},
  author = {Hogr\"afer, Marius and Burkhardt, Jakob and Schulz, Hans-J\"org},
  booktitle = {Proc. of the 13th International {EuroVis} Workshop on Visual Analytics ({EuroVA}'22)},
  pages = {49--53},
  editor = {Bernard, J\"urgen and Angelini, Marco},
  publisher = {Eurographics Association},
  isbn = {978-3-03868-183-0},
  doi = {10.2312/eurova.20221079},
  year = {2022}
}