Seeing the trade-offs: Evaluating visualizations for multi-criteria comparison of set alternatives

Abstract

Comparisons are central in multi-criteria decision-making, where alternatives must be evaluated across multiple, often conflicting goals. This process becomes especially complex when alternatives are grouped into sets. Based on formative research with domain experts, we first characterize set comparisons in trade-off analysis. In two design workshops, we identify potential visualization solutions to support them. Finally, we conduct a user study with lay participants and collect expert feedback to evaluate the two most promising visualizations that encode trade-offs, alongside a baseline that conveys this information numerically. Our findings show that visual encodings are preferred. The decoupled visual encoding, which separates different trade-off metrics, is favored by the domain expert. For non-experts, this decoupled visualization leads to higher perceived mental load and lower confidence, although it may support more varied decision strategies. In contrast, the coupled visualization, which integrates these metrics, results in higher confidence but may promote strategies that min-max only the top priorities. Supplementary materials are available at https://osf.io/xnbm2/overview?view_only=4b5b940dce0f4ced8b933d9eaceaf3f5.

Keywords

data visualization trade-off analysis comparative visualization set visualization multi-criteria decision-making

Introduction

Comparing alternative options across multiple criteria is a fundamental task in decision-making and information visualization, requiring the assessment of similarities, differences, and the inherent trade-offs each option entails.^1–3 Within multi-criteria decision-making (MCDM), trade-off analysis focuses on how gains in one criterion often come at the expense of others, with the goal of identifying solutions that balance competing objectives when no single alternative is optimal across all dimensions.^4,5 This type of reasoning underpins decisions across diverse domains: for instance, agronomists weigh wheat yield against cost and environmental impact, while car buyers consider performance, fuel efficiency, and price.

In many real-world contexts, however, alternatives are not considered in isolation, but are grouped into meaningful sets, such as experimental conditions, product categories, or simulation parameter ranges. Experts in biology, engineering, and other fields commonly cluster candidates into distinct sets based on attribute ranges or categorical variables, to manage variability inherent in complex systems and explore different plausible trade-off solutions.^5,6 This grouping aligns with the properties of many trade-off datasets, where measurement variability renders value ranges more informative than fixed thresholds, and categorical attributes naturally define clusters. For example, when analyzing wheat fertilization strategies or wine fermentation recipes, experts tend to compare groups (or sets) like early fertilization methods or wines with specific aromatic profiles, rather than individual data points. This approach enables a more comprehensive understanding of trade-offs across the solution space.

Comparing sets across multiple criteria is inherently difficult,^7,8 as it requires reasoning about complex, multidimensional relationships where improvement in one dimension may cause deterioration in others. Decision-makers must balance these competing objectives, account for uncertainty and variability, and contextualize results relative to ideal or benchmark solutions.⁹ Designing visualizations that clearly reveal these trade-offs, while supporting exploration and preserving faithfulness to the data, is therefore a key challenge, especially as reasoning shifts from single alternatives to comparing sets.¹⁰ In these cases, trade-off relationships can be non-uniform across data points within each alternative set, and this variability further complicates comparisons to benchmarks.

Traditional MCDM techniques, even for single point alternative comparisons, often rely on numerical aggregation or summary metrics, such as means or weighted sums, to capture trade-offs.¹¹ While computationally efficient, these approaches can obscure the nuanced compromises between criteria, potentially diminishing decision-makers’ confidence. In contrast, visualization offers a means to externally represent trade-offs, enabling users to explore alternative prioritizations and evaluate proximity to benchmark solutions.^12,13 Milutinović et al.¹⁴ found that interactive visualization positively impacts decision-making in geospatial contexts, improving coherency and consistency in trade-offs.

Further advances in set visualization and multi-attribute comparison also underscore the value of visual representations and interactions for analyzing attribute distributions and assessing pairwise similarities using defined measures for sets.^7,8 A wide range of multi-attribute analysis tools support tasks such as identifying groups, extracting subsets, and comparing them across multiple criteria.^15–20 Within this context, Gleicher et al.³ conceptualize comparative visualization designs as being composed of three fundamental building blocks: juxtaposition, superposition, and explicit encoding of relationships. Explicit encoding approaches compute relationships between items and represent them visually (e.g. LineUp¹⁵ and WeightLifter¹²), whereas juxtaposition and superposition techniques rely more heavily on human perception and memory to support comparison (e.g. Domino¹⁶ and UpSet¹⁹).

However, there is still limited understanding of how specific visualization design choices, such as the type of comparative design or the choice of the explicit encoding, shape users’ reasoning strategies in multi-criteria comparison tasks. While a few empirical studies have examined the effects of general visualization design on multi-criteria decision quality,²¹ none, to our knowledge, have investigated specifically how numerically versus visually represented trade-offs, or how coupled versus decoupled representations of trade-off information, influence decision-making outcomes. To fill this gap, we investigate how visualization design affects reasoning, confidence, preference and perceived cognitive load in set-based trade-off analysis.

Our approach computes trade-off metrics, such as means, weighted means, and distances to an ideal reference, while emphasizing the external representation of trade-offs (“seeing the trade-offs”) rather than leaving viewers to cognitively construct these relationships from raw data or item rankings. Building upon established trade-off quantification methods¹¹ and comparative visualization strategies,¹⁰ we extend these frameworks to examine how external representations of trade-offs shape decision strategies in complex, multi-criteria comparisons.

Our work follows a five-stage user-centered design methodology (Figure 1): (1) Formative research involved analyzing expert trade-off analysis videos and interviewing domain specialists to identify how groups are compared, yielding key requirements (R1–R5); (2) Design and ideation workshops produced 51 sketches, whose thematic analysis informed 8 design considerations (DC1–DC8); (3) A tabular prototype incorporated these directions, establishing a focused testbed for multi-criteria set comparison; (4) A controlled study with 18 participants compared two trade-off visualizations (coupled and decoupled) against a baseline that presents trade-off information numerically only, evaluating decision strategies, explanation quality, confidence, preference, and perceived cognitive load; and (5) A feedback session with one domain expert added insights on real-world applicability. This multi-step process grounded our design in expert needs while empirically assessing different trade-off visualization strategies, contributing new insights for both visualization research and MCDM practice.

Figure 1.

Five-stage research methodology with key outputs.

Our main contributions are: (i) High-level requirements for set comparison in trade-off scenarios, combining domain and visualization expert insights; (ii) Empirical evidence suggesting that even basic encodings can shape trade-off reasoning. Numeric tables with simple histograms can offer comparable confidence and more contrastive explanations than visual encodings, though visuals are generally preferred. Coupled visual encodings are advantageous when confidence and possibly min-maxing judgments are prioritized, whereas separable channels (decoupled) may be preferable when detailed analysis and compensatory reasoning are required, especially for expert users; and (iii) Design recommendations for supporting comparison in trade-off analysis tools.

Background and related work

This section situates our work at the intersection of multi-criteria decision-making (MCDM), trade-off analysis, and multi-attribute comparative visualization. We clarify key terms, review strategies for trade-off analysis, and outline multi-criteria comparison tasks and visual strategies, concluding with an overview of approaches to visualize trade-offs of items in multi-attribute visual analytics systems. We then highlight how our contribution extends and complements this body of work.

Key concepts

We define key terms with reference to established literature in MCDM,^22,23 trade-off analysis,^11,24 and multi-attribute and set visualization.^7,8 These working definitions are tailored to our study, particularly for terms like multi-criteria comparison and ideal point.

Trade-off Analysis: A process within MCDM that systematically evaluates compromises between conflicting criteria to identify solutions that balance multiple objectives effectively. Daniels et al.¹¹ emphasize its role as a structured comparison of alternative solutions based on stakeholder criteria and quantitative “figures of merit,” which we refer to as trade-off metrics in this paper, underscoring the importance of externally represented trade-offs.

Multi-Criteria Comparison: Similar to general multi-attribute comparison,^3,7,8 this process involves examining items or sets of items to identify similarities and differences across multiple criteria (data attributes), supporting decision-making and analytical tasks. It also entails placing entities in context to make these differences and similarities salient, as highlighted in information visualization research.¹⁰ In trade-off analysis, multi-criteria comparison can take different forms. Some approaches involve externally represented trade-offs, where relationships between alternatives are computed and presented in numerical and/or visual form, making how gains in one criterion relate to losses in another directly available for interpretation. Quantitative trade-off metrics, such as differences, distances to an ideal reference, or weighted scores, can be used to compute these relationships, and when presented through the interface, either numerically or visually, they constitute externally represented trade-offs. Other approaches rely on cognitively constructed trade-offs, where trade-off information is not directly presented but must be inferred by users through mechanisms such as juxtaposing criteria, reweighting attributes, or exploring rankings, requiring users to mentally integrate the underlying relationships.

In this work, we focus on externally represented trade-offs and examine how different forms of presentation affect decision making, particularly with respect to reasoning, confidence, preference and perceived cognitive load.

Decision Priorities: Importance or weights assigned to each criterion by the decision maker based on their preferences. Oral et al.²² use the more general term “preferences” for the importance assigned to each criterion.

Ideal Point (Reference Point): A benchmark composed of the “best” observed or theoretically possible attribute values across all criteria, used as a reference for evaluating how close items or sets are to an optimal solution or group.^24,25

(Group) Alternatives: The options available to the decision maker for addressing a specific problem or goal.²² These alternatives need to be compared. In the data, alternatives may appear as individual items (points in a multi-dimensional attribute space), or as collections of items (groups or sets) treated as a single decision alternative. In our case, each alternative is a group (or set) of items.

(Group) Relationships: Associations between items or groups, such as overlaps, hierarchies, or shared attributes. Set visualization often emphasizes relations between elements and a set (such as set membership), or set-to-set relations such as intersections, unions, and subsets.⁷ Our focus in trade-off analysis is on set-to-set relationships of a different type: the relative gains and losses across several decision criteria (dimensions), and the alignment of groups or their individual elements with a reference or ideal point.
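To make these definitions concrete, the following minimal sketch computes an ideal point, a priority-weighted score, and a distance to the ideal for items grouped into two sets. The criteria names, directions, priority weights, data values, and the min-max scaling are all illustrative assumptions, not the metrics or data used in our study.

```python
from math import sqrt

# Hypothetical criteria, optimization directions, and decision priorities.
DIRECTIONS = {"yield": "max", "cost": "min"}
PRIORITIES = {"yield": 0.75, "cost": 0.25}  # assumed to sum to 1

# Hypothetical group alternatives: each set holds several items.
GROUPS = {
    "early": [{"yield": 8.0, "cost": 300.0}, {"yield": 7.0, "cost": 250.0}],
    "late": [{"yield": 6.0, "cost": 200.0}, {"yield": 6.0, "cost": 250.0}],
}
ITEMS = [item for members in GROUPS.values() for item in members]


def ideal_point(items, directions):
    """Best observed value per criterion (max or min, by direction)."""
    return {
        c: (max if d == "max" else min)(i[c] for i in items)
        for c, d in directions.items()
    }


def scaled(item, items, directions):
    """Min-max scale each criterion to [0, 1], so that 1 is always best."""
    out = {}
    for c, d in directions.items():
        lo = min(i[c] for i in items)
        hi = max(i[c] for i in items)
        span = (hi - lo) or 1.0  # guard against constant criteria
        s = (item[c] - lo) / span
        out[c] = s if d == "max" else 1.0 - s
    return out


def weighted_score(item, items, directions, priorities):
    """Priority-weighted sum of scaled criteria (higher is better)."""
    s = scaled(item, items, directions)
    return sum(priorities[c] * s[c] for c in directions)


def distance_to_ideal(item, items, directions):
    """Euclidean distance to the ideal point in scaled space (lower is better)."""
    s = scaled(item, items, directions)
    return sqrt(sum((1.0 - s[c]) ** 2 for c in directions))


ideal = ideal_point(ITEMS, DIRECTIONS)

# One trade-off metric per set: the mean weighted score of its members.
group_scores = {
    name: sum(weighted_score(i, ITEMS, DIRECTIONS, PRIORITIES) for i in members)
    / len(members)
    for name, members in GROUPS.items()
}
```

Averaging member scores is only one way to summarize a set; it hides within-set variability, which is why the definitions above also consider the alignment of individual elements with the reference point.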

Trade-off analysis strategies

Trade-off analysis strategies span from formal quantitative models to participatory and qualitative approaches, supporting decision-making under multiple competing objectives. This includes deliberative and participatory approaches that help surface trade-offs through dialog and collective sense-making, especially when preferences are ambiguous or criteria are difficult to quantify.²⁶ Scenario-based exploration uses what-if narratives to frame decision contexts and expose potential trade-offs, providing a foundation for more formalized analyses.²⁷

At the quantitative end, Daniels et al. outline core strategies such as multi-attribute scoring (normalizing and weighting criteria), utility functions for capturing nonlinear preferences, and goal-programming for assessing closeness to target goals, complemented by sensitivity analysis and Pareto frontiers to highlight non-dominated options.¹¹ Building on this, the multi-criteria decision analysis literature offers more nuanced methods such as outranking approaches that handle partial preferences rather than strict rankings,²⁸ and TOPSIS that ranks alternatives by their distance to ideal and nadir solutions.⁹ While these approaches make stakeholders’ preferences explicit, they often present results as aggregated scores or static trade-off sets, leaving interpretation to the analyst.
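As an illustration of distance-based ranking, the sketch below follows the commonly described TOPSIS steps (vector normalization, weighting, distances to the ideal and nadir solutions, and a closeness score). It is a simplified textbook-style version written for this explanation; the function name, the example matrix, and the weights are assumptions rather than material from the cited works.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative (higher is better).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion (assumed to sum to 1)
    benefit : True where a criterion is maximized, False where minimized
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weights.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # Ideal and nadir (anti-ideal) value for each criterion.
    cols = list(zip(*v))
    ideal = [max(c) if b else min(c) for c, b in zip(cols, benefit)]
    nadir = [min(c) if b else max(c) for c, b in zip(cols, benefit)]
    scores = []
    for row in v:
        d_pos = sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_neg = sqrt(sum((x - n) ** 2 for x, n in zip(row, nadir)))
        denom = d_pos + d_neg
        scores.append(d_neg / denom if denom else 0.0)
    return scores


# Hypothetical example: column 0 is maximized, column 1 is minimized.
example = [[10, 1], [5, 2], [1, 3]]
scores = topsis(example, weights=[0.5, 0.5], benefit=[True, False])
```

In this toy example the first alternative is best on both criteria and the last is worst on both, so they receive the extreme scores; the middle alternative falls in between. The output is a single aggregated score per alternative, which is precisely the kind of result that leaves the underlying compromises to the analyst's interpretation.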

More recent visual analytics systems integrate these quantification strategies with interactive visual representations. Our work builds on this intersection: rather than introducing new quantification methods, we evaluate visualization designs that surface comparisons between alternative sets and ideal points, aiming to make trade-offs more cognitively accessible and actionable.

Comparative visualization

Building on trade-off analysis methods that quantify and make preferences explicit, comparative visualization focuses on how these quantified alternatives are represented and compared in practice.

Visual strategies to support comparison

Gleicher¹⁰ provides a useful taxonomy for structuring comparative visualization, highlighting three primary strategies: juxtaposition, placing datasets side-by-side for direct comparison; superposition, overlaying them within a shared space for immediate contrast; and explicit encoding, representing relationships or differences as distinct visual elements. Javed and Elmqvist²⁹ further explore this space by investigating composite visualizations, which combine multiple visualization techniques to represent complex multivariate data, offering richer cues to support comparison tasks. Gleicher¹⁰ also discusses designs incorporating a comparative reference, aligning all items relative to a chosen baseline,¹³ sometimes allowing user-selected references³⁰ or even multiple references for richer insights.¹³ These strategies balance the need for comparative clarity with flexibility in defining the “anchor” for reasoning.

Revealing trade-offs in multi-attribute visual analytics

Visual analytics tools have operationalized comparative strategies in different ways, offering users means to compare items or groups and reason about trade-offs. These strategies vary in how trade-off information is made available, ranging from cognitively constructed trade-offs that require users to mentally integrate relationships, to externally represented trade-offs, where gains, losses, or distances are computed and represented in order to support interpretation.

Cognitively constructed approaches often embed trade-off reasoning in composite visualizations or interactive exploration. Podium¹⁷ models subjective user preferences for mixed-initiative ranking, showing differences in how weights get adjusted across different rankings; nonetheless, trade-offs themselves are not directly presented but must be inferred through mental integration of the displayed information. ParetoLens³¹ guides users through multi-objective solution sets, providing interactive views to interpret Pareto frontiers, but without explicit multi-group comparison, nor externally represented trade-offs. Other systems support comparisons at the group or subset level: Domino¹⁶ and UpSet¹⁹ visualize set relationships and intersections, while SmartExplore³² emphasizes aggregated cluster exploration in high-dimensional spaces, and TACO²⁰ tracks changes in tabular data over time. Across these systems, multi-criteria (and sometimes trade-off) reasoning emerges from interaction and juxtaposition, but without external representation of gains or losses.

Approaches that rely on externally represented trade-offs surface interpretable measures of trade-offs, such as gains, losses, or distances to reference points, in the form of numeric or visual presentations. LineUp¹⁵ supports the exploration of multiple ranking configurations based on weighted attribute combinations, shown as numeric values and bars for each item. It also includes integrated tables and slope graphs to make trade-off cues, such as rank gains or losses, more apparent. WeightLifter¹² enables visual exploration of weight spaces, helping users assess how solutions perform across different preference configurations by exposing cost functions and providing a framework to explore how individual solutions rank under varying weightings. Weighted scores are expressed as ranked bars. These tools use either numeric or visual encodings of trade-off metrics, but their focus remains on single solutions within a larger collection of alternatives rather than comparative group-to-group trade-offs.

When it comes to group comparisons, MetricsVis³³ integrates quantitative and qualitative metrics through stacked radar charts, glyphs, and reweighting, enabling users to evaluate individual and group performance while clearly seeing how prioritization influences outcomes. Comparisons are aided by dandelion glyphs presented side by side. SkyLens¹⁸ enables users to identify and analyze non-dominated (Pareto-optimal) solutions through projection, tabular, and comparison views that highlight loss or gain compared to a reference group or item, using bars and colors. These visualizations aim to reduce the cognitive burden of mentally integrating relationships and enhance the interpretability of trade-offs. Both were evaluated through use-cases and expert feedback sessions without comparative evaluations.

More generally, to our knowledge, this body of prior work that externally surfaces trade-offs, both for single items and groups, has not compared possible design alternatives. Thus, further user studies are needed to evaluate their effectiveness in reducing cognitive load and supporting decision-making.

Position of this work: Our work addresses decision-making scenarios involving explicit, quantitative criteria, aligning with MCDM literature that emphasizes structured, metric-based evaluation. We focus on quantitative trade-off analysis combined with comparative visualization to enhance understanding and support more informed decisions. Unlike systems designed for dynamic group creation or subset exploration, our approach targets comparisons between predefined sets of alternatives, identified automatically or manually in prior analysis, using numeric or visual encodings to facilitate norm-referenced trade-off comparisons.

Prior work has mostly focused on comparing trade-offs between individual alternatives, rather than between sets of alternatives. When group comparisons are addressed, the utility of such approaches is typically demonstrated through case studies.³³ To our knowledge, no prior user study has formally investigated how visualization designs and trade-off metrics influence the comparison between groups of alternatives. The most closely related study is by Dy et al.,²¹ which examined how different visualization types (e.g. scatterplot matrices, parallel coordinates, radar charts) and data complexity affect decision time and accuracy when users select a single “optimal” alternative. They measured accuracy as consistency with users’ self-reported preferences, following the methodology of Dimara et al.³⁴ Their findings showed that accuracy remained stable across visualization types but improved as the number of options and criteria decreased. In contrast to their study, our work shifts focus from selecting a single item to comparing multiple groups. Furthermore, we adopt a multi-faceted evaluation that includes explanation quality, user confidence, preference and perceived cognitive load. This broader perspective aims to deepen understanding of users’ decision rationales and strategies when faced with competing sets of alternatives.

Formative study

To ground our design in real-world practice, we began by examining how domain experts conduct trade-off analysis and comparisons in their natural settings. This formative study involved revisiting past expert exploration sessions and conducting a new workshop to gain deeper insights into their decision-making processes. Participants’ backgrounds are summarized in Table 1 (formative stage).

Table 1.

Participant information: study stage (formative, design, user study), age, gender, degree (obtained), field, and experience (years in domain for formative and design stages; data analysis experience for the user study on a 1–7 Likert scale, 7 = most experienced).

Stage	ID	Age	Gender	Degree	Field	Exp. yrs	Stage	ID	Age	Gender	Degree	Field	Exp. Likert
Formative	1	31	M	MSc	AI/ML	2	User study	14	26	M	MSc	Account.	6
	2	41	M	PhD	AI/ML	15		15	27	M	MSc	AI	7
	3	33	M	PhD	Ecosystem services	4		16	26	M	MSc	CS	7
	4	37	M	PhD	Bioprocess eng.	9		17	21	M	BSc	CS	5
	5	42	M	PhD	Enology	10		18	26	M	MSc	CS	5
								19	27	M	MSc	HCI	6
Design	6	24	F	MSc	HCI	2		20	26	M	MSc	Bio-info.	5
	7	23	M	MSc	HCI	2		21	22	M	MSc	HCI	4
	8	29	M	PhD	HCI	7		22	26	F	MSc	Dentistry	5
	9	27	F	MSc	HCI/Vis	3		23	26	M	MSc	HCI	6
	10	37	F	MSc	HCI/Vis	14		24	26	M	MSc	CS	3
	11	22	M	MSc	Vis	2		25	26	M	MSc	CS	6
	12	29	F	PhD	HCI	4		26	36	M	Eng.	HCI	6
	13	32	M	PhD	HCI	5		27	24	F	MSc	HCI	5
								28	31	M	PhD	HCI	6
								29	26	F	Eng.	Biomech.	5
								30	26	M	MSc	AI	7
								31	22	F	BSc	CS	4

Participant P5 took part in both formative and summative stages.

Videos of expert exploration

We first analyzed $\approx 8$ hours of video recordings from three past trade-off exploration sessions, in which expert pairs used a SPLOM-based visualization tool to work with their own datasets: ecosystem services (9D, 100 points), wine fermentation (14D, 1 K points), and machine learning (10D, 3 K points).

One author conducted a thematic analysis³⁵ of the videos, identifying six explicit comparison instances, while acknowledging many more likely occurred implicitly without verbalization (see supplemental material for details). These comparisons focused on analyzing groups of data points representing alternative solutions to multi-criteria problems. Across all sessions, experts treated such groups as single units (group alternatives). For example, ecosystem services experts identified groups of farms reflecting specific regional farming practices; wine experts referred to clusters of recipes linked to distinct wine fermentation strategies; and ML experts grouped data points by model type.

Although infrequent, these comparisons were crucial, often informing key decisions such as which costly biological experiments to pursue in the lab or field. Comparisons focused on groups of data points, typically few in number (2–6 per comparison). Group size varied by domain: small in wine (4–5 points) and ecosystem services (8–43), but much larger in ML (17–1090), likely reflecting the size of the dataset and how groups were defined within each context.

Overall, experts sought to rank these alternative groups to identify the most promising option to implement or explore further. Experts compared groups based on value ranges, overlaps, and relative positions in the data (e.g. whether a group’s centroid was near a maximum or minimum). They also considered group size and examined internal patterns, such as trends or correlations.

Semi-structured expert interview

To clarify the nuances of these comparisons, we invited the two ML experts (P1, P2) from the video analysis back for a 90-min workshop, as their sessions involved the most diverse group sizes. We welcomed participants and explained the interview’s goal: understanding their comparison needs. To ground the discussion, we presented comparison instances from their exploration video and invited them to elaborate. We ended with a debrief.

The experts reported that they usually compare groups of points such as subsets and clusters, confirming our video analysis. One explained: “we try to compare groups inside the same [benchmark] dataset. So maybe clusters, maybe classes, stuff like that … In the last few works, we were mainly looking at specific trade-offs for the models’ accuracy and complexity” (P2). When asked about their focus during these comparisons, they reported examining the relative positions of the groups, by: “assessing whether the two groups overlap for one [ML] metric or another, for one dimension over another, or how much they overlap” (P1). This is supported by our video observations showing all experts working with ranges and noting their overlaps.
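The overlap assessment that the experts describe can be approximated per dimension by comparing the value ranges of two groups. The sketch below is one possible formulation; the function name and the choice of reporting the overlap as a fraction of the combined range are assumptions made for illustration, not a measure the experts reported using.

```python
def range_overlap(a, b):
    """Overlap of two groups' value ranges on a single criterion.

    Returns (overlap_length, fraction_of_combined_range).
    Both values are 0 when the ranges are disjoint; the fraction is also
    reported as 0 in the degenerate case where all values are identical.
    """
    lo_a, hi_a = min(a), max(a)
    lo_b, hi_b = min(b), max(b)
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    combined = max(hi_a, hi_b) - min(lo_a, lo_b)
    return overlap, (overlap / combined if combined else 0.0)


# Hypothetical values of one metric for two groups of models.
partial = range_overlap([1, 3, 5], [4, 8])   # ranges [1, 5] and [4, 8]
disjoint = range_overlap([1, 2], [3, 4])     # ranges [1, 2] and [3, 4]
```

Repeating this for every criterion gives a simple per-dimension picture of "how much they overlap", although a range-based measure ignores how points are distributed inside each range.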

When asked how they compare groups, they described various methods such as identifying outliers, focusing on top-ranked points, examining group ranges, or checking whether values exceed certain thresholds (e.g. minimum, maximum, or ideal values): “In some cases we might be interested in looking at which one is best, relative best. In other cases, we might want to have stuff under a certain threshold” (P2). They also compared group means and variances.

Summary of comparisons by experts

Our video analysis and workshop with ML experts reveal that while the specifics of “what” is being compared and “how” are problem-dependent, recurring elements emerge. Experts often compare groups of points, examining aspects such as the number of points, the ranges of the groups, and the best and worst points in relation to the trade-off criteria. While means and variances of groups are considered in the comparison, experts also examine individual points: they rank points within a group and look at specific points, such as outliers or best/worst values in the group, and they evaluate group points in relation to a reference point. We distil a summary of these needs into the following set of expert requirements for trade-off comparisons between groups:

R0-Groups: compare groups of points;

R1-Detail: see number of points, access to individual points in groups;

R2-Metrics: see best and worst points, means, variance;

R3-Ranges: see group value ranges;

R4-Rank: rank items inside groups, and potentially add or remove them;

R5-Reference: compare to a reference point, such as a threshold, min or max.
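As a rough illustration of what these requirements imply for the data behind a comparison view, the sketch below summarizes one group on one criterion. The function name, the returned field names, the use of population variance, and the threshold-style reference are hypothetical choices for this example, not the design of our prototype.

```python
from statistics import mean, pvariance


def summarize_group(values, reference, maximize=True):
    """Summarize one group on one criterion, loosely following R1 to R5."""
    ranked = sorted(values, reverse=maximize)  # R4: best value first
    return {
        "n": len(values),                      # R1: number of points
        "points": list(values),                # R1: access to individual points
        "best": ranked[0],                     # R2: best point
        "worst": ranked[-1],                   # R2: worst point
        "mean": mean(values),                  # R2: mean
        "variance": pvariance(values),         # R2: population variance
        "range": (min(values), max(values)),   # R3: value range
        "ranked": ranked,                      # R4: ranking inside the group
        "n_meeting_reference": sum(            # R5: comparison to a reference
            (v >= reference) if maximize else (v <= reference) for v in values
        ),
    }


# R0: compare groups by summarizing each one with the same function.
# Hypothetical scores (in percent) for two groups, with a threshold of 80.
groups = {"A": [70, 90, 80], "B": [60, 95]}
summaries = {name: summarize_group(vals, reference=80) for name, vals in groups.items()}
```

Here group A has the higher mean while group B contains the single best point, a small example of why the experts look at both aggregates and individual points.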

Design & ideation workshops

After we confirmed the importance of comparing groups of points (R0), and derived expert requirements for such comparisons, we set out to design appropriate visualizations to support them. We conducted two design workshops where participants generated hand-drawn visualizations for comparing groups of data points, which we then analyzed for inspiration. Example sketches are available in supplemental material.

Participants, scenarios & data

We conducted two 2-h design workshops, with 3 and 5 participants respectively. Participants were researchers or practitioners in HCI and visualization (mean experience 4.8 years, min 2, max 14; see Table 1, Design stage). The authors of this work also participated in the generation of designs. To provide participants with a realistic design task that did not require domain expertise, we used a hypothetical scenario.

Participants were asked to create designs to compare multiple groups of second-hand vehicles from different manufacturers, to choose one group for a new car-on-demand taxi service. The dataset we used is composed of 60 used cars created by merging two real-life car datasets^36,37 and optimizing four objective dimensions (Miles per Gallon MPG, horsepower, price, and odometer readings) to generate a Pareto Front (PF).³⁸ Sixty cars were randomly selected from this front and grouped into 10 manufacturer-based categories.
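For readers unfamiliar with Pareto fronts, the following sketch shows a brute-force non-dominated filter over the four criteria named above. It illustrates the concept only and is not necessarily the procedure used to build the workshop dataset; the field names and the three example cars are invented for this example.

```python
# Criteria and optimization directions of the workshop scenario.
DIRECTIONS = {"mpg": "max", "horsepower": "max", "price": "min", "odometer": "min"}


def dominates(a, b, directions=DIRECTIONS):
    """True if a is at least as good as b on every criterion and better on one."""
    strictly_better = False
    for criterion, direction in directions.items():
        # Flip the sign of minimized criteria so that larger is always better.
        x, y = (a[criterion], b[criterion]) if direction == "max" else (
            -a[criterion], -b[criterion])
        if x < y:
            return False
        if x > y:
            strictly_better = True
    return strictly_better


def pareto_front(cars, directions=DIRECTIONS):
    """Keep the cars that no other car dominates (quadratic-time filter)."""
    return [c for c in cars if not any(dominates(o, c, directions) for o in cars)]


# Invented example values.
cars = [
    {"id": 1, "mpg": 30, "horsepower": 150, "price": 10000, "odometer": 50000},
    {"id": 2, "mpg": 25, "horsepower": 200, "price": 12000, "odometer": 40000},
    {"id": 3, "mpg": 24, "horsepower": 140, "price": 13000, "odometer": 60000},
]
front_ids = [c["id"] for c in pareto_front(cars)]
```

Car 3 is worse than car 1 on all four criteria and is therefore filtered out, whereas cars 1 and 2 each win on some criteria, so choosing between them requires a trade-off.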

Design task and procedure

We began the workshops with an introduction to trade-off analysis to ensure that participants understood the concepts, especially the need to balance competing criteria. We then introduced the design tasks and the criteria involved: to minimize cost and odometer while maximizing MPG and horsepower. Informed by past work on multi-dimensional visualization,³⁴ we provided inspirational materials in the form of different static visualizations for showing the 10 groups. To avoid biasing participants toward any particular design, we included three different data representations: radar graphs, parallel coordinates, and tabular visualizations.

We asked participants to perform two main sketching tasks, both involving comparison of groups. In the first, they had to sketch a visualization to compare the 10 groups, and in the second to consider additional constraints (brands from a specific country).

Analysis and results

The workshops produced 51 sketches (35 were created by participants and the remaining 16 by the authors). The 3 authors jointly performed a thematic analysis³⁵ of the first workshop sketches, identifying stand-alone ideas or systems, based on designers’ groupings and insights from the round-table discussion. From these sketches, we identified themes related to comparison (detailed next). One author then coded the second workshop’s sketches using a mixed method combining closed and open coding. Closed coding followed taxonomies for comparison layouts¹⁰ and graph types,³⁹ while open coding captured the displayed data and its subject, as well as the visual encodings, refined iteratively over three cycles. Coding was conducted in the Obsidian note-taking tool.

Our thematic analysis of the 51 sketches yielded 498 coded elements, revealing 6 key themes, organized into 2 broad categories and expressed as design considerations (DC; see Table 2). The first category includes themes that directly align with the five expert requirements (R1–R5) outlined in the previous section: Displayed Data, Subject, and Interactions. The second category, Visual Encoding, Comparison Layout, and Type of Graph, captures design patterns that emerged independently from expert input, shaped instead by participants’ visual thinking.

Table 2.

Summary of frequent design elements and design considerations from stage 2 design workshops, with corresponding expert needs from stage 1.

Stage 2					Stage 1
Theme	#Codes	#Inst.	Most common codes (with counts)	Design considerations	Experts’ needs
Displayed data	26	144	Optimization direction* (23), raw values (19), range (10), ideal point* (10), mean (9), weighted mean (9), other scores (14)	DC2-Metrics, DC3-Context, DC5-Reference	R2-Metrics, R3-Ranges, R5-Reference
Subject	3	88	Aggregated (e.g., w/mean; 31), group-level (31), individual data points (26)	DC1-Detail	R1–Detail
Interactions	11	60	Transform displayed data (e.g., sort, calculate score; 17), re-order criteria* (15), general selection & navigation (28)	DC4-Rank	R4–Rank
Visual encoding	12	103	Color (37), direction arrows (17), lines (11), shapes (10)	DC6-Color	–
Comparison layout	3	54	Side-by-side (32), superimposed (18), explicit relationship (4)	DC7-Juxtaposed	–
Type of graph	9	49	Tabular (16), radio-chart (10), barchart (6), scatterplot (5), parallel coordinates (4)	DC8-Tabular	–
Total	64	498

DC6–8 emerged from the ideation workshops. Implemented prototype design choices are in bold, and those specific to trade-off analysis are marked by (*).

Displayed Data captured the information content and metrics shown in sketches. Frequently encoded elements included optimization direction, raw values, range information, as well as specific indicators of trade-offs, such as references to ideal solutions, and summary measures such as means and weighted means. These reflect experts’ needs to: (1) communicate contextual information like optimization direction, raw data and value ranges (R3-Ranges); (2) present meaningful summary metrics to simplify comparisons (R2-Metrics); and (3) support evaluation against ideal or reference points (R5-Reference). Together, these insights inform design directions DC3-Context, DC2-Metrics, and DC5-Reference, with DC3-Context extending beyond simple range values to encompass the broader contextual needs of domain experts.

Subject described whether data was shown at the individual data point, group, or in aggregated form (e.g. using weighted/mean, or a scoring method). This addresses the need to maintain flexibility in the visualization granularity, as workshop participants emphasized the importance of switching between summary and fine-grained information (R1-Detail), informing design direction DC1-Detail.

Interactions captured dynamic features such as sorting and reordering of criteria. These mechanisms supported experts’ ranking needs (R4-Rank), both for ordering data points and prioritizing criteria, and motivated design direction DC4-Rank. While many sketches included general selection and navigation interactions, we focused on ranking as it is more central to trade-off exploration.

In contrast, three themes did not directly correspond to expert needs but emerged repeatedly in the sketches:

Visual Encoding included techniques like color gradients (often red-to-green to indicate value desirability), directional arrows, shapes, and lines. These visual strategies, especially color, were central to many sketches and formed the basis for design direction DC6-Color.

Comparison Layout described how designers positioned groups relative to each other. The most common approach was side-by-side placement, supporting the derivation of design direction DC7-Juxtaposed.

Type of Graph covered the kinds of visual representations used. While exploratory designs (e.g. 3D and stacked star glyphs, see supplemental material) were proposed, they were less frequently selected. Participants favored tabular views, followed by radar charts, bar charts, and scatterplots. This motivated the inclusion of design direction DC8-Tabular. This choice is consistent with prior work highlighting the effectiveness of tabular visualizations for decision-making tasks and ranking-based exploration.^34,40

Taken together, these implemented design choices (DC1-8) reflect a synthesis of both bottom-up insights from the design workshops and top-down expert needs (R1–R5), grounding the prototype in both empirical and practical relevance.

Prototype for multi-criteria set comparison

Based on workshop outcomes and expert needs, we developed a group trade-off comparison prototype (Figure 2) using HTML, JavaScript and CSS. The goal was not to create a full trade-off analysis system, but a test-bed tool for a controlled study on supporting group comparison. Below, we detail our design choices.

Figure 2.

Prototype for trade-off comparisons between groups (or sets) of points, each represented as a table.

Visualizations and comparison layout

We opted for a tabular visualization as the basis of our prototype, since it was the most popular in our design workshop (DC8-Tabular). It is often enhanced with visual marks inside cells,^41,42 and has been shown to be an effective visualization for decision-making.³⁴ Thus columns represented dimensions and rows data points. We adopted a side-by-side comparison (DC7-Juxtaposed), with one table per group that analysts can move around and align horizontally or vertically. Based on inspiration from the design workshops, we implemented several encodings for cells of the tabular visualization:

Numeric: Shows in each cell the value of a data point for that dimension (e.g. the car’s price, MPG, etc). It is familiar to domain experts who are accustomed to working with tabular data in CSV files without specialized visualization tools.

Colors: Cells can be additionally colored (using hue and opacity), a popular technique in the design workshops (DC6-Color), where participants commonly associated “good” values with green and “bad” values with red. We explain in the next section on “Trade-off Metrics” ways of defining “good” and “bad” values.

Bars: Cells can be additionally filled with a bar. The less filled the cell, the “worse” the data value; the more filled the cell, the “better” the value. While bars are not as prominent as colors in the generated designs, this visualization type is notably used in past decision-making tools, such as LineUp.¹⁵

Expressing trade-off metrics

Participants in our expert interview and design workshops expressed trade-off information in various ways. We integrated two key concepts: weight or priority for calculating a weighted mean or score (DC2-Metrics), and distance from an ideal point (DC5-Reference). We now describe the intuition behind these metrics and how they apply to data values, points, and groups (DC1-Detail).

Priorities: Analysts can assign weights to dimensions to reflect their importance. Priority weights help participants focus on key aspects, guiding comparisons and highlighting how changes in priorities affect overall assessments. In our case, participants can assign weights between 1 and 100, representing percentages of importance. Weighted means are a common trade-off metric and are used in tools like LineUp¹⁵ and WeightLifter.¹²

Ideal Values: Analysts can define an ideal value or threshold within the dataset’s range for each dimension. This value also indicates the direction of optimization, such as minimizing price. Ideal values help calculate the “goodness” of a value by measuring its distance from the ideal. For instance, the smaller the difference between a car’s actual price and the ideal value, the better it performs on the Price dimension. Ideal value-driven analysis is a common practice in MCDM for ranking alternatives.^10,13,30

Displayed data & its subject: Most of the generated designs provided aggregated views (DC2-Metrics), but for the experts individual points and values are important. We thus used the two trade-off metrics to provide scores for values, for individual data points, and for groups (DC1-Detail).

Metric per point (Score): Reflects how close the values in a row are to their corresponding ideal values, considering the priority of each column. A high weighted mean indicates strong alignment with the analyst’s prioritized dimensions and ideal value:

$$Score = \frac{\sum_{i} P_{i} \times \left(1 - \frac{| V_{i} - I_{i} |}{| {globalMax}_{i} - {globalMin}_{i} |}\right)}{\sum_{i} P_{i}}$$

Where:

$V_{i}$ is the value in the $i$ -th column for the current row.

$I_{i}$ is the ideal value for the $i$ -th column.

$P_{i}$ is the priority of the $i$ -th column.

${globalMax}_{i}$ and ${globalMin}_{i}$ are the global maximum and minimum values for the $i$ -th column across the groups compared.

Metric per group (Mean of scores): To represent each group in an aggregated way, we calculate the mean of the Scores of all points in the group, effectively treating the Score column as any other data column for this calculation.
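
To make the two definitions above concrete, the following is a minimal Python sketch of the per-point Score and the per-group mean of scores. It is an illustration only, not the prototype's JavaScript code: the function names, the two-dimension example, and its values are hypothetical, and the sketch assumes that every dimension has a non-zero global range.

```python
from statistics import mean

def point_score(values, ideals, priorities, global_min, global_max):
    """Priority-weighted closeness to the ideal values for one row."""
    weighted = 0.0
    for v, ideal, p, lo, hi in zip(values, ideals, priorities,
                                   global_min, global_max):
        # Closeness is 1 at the ideal value and shrinks with distance,
        # normalized by the global range of the dimension.
        closeness = 1 - abs(v - ideal) / abs(hi - lo)
        weighted += p * closeness
    return weighted / sum(priorities)

def group_score(rows, ideals, priorities, global_min, global_max):
    """Mean of the point scores of all rows in a group."""
    return mean(point_score(r, ideals, priorities, global_min, global_max)
                for r in rows)

# Hypothetical example with two dimensions: price (minimize) and MPG (maximize).
ideals = [5000, 40]
priorities = [75, 25]
global_min, global_max = [5000, 20], [15000, 40]
group = [[10000, 30], [5000, 40]]

scores = [point_score(r, ideals, priorities, global_min, global_max)
          for r in group]
print(scores)
print(group_score(group, ideals, priorities, global_min, global_max))
```

In this example the first car sits halfway between the ideal and the worst value on both dimensions, and the second car matches both ideal values, so the group mean lies halfway between their two scores.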

Metric per cell: Decoupled versus Coupled metrics: In our workshops, sketches typically use distinct visual encodings for visualizing each metric (29 instances), a method we call decoupled representation. For example, weights may be shown as opacity, while distance to ideal values is represented by hue (red for far, green for close). While this separates the two metrics clearly, it requires more visual information than a coupled metric.

Conversely, some tools, such as LineUp, encode multiple measures simultaneously (for instance, dimension priority and data point value) within a single visual representation like bar length. This coupling of metrics simplifies the visualization, reducing clutter, but may obscure the relationship between weights and values. In our case, we couple two measures that both indicate trade-offs: dimension priority and ideal value for the dimension.

Decoupled metrics per cell: The decoupled metrics we use are the priorities that the participants input for each dimension (1–100) and the distance from the ideal value for each dimension, normalized across the range of values in all groups compared.

$${distance}_{i} = \frac{{globalMax}_{d} - {globalMin}_{d} - | V_{d_{i}} - I_{d} |}{{globalMax}_{d} - {globalMin}_{d}}$$

Where: ${globalMax}_{d}$ and ${globalMin}_{d}$ are the global maximum and minimum values for the $d$ -th column across the groups compared. $V_{d_{i}}$ is the value in the $d$ -th column for the $i$ -th point of the group, $I_{d}$ is the ideal value for the $d$ -th column.

Coupled metrics per cell: The coupled metric we used is a weighted distance from the ideal value:

$${coupled}_{i} = \frac{{priority}_{d} \times {distance}_{i}}{100}$$

where the distance and priority are the same as in the decoupled case, with a maximum priority value of 100 for each dimension.
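
The two per-cell formulas can be sketched as follows. This is a hedged Python illustration with hypothetical function names and example values, not the prototype's implementation. Note that, as defined above, the normalized distance equals 1 when a value matches the ideal and decreases as the value moves away from it.

```python
def normalized_distance(value, ideal, global_min, global_max):
    """Decoupled per-cell metric: 1 at the ideal value, lower further away."""
    span = global_max - global_min  # assumed to be non-zero
    return (span - abs(value - ideal)) / span

def coupled_metric(priority, distance):
    """Coupled per-cell metric: distance weighted by a priority in 1..100."""
    return priority * distance / 100

# Hypothetical cell: a price of 10,000 with an ideal of 5,000,
# in a dimension whose values span 5,000 to 15,000, with priority 80.
d = normalized_distance(10000, 5000, 5000, 15000)
print(d, coupled_metric(80, d))
```

In the decoupled case, the priority (80) and the distance are shown as two separate quantities; in the coupled case they collapse into a single number per cell.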

In our prototype we allow analysts to choose what metrics to map on the Colors or Bar lengths of our cells. As this design choice is not obvious, in our user study (next section) we consider the impact of coupling/decoupling the priority and ideal value distance metrics.

Bringing it all together: Options & user interactions

To communicate the position of the groups within the bigger context of all available data (DC3-Context), we included at the very top of each table (group) histograms that visualize the distribution of the group’s data for each dimension, within the value range of all available data (i.e. values of all compared groups). The ideal values are also seen in the histograms. The last column is the “Score” column for each data-point, showing the weighted score for that point, calculated based on priorities and ideal values. The last line of the table shows summary information about the group: the mean of each dimension (column) and in the last “Score” column the mean of all data-point scores, which gives an overall assessment of the group’s alignment with the user preferences across all dimensions. These numeric representations of weighted scores (by item, dimension, and group) are always visible.

Analysts can sort data by clicking on the column headers (descending, ascending and back to initial order), as this was deemed important (DC4-Rank). Users can rearrange the column order and this is reflected across all tables/groups. Finally, tables can be moved to align horizontally (aligning points across groups) and vertically (aligning dimensions). Beyond these basic interactions the prototype includes an interaction drawer opened on demand. Here, users can load datasets to compare but also customize the visualization for trade-off analysis (see Figure 2 for full prototype and Figure 3 for interactions). In particular, users can adjust the ideal value either through option sliders or by directly interacting with the histogram, by moving the vertical blue line representing the ideal point.

Priority sliders: Adjust the significance of each column in the analysis (1–100). Moving a slider changes the dimension’s weight in the score or weighted mean calculations.

Ideal value sliders: Set a target value for each dimension to calculate the deviation of each data point, helping assess which entries are closest to or farthest from the target.

Metric visual mapping: Bars and colors can be set to represent the priority, the distance from the ideal, or a combination of both (coupled representation).

Figure 3.

Interaction sequence using the prototype.

User study

We conducted a user study to understand the impact of our design choices and the use of visualization in comparing trade-offs across groups, focusing on two research questions:

RQ1: Does the visualization of trade-off metrics impact comparison compared to a simple numerical presentation?

RQ2: How do coupled or decoupled visualizations of trade-offs support this comparison?

Participants & data

We recruited 18 participants aged 21–36 (mean 26.1), all with normal or corrected-to-normal vision and no color blindness (Table 1, Stage: User Study). The study lasted about 2 h per participant. Participants used a 27 in screen (3840 × 2160 resolution) split into two windows: the comparison prototype on the right and a form with instructions and answer fields on the left. Sessions were audio recorded and screen captured.

We started with the 4D-PF dataset from the design workshops. To ensure consistent task difficulty, the generated datasets had the same number of dimensions, data points, and weighted mean, with comparable levels of trade-off, measured using the trade-off index from Unal et al.⁴³ The choice of comparable levels of trade-offs resembles real-world problems, where there is no “correct” choice. In total, we created nine datasets: three for training and six for tasks (two per task). Training datasets had three data points (cars), and task datasets had seven. We limited group sizes to seven data points, as pilot tests showed this size was manageable for visual scanning without being overwhelming, especially in the baseline condition. This size also aligned with the groups that some of our experts compared.

Visualization conditions

Guided by our two research questions (RQ1, RQ2), as well as findings from the design workshops and related work, we selected three conditions with externally represented trade-offs for evaluation (Figure 4): a Baseline condition with numerically presented trade-off metrics, and two visually encoded conditions: Decoupled, which encodes trade-off metrics using color and opacity, and Coupled, which uses bar length.

Baseline (Numerical): This condition uses numerical data as the baseline, enhanced with weighted scores per data point (last cell in each row), mean values per dimension (bottom row), and data distribution via histograms.

Bars (Coupled): This condition adds bars to the baseline, with bar length representing the weighted distance from the ideal value for each dimension, serving as a single trade-off score.

Colors (Decoupled): This condition adds color to the baseline, with cells filled using a green/white/red color map. Hue represents distance from the ideal, and opacity indicates priority, visually decoupling the two trade-off metrics (e.g. dark green for high priority and low distance, light red for low priority and high distance). It thus communicates two trade-off metrics.
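
As an illustration of how the two visual conditions differ, the sketch below maps the per-cell metrics to a cell color (Decoupled) and to a bar fill (Coupled). It is a minimal Python sketch under stated assumptions: the exact RGB anchors, the linear interpolation through white, and the function names are hypothetical and are not taken from the prototype. Here `closeness` stands for the normalized distance metric, with 1 meaning the value matches the ideal.

```python
# Assumed anchor colors for the green/white/red map (hypothetical values).
RED, WHITE, GREEN = (255, 0, 0), (255, 255, 255), (0, 128, 0)

def lerp(a, b, t):
    """Linear interpolation between two RGB triples, rounded to integers."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

def cell_rgba(closeness, priority):
    """Decoupled encoding: hue from closeness (0..1), opacity from priority (1..100)."""
    if closeness < 0.5:
        rgb = lerp(RED, WHITE, closeness / 0.5)
    else:
        rgb = lerp(WHITE, GREEN, (closeness - 0.5) / 0.5)
    return (*rgb, priority / 100)

def bar_fill_fraction(closeness, priority):
    """Coupled encoding: a single bar length combining both metrics."""
    return priority * closeness / 100

print(cell_rgba(1.0, 100))          # opaque green: high priority, at the ideal
print(cell_rgba(0.0, 50))           # half-transparent red: far from the ideal
print(bar_fill_fraction(0.5, 80))   # one number for the same kind of cell
```

The color cell carries two readable channels, whereas the bar folds priority and closeness into one length, which is the contrast examined in RQ2.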

Figure 4.

The three tabular visualization setups in our study: baseline with simple histograms, CoupledBars with added bars, and DecoupledColors with a color gradient, all linked to a table of sliders for adjusting priorities and ideal values.

Participants could not switch visual encodings and metrics during the experiment. But they could manipulate weights (expressing priorities) and ideal values, and see the mean for dimensions and the weighted mean (score) for data points, reflecting the most common choice in our sketches.

Finally, we opted for including histograms and ideal values in all conditions (including the baseline) to ensure a realistic and fair comparison, reflecting established practices (e.g. LineUp, WeightLifter). Even though they can be considered as simple trade-off indicators, excluding them from the baseline would not only have made the comparison unfair, but also weakened ecological validity. In the domain context we worked with (agronomy, biology, AI), such contextual notions are often implicitly present in users’ reasoning (Formative Study, R2-Metrics).

Hypothesis

To address our main research question on how different visualizations impact group comparison in trade-off analysis, we formulated two hypotheses. These are based on the assumption that visual representations will likely communicate more directly the trade-offs, thus reducing mental load and increasing confidence, but it is unclear how they will influence the use of information in the decision rationale. Moreover, as decoupled visualizations surface more information (two measures), we suspect they will lead to an increase in the information used and higher confidence, but will require more effort.

Hypothesis 1: Visual representations of trade-off metrics in group comparison reduce perceived mental load, affect how data is considered in participants’ decision rationale, and increase decision confidence compared to a numerical baseline.

Hypothesis 2: A decoupled visual representation leads to higher perceived mental load, but also higher decision confidence compared to a coupled visual representation, and affects the data considered in participants’ decision rationale.

Tasks

We designed three trade-off comparison tasks (T1–3), each assigned to one of three visualization techniques (BASELINE, DECOUPLEDCOLORS, COUPLEDBARS) counterbalanced across participants. Each task involved evaluating two groups of used cars and selecting the most suitable group for a hypothetical taxi company’s fleet, based on four conflicting criteria: minimizing price and odometer readings while maximizing MPG and horsepower. Priorities and ideal values were provided by the company’s boss. At the end of the analysis, participants were asked to recommend a car group for purchase, justifying their choice in writing by discussing how each group aligned with the boss’s priorities and evaluating their strengths and weaknesses.
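
One possible way to counterbalance the assignment of visualization techniques to tasks is to cycle through all six task-to-technique assignments. The Python sketch below illustrates this idea; it is an assumption made for illustration, since the exact counterbalancing scheme is not detailed here, and the function name is hypothetical.

```python
from itertools import permutations

CONDITIONS = ("BASELINE", "DECOUPLEDCOLORS", "COUPLEDBARS")
TASKS = ("T1", "T2", "T3")

def counterbalance(n_participants):
    """Cycle through the 6 possible task-to-condition assignments."""
    orders = list(permutations(CONDITIONS))
    return [dict(zip(TASKS, orders[i % len(orders)]))
            for i in range(n_participants)]

plan = counterbalance(18)
# With 18 participants, each of the 6 assignments is used 3 times,
# so every condition is paired with every task equally often.
print(sum(p["T1"] == "BASELINE" for p in plan))
```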

Each task used a different pair of datasets, consisting of the four dimensions (odometer, price, MPG, horsepower) and seven points, but with varying priorities. The task simulated expert decision-making (e.g. selecting wine recipes), without requiring domain knowledge. Participants were asked to provide data-driven justifications, helping us capture the evidence they valued in their analysis. The task datasets were designed to ensure non-trivial decisions, without a single correct answer, mirroring real-world trade-off analysis decision-making.

The visualization prototype, described in the previous section, presented data in side-by-side tables. Participants could adjust sliders to explore scenarios, but their final decision had to align with the boss’s initial priorities and ideal values.

Procedure

Each participant completed a 15-min training session to familiarize themselves with the prototype’s functions. After training, they spent 15 min evaluating two groups of data (car bundles) using the visualization prototype, a task similar to those in the design workshops. Throughout the experiment, participants had a reference sheet with key task information, goals, and prototype features (see supplemental material). They were required to justify their choices in writing, detailing each group’s strengths and weaknesses, referencing the data and visualizations. To capture reasoning, participants used a think-aloud protocol, with their comments audio-recorded for clarification. After each task, participants completed the NASA Task Load Questionnaire.⁴⁴

Study measures

We designed four study measures aligned with our hypotheses:

Data Considered and Justification Quality (H1 & H2):

We asked participants to justify their decision choices, and we analyzed the data they reported in their explanations. We adapted the Co-12 explanation quality properties from the AI domain⁴⁵ to assess participants’ explanations. In particular, the contrastivity property captures the extent to which explanations support comparisons between alternatives. As our tasks do not have a single correct or more accurate choice, we view these properties of explanation quality as a possible proxy for the quality of comparisons and reasoning during trade-off analysis, assuming that better explanations reflect the extraction of more numerous and diverse information, leading to more informed decisions.

Confidence (H1 & H2): We evaluated participants’ confidence in their conclusions through self-report questions. Gauging participants’ confidence in their choices provides insights into how effectively the visualizations supported their decision-making process.

Perceived Cognitive Load and Preferences (H1 & H2): We incorporated the NASA-TLX Task Load Index⁴⁴ questionnaire after each task to capture participants’ perceived cognitive load and satisfaction with the decision-making process. This measure relates both to H1, which posits that additional visual encodings help compared to the baseline; and to H2, which suggests that while decoupling visual representations may increase perceived cognitive demand, it should lead to better understanding and thus user confidence. Finally, we asked participants to express their overall visualization preference.

Results

We report next on (1) the qualitative analysis of participants’ open ended responses (decision justification) as a means to shed light on the data they considered for their comparisons, the quality of their reasoning as reflected in their explanations, their self-reported confidence of the decisions they made using the different visualizations, and their decision strategies; (2) the perceived load associated with using each of the visualizations; and (3) the reported overall visualization preference.

Statistical analysis for mean comparison was conducted using interval estimation.⁴⁶ We construct 95% confidence intervals (CIs) of sample means using BCa bootstrapping (10,000 bootstrap iterations). When interpreting results, a CI of a mean difference that does not include 0 provides evidence of a difference, corresponding to statistically significant results in traditional p-value tests. Here we only report on evidence of a difference (but not the detailed CI values). Analysis scripts and detailed CIs can be found in the supplemental material. For the qualitative analysis, all quotations were translated into English, except those from two participants (P14, P26) who used English throughout the study.
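
For readers unfamiliar with this procedure, the following is a minimal, standard-library Python sketch of a BCa (bias-corrected and accelerated) bootstrap CI for a mean. It is not the analysis script from the supplemental material. It assumes a within-subject comparison summarized as one difference per participant; the function name and the example data are hypothetical.

```python
import random
from statistics import NormalDist

def bca_ci(data, n_boot=10_000, conf=0.95, seed=0):
    """BCa bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = sum(data) / n
    # Bootstrap distribution of the mean (resampling with replacement).
    boot = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))

    norm = NormalDist()
    # Bias correction: share of bootstrap means below the observed mean,
    # clamped away from 0 and 1 so the inverse CDF is defined.
    below = sum(b < theta_hat for b in boot) / n_boot
    below = min(max(below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = norm.inv_cdf(below)

    # Acceleration, estimated from leave-one-out (jackknife) means.
    total = sum(data)
    jack = [(total - x) / (n - 1) for x in data]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_quantile(q):
        z = norm.inv_cdf(q)
        return norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))

    def pick(q):
        return boot[min(n_boot - 1, max(0, int(q * n_boot)))]

    alpha = 1 - conf
    return pick(adjusted_quantile(alpha / 2)), pick(adjusted_quantile(1 - alpha / 2))

# Hypothetical per-participant differences between two conditions.
diffs = [1.0, 2.0, 0.5, 3.0, 1.5, 2.5, 1.0, 2.0]
low, high = bca_ci(diffs)
print(low, high)  # an interval that excludes 0 is read as evidence of a difference
```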

Decision justifications

We analyzed the open-ended responses that participants provided to justify their decisions, in order to characterize their explanations across visualization techniques. Specifically, (i) we identified what information from the visualizations participants refer to (and thus use) when comparing groups of points and justifying their decisions, which highlights what is crucial for visualization designers to provide; and (ii) we used this information to assess differences in how they contrast (i.e. compare) groups using the different visualizations (H1, H2). In addition, (iii) we report participants’ decision strategies when choosing between the two data groups, to shed light on how group comparisons are made.

We used a reflexive thematic analysis (TA) approach³⁵ to analyze the qualitative data in three steps. First, one author assigned labels to text segments related to group comparisons, initially using a closed list of codes from related taxonomies,^2,3 followed by open coding. In the second pass, coders categorized snippets as referring to information or decision strategies and grouped codes into higher-level categories (Information Considered and Decision Strategy). Finally, all authors examined how the codes relate to quality of explanation metrics (Justification Quality). Details of the coding are available in the supplemental material.

Information considered in comparisons

We coded types of data-related information participants reported analyzing during comparisons or using in their justifications (248 instances in total: BASELINE: 65, DECOUPLEDCOLORS: 82, COUPLEDBARS: 101). See Figure 5 for all instances by condition. A more detailed analysis of code frequency is provided in supplemental material.

Figure 5.

Number of times a type of data-related information was considered (C1–C7) across all participants.

We report next how many participants used specific types of information at least once to justify their choices. Most participants justified their choices by reference to group values and distribution characteristics, or to individual data points and their values.

For BASELINE, most participants used group-level analysis at least once, making references to C3: AverageValues (14/18 participants) and C2: DataDistribution&Histogram (13/18), with some referencing analysis of specific data points, C1: IndividualPoints (11/18). There were also references to how closely the data aligns with the priorities of the task, that is, how close to or far from C6: IdealValues the values are (7/18).

Similarly, for COUPLEDBARS, most participants used group-level C2: DataDistribution&Histogram analysis (11/18) and C3: AverageValues (12/18) at least once, with fewer referencing C1: IndividualPoints (8/18) and C6: IdealValues (2/18). References to visualization elements were rarer: few participants explicitly mentioned C4: BarsAnalysis as a justification (5/18).

For DECOUPLEDCOLORS, most participants also referred to C2: DataDistribution&Histogram at least once (14/18), followed by C1: IndividualPoints (12/18), C3: AverageValues (10/18), and C6: IdealValues (7/18). Again, references to visualization elements, here C5: ColorAnalysis, were rare (2/18 participants).

C7: BadValues was referenced by few participants in all conditions (2–4/18).

Justification quality

We adapted five Co-12 explanation quality properties,⁴⁵ originally developed for AI explanations, to evaluate participants’ trade-off justifications: Correctness (factual errors in reasoning), Completeness (variety of types of information used), Compactness (amount of information used), and Contrastivity and Confidence discussed next. With the exception of Confidence, these properties were calculated in a second and third iteration of the codes in our thematic analysis. We report only on Contrastivity and Confidence, as they yielded significant results. Details of how the five properties were coded, as well as detailed results about the other properties, are in supplemental material.

Contrastivity: We counted the number of argument pairs favoring each group that participants used in their justifications, assuming such pairs reflect a consideration for arguments for each group and thus a contrast between the two groups. Arguments against a group were inverted to simplify counting (e.g. an argument against Group B counted as one for Group A). Contrastivity averaged 1.94 pairs in BASELINE (range: 0–5), 1.44 in COUPLEDBARS (range: 0–4), and 1.33 in DECOUPLEDCOLORS (range: 0–5). There is strong evidence that BASELINE had a higher average contrastivity than the other two conditions, indicating that participants were more likely to present arguments for both groups.

Confidence: Participants rated their confidence on a seven-point Likert scale. Confidence averaged 5.6 in BASELINE (range: 4–7), 5.94 in COUPLEDBARS (range: 5–7), and 5.17 in DECOUPLEDCOLORS (range: 2–7). We have strong evidence that participants were less confident with DECOUPLEDCOLORS than COUPLEDBARS and no evidence of difference between the rest.

Overall, explanation quality metrics revealed few differences across visualizations. BASELINE showed higher contrastivity than the others. Contrary to H2, participants reported less confidence with DECOUPLEDCOLORS than COUPLEDBARS, while no clear differences emerged between the remaining conditions.

Decision strategy

We additionally coded the participants’ decision-making strategies in the group comparison task (Figure 6). Most used one strategy, though a few considered two.

Figure 6.

Decision strategy across visualizations.

(S1) Strong maximization of the priorities: Choosing a group based on its optimal performance in one or two dimensions, particularly those with the highest priorities. For example, P24 mentioned during the COUPLEDBARS condition: “I make my comparison taking into account only the mpg and horse power (main priorities).” Many participants explained that their strategy was to ignore objectives with lower priorities, thus excluding them from the comparison. P20 further explains why this exclusion strategy works well for BASELINE: “Much more difficult to make a decision here, having to go through all the data in text.” Beyond excluding low priority objectives, participant P30 adds: “The choice in BASELINE was made more on the whole using averages,” highlighting a tendency to simplify the decision-making process due to the limitations of the visualization. The maximization strategy was used by 13/18 participants in BASELINE and by 11/18 participants in COUPLEDBARS, but it was less common in DECOUPLEDCOLORS, where only 7/18 participants used it. We note that one participant in BASELINE and two in COUPLEDBARS considered a balanced approach (see next) but ultimately made a decision that strongly maximized 1–2 top priorities.

(S2) Balanced approach: Deciding on a group that is not the best for the highest priorities but a very good choice for several lower priority dimensions and not a bad choice for the strongest priorities. In other words, this is a strategy that considers trade-offs across many objectives. For example, P23 mentioned while in DECOUPLEDCOLORS: “Even if group B has a slightly better mpg than group A (based on the histogram), it doesn’t seem significant and we look at the other priorities, which are all three low indeed, but the odometer comes first: it’s better in group A. Then we look at the price, better in group A. Then we look at the horsepower, bad in both. Group A seems preferable.” The balanced approach was most common in DECOUPLEDCOLORS, used by 10/18 participants, followed by COUPLEDBARS with 6/18 participants and BASELINE with 4/18 participants.

(S3) Decision by individual points or outliers: Justifying a decision based on the specific details and anomalies within the data points rather than the overall trends or averages as a main justification. This was a rare strategy, used by three participants in total, 1/18 in each of the three visualizations. For example, P30 declared in DECOUPLEDCOLORS: “The taxi that penalizes the mpg in group A already has a high odometer and may need to be replaced, which will improve the average mpg. In group B, replacing taxis with a high odometer will penalize the average mpg making it less attractive. In the long term, group A will, therefore, be better on all criteria.”

Overall, there is strong statistical evidence that most participants using BASELINE tend to adopt strategies that optimize one or two dimensions with the highest priority in the task. To a lesser extent, this is true when they use COUPLEDBARS (more than half of participants, weak evidence). On the other hand, in DECOUPLEDCOLORS their strategies are more mixed, with more than half adopting a balanced approach.

Perceived load - NASA TLX

We report only significant NASA TLX results on task demand across the three visualizations, using 21-point Likert scales (see Figure 7 and supplemental material for additional details).

Figure 7.

Mean values for the different categories of the NASA TLX questionnaires.

Figure 8.

Participants’ ranking of the visualizations.

Mental demand

COUPLEDBARS has the lowest mean (10.72), and there is strong evidence that it was perceived as less mentally demanding compared to both BASELINE (13.5) and DECOUPLEDCOLORS (13.28).

Performance

There is possibly a trend for mean performance in COUPLEDBARS (14.39) to be higher than BASELINE (13.06) but no other evidence of a difference. The mean for DECOUPLEDCOLORS (13.44) was between the two.

Effort

COUPLEDBARS has the lowest mean (9), and there is strong evidence that participants felt it required less effort compared to both BASELINE (11.72) and DECOUPLEDCOLORS (11.83).

Frustration

COUPLEDBARS shows the lowest mean (4.17), and there is strong evidence that it caused less frustration compared to both DECOUPLEDCOLORS (8.78) and BASELINE (8.94).

Overall, COUPLEDBARS showed lower perceived mental demand, effort, and frustration, while BASELINE and DECOUPLEDCOLORS had similar means. These results partially validate H1: COUPLEDBARS reduced perceived mental load versus BASELINE, but DECOUPLEDCOLORS did not. They support H2, as DECOUPLEDCOLORS was perceived as more mentally demanding than COUPLEDBARS.
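To make the reported means easier to compare side by side, the short sketch below collects them and computes each condition's gap to COUPLEDBARS. The numbers are the means quoted above; the dictionary layout and the helper name are illustrative choices. Raw differences in means say nothing about statistical evidence, which is reported separately in the text.

```python
# Mean NASA-TLX ratings as reported in the text (21-point scales).
tlx_means = {
    "Mental demand": {"BASELINE": 13.5, "COUPLEDBARS": 10.72, "DECOUPLEDCOLORS": 13.28},
    "Performance": {"BASELINE": 13.06, "COUPLEDBARS": 14.39, "DECOUPLEDCOLORS": 13.44},
    "Effort": {"BASELINE": 11.72, "COUPLEDBARS": 9.0, "DECOUPLEDCOLORS": 11.83},
    "Frustration": {"BASELINE": 8.94, "COUPLEDBARS": 4.17, "DECOUPLEDCOLORS": 8.78},
}

REFERENCE = "COUPLEDBARS"  # condition used as the comparison point


def gaps_to_reference(means, reference=REFERENCE):
    """For each scale, return (other condition mean - reference mean), rounded to 2 decimals."""
    return {
        scale: {
            cond: round(value - by_cond[reference], 2)
            for cond, value in by_cond.items()
            if cond != reference
        }
        for scale, by_cond in means.items()
    }


gaps = gaps_to_reference(tlx_means)
for scale, diff in gaps.items():
    print(f"{scale:14s} {diff}")
```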

Preferences and qualitative feedback

After the tasks, we asked the participants to rank the three visualizations based on preference and explain their ranking (see Figure 8). COUPLEDBARS was the most preferred visualization, ranked first by 10 participants. DECOUPLEDCOLORS came a close second, with eight participants ranking it as their most preferred choice. BASELINE was by far the least preferred visualization, with 0 participants ranking it as their most preferred choice.

These results show that participants clearly prefer the visual representations of trade-off metrics compared to the BASELINE that only includes numerical data (and histograms). As P20 mentioned for BASELINE: “Much more difficult to make a decision here [Baseline], having to go through all the data in text.” And P30 adds that: “The choice in Baseline was made more on the whole using averages,” highlighting a tendency to simplify the decision-making process due to the limitations of the visualization.

Slightly more participants preferred the coupled COUPLEDBARS (10/18) over the decoupled DECOUPLEDCOLORS. The participants who preferred COUPLEDBARS generally found it easier to read. As P24 explained: “The color representation greatly overloaded the reading of the data. The fact of having two distinct variables (color and opacity) to represent the priority and the ideal value did not facilitate handling. The representation of the relationship between priority and ideal value in the form of a single variable (the gray bar) was easy to use for me.”

Nevertheless, several participants preferred DECOUPLEDCOLORS (8/18). They felt it made outliers and discrepancies with their goal stand out. For example, P23 mentioned: “The DecoupledColors is ideal for me with the colors that greatly facilitate comparison and allow you to see the big differences directly…” Another (P14) commented on the intuitiveness of the representation: “I found the color and opacity combination very helpful in the analysis – the green to red scale is very intuitive, as is the scale from very solid color to nearly transparent. That aligned well with how I was already interpreting the decisions.” Some of these participants commented on how (compared to colors) with COUPLEDBARS they had “difficulty assessing what the maximum of the bars corresponds to” (P22).

Expert feedback

To illustrate how the prototype works, particularly its integration of trade-off metrics and visualizations for group comparison, we conducted a feedback session with an enology expert (Table 1, P5). The expert was tasked with comparing two groups of wine recipes, choosing one to implement in a lab experiment, and justifying their choice. The wine recipe groups, identified in a previous analysis, reflect two distinct temperature management strategies. Each group includes 22 recipes, characterized by eight criteria: three desirable aromas (A1-3), two undesirable compounds (C1-2), energy consumption (E), maximum cooling power (P), and fermentation time (T).

The feedback session was conducted remotely via Zoom, and lasted $\approx 80$ minutes. It followed the same procedure as the user study, but we trimmed the training as our expert was familiar with their data and decision task. In addition, to observe how they would appropriate the tool, we allowed them to freely switch between the three visual representations. We transcribed the audio recording and two authors conducted a thematic analysis³⁵ of the transcript, identifying the trade-off metrics the expert used, the decision strategies they adopted, and their preferences across the different conditions for solving the comparison task.

Our expert started their exploration by immediately setting priorities, giving maximum weights to two aromas (A2-3), energy (E) and power (P). They explained that the other aroma and the compounds were correlated and weighted them mid-range. They left the weight for time (T) low as they were not as much concerned about fermentation duration. They then set their ideal values, noting that for some aromas (A1) and for (T) they are looking for mid-range values, while attempting to maximize (A2-3) and minimize (C1-2,E,P). Figure 9 shows the participant’s preferred condition (DECOUPLEDCOLORS), and the final priorities and ideal values used to make their choice.

Figure 9.

Screenshot of the wine expert’s preferred setup (DECOUPLEDCOLORS) showing participant’s priorities and ideal values.
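The priorities and ideal values set by the expert can be thought of as inputs to a weighted closeness-to-ideal score. The sketch below is one plausible formulation, not the prototype's actual computation: the linear closeness function, the normalization by the data range, the `Criterion` class, and all numeric values are assumptions introduced for illustration. Only the criterion labels (A2, E, T) and the idea of maximizing, minimizing, or targeting a mid-range value come from the session.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    lo: float        # lower bound of the data range (hypothetical)
    hi: float        # upper bound of the data range (hypothetical)
    ideal: float     # user-set ideal value
    priority: float  # user-set weight between 0 and 100


def closeness(value, c):
    """1.0 at the ideal value, decreasing linearly to 0.0 at the farthest range bound."""
    max_dist = max(abs(c.hi - c.ideal), abs(c.ideal - c.lo))
    if max_dist == 0:
        return 1.0
    return max(0.0, 1.0 - abs(value - c.ideal) / max_dist)


def item_score(item, criteria):
    """Priority-weighted mean closeness of one item across all criteria."""
    total_weight = sum(c.priority for c in criteria)
    if total_weight == 0:
        return 0.0
    return sum(c.priority * closeness(item[c.name], c) for c in criteria) / total_weight


def group_score(items, criteria):
    """Mean item score over a group of alternatives."""
    return sum(item_score(i, criteria) for i in items) / len(items)


# Hypothetical setup: maximize A2, minimize E, aim for a mid-range T with low priority.
criteria = [
    Criterion("A2", lo=0.0, hi=10.0, ideal=10.0, priority=100),
    Criterion("E", lo=0.0, hi=50.0, ideal=0.0, priority=100),
    Criterion("T", lo=100.0, hi=200.0, ideal=150.0, priority=20),
]
group_a = [{"A2": 8.0, "E": 10.0, "T": 150.0}, {"A2": 6.0, "E": 20.0, "T": 170.0}]
group_b = [{"A2": 4.0, "E": 30.0, "T": 150.0}, {"A2": 5.0, "E": 40.0, "T": 190.0}]

print("group A:", round(group_score(group_a, criteria), 3))
print("group B:", round(group_score(group_b, criteria), 3))
```

Setting the ideal value at a range bound expresses maximization or minimization, while an interior ideal value expresses a mid-range target, so a single function covers the three cases described above.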

Their reaction to BASELINE was underwhelming and in the training they considered only the weighted score per group and stated, “I have trouble choosing […] they are very similar. I don’t know, I would say group A, because the score seems better.” When their own dataset was loaded, they decided immediately to move to one of the two visual encodings. They started with COUPLEDBARS and used this view to set up their preferences and priorities in the sliders, but once those were set, they quickly moved to colors, “Does it [COUPLEDBARS] make sense to me? […] I’ll activate the colors to see what it looks like.” Their reaction to DECOUPLEDCOLORS was enthusiastic, “Colors, I like them […], I find it immediately easier to grasp in terms of visualization. Here, spontaneously, I see that my wine B meets its criteria on the alcohol criteria [C1-2], but on the aroma criteria [A1-3] that interest me, it’s less good […] I see that there are still quite a few green things [highlighting several dimensions]. The fermentation time [T] doesn’t necessarily have huge importance. It’s in [low] opacity.” As they explained, “With the color, it helps me make a choice. It’s obvious. It’s much easier than the other ones.” They added, “I find it [BASELINE] less readable. It’s a bit harder to grasp. Because there are only the scores […] Later [after using DECOUPLEDCOLORS], it [COUPLEDBARS] was clearer. But maybe that’s because I don’t have enough experience with the tool. But it’s true that with the color and opacity, it’s nice […] We are trained in the lab to read heatmaps, for example. It [DECOUPLEDCOLORS] looks like a heatmap.” When asked to justify their final decision, they again made references to the colors to indicate trade-offs in the different criteria. They stated high confidence about their choice, “The approach I set in the priorities, when I started adjusting my sliders, I can see it reflected in the colors, in the table, and in the underlying results [data].” When we indicated that their choice (group A) actually had an overall weighted score that was worse than Group B, they stated “I did it by color. Really, I didn’t look at the scores.”

With regards to adjusting ideal values, the expert expressed keen interest, “There’s something I didn’t do, that I would have enjoyed. But that’s once I initially made my recipe by setting my sliders. I would have liked to play afterward and say: I’m a bit less ambitious.” Finally, when asked whether they adopted a maximization strategy, a more balanced approach, one driven by individual points, or another strategy, they replied that it was “rather a maximization strategy,” albeit with a greater number of key priorities (four criteria all set to 100, Figure 9).

Discussion, future work & limitations

Our work extends prior research on comparative visualization and decision making, evaluating the comparisons of groups of points for trade-off analysis. It (i) underscores the need to add trade-off contextual facets in the comparison process, represented by the user’s goals and expressed as criteria, priorities and the direction of their optimization; and (ii) highlights how concrete representations of these facets contribute to the comparison of and reasoning with groups of points over a multidimensional space.

Understanding what elements constitute the contextual facets that support group comparison is a contribution of our work. In our video analysis and workshop with experts, we first identified the high-level trade-off context, captured as key requirements, in the form of: metrics (priorities, ideal values), access to different levels of data detail (raw datapoints, weighted scores of datapoints/dimensions/groups), and the use of data ranges and ranking. We explored how to visualize these contextual facets in two design workshops. Our subsequent user study examined how concrete representations of these elements are used in trade-off analysis, discussed next. The feedback session with the domain expert further illustrated how these approaches apply to real-world tasks, revealing a clear preference for the decoupled, visual representation, contrary to our findings with lay participants, and showing that it helped them quickly grasp trade-offs and make confident decisions based on visual patterns rather than exact scores.

We revisit next our research questions, highlighting key takeaways (in italics), and associated research implications and future work. We finish with a set of limitations.

Should we visualize trade-off metrics (RQ1)?

Based on our design workshops and related work, we hypothesized (H1) that trade-off visualizations would aid comparisons in terms of perceived effort, increase confidence in final decisions, and influence how data is contrasted. This hypothesis was only partially confirmed. We chose as our baseline a tabular visualization that already provides elements to help comparisons, such as means, weighted scores, and dimension histograms. Regarding the quality of justifications, the baseline produced more contrastivity than other visualizations; in other words, participants tended to discuss the pros and cons of their decisions more. In terms of mental effort, it was considered worse than only one of the visual representations (bars). Finally, participants’ confidence in their decisions was not lower than with the trade-off-specific visualizations. This suggests that simple tabular visualizations are still an effective representation for trade-off comparison of groups as long as they provide context (histograms) and metrics (weighted scores and means).

Surprisingly, baseline tables provided high contrastivity and confidence, suggesting that when users are comfortable with raw data, simple tables can rival complex visualizations, though scalability and further testing remain important. We note, however, that all study participants, and the domain expert, preferred one of the two visual trade-off conditions, even though the decoupled visualization seemed complex for some. We should thus consider designing for a continuum in the external representation of trade-offs, ranging from purely numerical representations to visually enriched ones that support different levels of perceptual comparison and cognitive effort.

Should we use a coupled or a decoupled visualization of trade-off metrics (RQ2)?

We hypothesized (H2) that decoupled visualizations (color hue and opacity in our case) would provide access to more information and lead to higher confidence and influence justification quality, but would also impose higher perceived mental load than coupled visualizations (bars in our case). One part of our hypothesis was supported by findings in the user study: decoupled colors were indeed perceived as more mentally demanding than bars. Nevertheless, other aspects were unsupported. We found no evidence of a difference in quality of justifications, and most surprisingly, confidence in the decisions was in fact lower with decoupled colors. One thing, however, is clear: visualizations were strongly preferred in our user study over the numerical option. Among the visual representations, participants were divided between bars and colors, whereas our domain expert clearly favored decoupled colors, finding them more readable and intuitive. We thus suggest the design of dynamic visualizations that adapt trade-off encoding choice and complexity to user behavior and expertise, and can thus support diverse decision-making styles. Supporting such flexibility, combined with visibility of the trade-off criteria and their mappings, aligns with recommendations from the literature.²²

The increased perceived cognitive load in decoupled colors was expected, consistent with the split attention effect,⁴⁷ but the reduced confidence was not. Some participants likely noticed subtler trade-offs, realizing no option was clearly “best,” which reduced confidence. Individual differences in sensitivity to cognitive load may also have contributed, as reflected in the varied scores, underscoring the need for further studies on visual MCDM tasks.

Understanding, supporting and shaping decision strategies

The justifications for our participants’ final decisions show differences in the types of strategies the visualizations support. We see that in the baseline tabular visualization, and to a lesser extent the coupled bars, most participants adopt decision strategies that optimize one or two criteria only, what we refer to as strong maximization of priorities. This aligns with decision-making strategies relying on non-compensatory heuristics such as lexicographic rules⁴⁸ and the take-the-best heuristic.⁴⁹ In the decoupled colors, by contrast, we observe more varied strategies from the same participants, with some balancing optimizations across all criteria, consistent with compensatory decision-making models.²³ Our results suggest that baseline visualizations, and possibly coupled bars, support strategies that adopt strong maximization of few (highest) priorities; if balanced decision strategies that consider multiple priorities are important, then decoupled color representation could be more appropriate.

An important question that emerges from these results is whether specific visualizations support the strategies participants want to adopt, or if they actively influence decision strategies. Our results hint that we could use simple encodings to influence trade-off decision strategies. Coupled encodings (e.g. simple weighted means or bars) tend to promote prioritization of top-weighted criteria, while decoupled encodings (e.g. hue and opacity) could encourage more balanced, multi-criteria reasoning. As visualization designers, we could choose based on desired decision behavior, but this raises ethical considerations regarding “nudging” users and biasing their decision strategies.
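The difference between the two encodings can be illustrated with a minimal sketch. It assumes that the coupled bar length is the product of a priority (0 to 100) and a closeness-to-ideal value (0 to 1), and that the decoupled encoding maps closeness to a red-to-green ramp and priority to opacity. Both mappings and the function names are assumptions for illustration; the prototype's exact formulas are not given in this section.

```python
import math


def coupled_bar_length(priority, closeness, max_length=100.0):
    """Single channel: bar length combines priority (0-100) and closeness (0-1)."""
    return max_length * (priority / 100.0) * closeness


def decoupled_color(priority, closeness):
    """Two channels: hue runs from red (far from ideal) to green (at ideal),
    opacity encodes priority. Returns an (R, G, B, alpha) tuple."""
    red = round(255 * (1.0 - closeness))
    green = round(255 * closeness)
    opacity = priority / 100.0
    return (red, green, 0, opacity)


# Two hypothetical cells: (priority, closeness to the ideal value).
cells = {
    "high priority, far from ideal": (100, 0.4),
    "lower priority, at ideal": (40, 1.0),
}

for label, (priority, closeness) in cells.items():
    bar = coupled_bar_length(priority, closeness)
    rgba = decoupled_color(priority, closeness)
    print(f"{label:32s} bar={bar:.1f}  rgba={rgba}")

bar_lengths = [coupled_bar_length(p, c) for p, c in cells.values()]
colors = [decoupled_color(p, c) for p, c in cells.values()]
same_bar = math.isclose(bar_lengths[0], bar_lengths[1])
```

Under these assumptions, both cells produce the same bar length but clearly different colors: the coupled encoding does not reveal whether a short bar stems from a low priority or from distance to the ideal value, whereas the decoupled encoding keeps the two readable at the cost of attending to two channels.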

Our findings support prior practices in multi-attribute visual analytics, such as LineUp¹⁵ or WeightLifter,¹² confirming that external trade-off representations that are visual are preferred by users. However, our comparative evaluation is more nuanced, indicating that the form of representation matters. Our findings reveal a possible fundamental tension in MCDM visualizations: representations that enhance contrastive (baseline) or confident decisions (coupled bars) do not necessarily support transparency and balanced decision strategies (e.g. decoupled colors). Visual encoding may also shape comparison strategies via ensemble perception, where summary features of groups, rather than individual values, guide attention.⁵⁰ Future work should investigate how visual complexity, perceptual and cognitive load, ensemble effects, and decision effort each influence different stages of decision making. In contrast, cognitively constructed approaches^16,19,31 avoid the possible trade-off between transparency and visual complexity altogether by leaving the integration to the user. But given that the comparison is performed across groups, this integration is likely more challenging than for individual points, leading to a different type of mental (integration) effort. The impact of this integration, and how it compares to external representations, needs to be confirmed.

Integrating comparisons in VA workflows

We must consider how to integrate comparison into broader MCDM workflows. We studied contexts where criteria are defined a-priori, both when it comes to our experts and in our user study. There are, nonetheless, cases where criteria may be defined progressively, or a-posteriori,¹² as users start exploring their data. Our work on comparison becomes relevant when loosely defined criteria solidify into clear objectives and well-formed groups or subsets to compare.

To arrive at this stage, analysts may first need to go through a VA analysis using exploration and set-theoretic relationships such as intersections, unions, and subsets (as in LineUp, Taco, Domino, UpSet, and SmartExplore). Our approach can serve as a dedicated comparison aid embedded within such broader decision-support systems, activated when analysts want to zoom into the similarities and differences between alternative candidate sets.

For practical reasons, we tested a small number of groups, each containing a small number of items (<10), reflecting the dataset sizes encountered in some of our expert interactions. In real-world decision-making, users often face more sets, items, and dimensions. It is unclear how increasing the number of sets affects the ability to maintain holistic, priority-driven judgments; how larger numbers of items influence ensemble perception and the detection of anomalies; and how higher-dimensional tables affect the balance between ensemble perception and transparency of individual components. For larger datasets or groups, participants may adopt comparison strategies that rely more on averages or aggregation. Understanding these limits will be critical for designing scalable table visualizations that support more effective trade-off analysis.

Based on our workshops, we focused on variations of trade-off metrics with respect to criteria and comparisons relative to a best or ideal point. In addition, we need to investigate if results could differ when considering other metrics, such as comparisons relative to a nadir or multiple reference points as suggested in the literature.¹³ Our study highlights the importance of normative baselines (e.g. the ideal point) as anchors for group comparison. However, because participants often negotiated the norm by adjusting priorities and ideal values, such anchors should be user-controlled. Exploring how to visualize multiple anchors and the impact of combining multiple interactive anchors on the trade-off analysis process remains future work.

Limitations

Our experts work with complex datasets derived from biological models, but simpler trade-off tasks with fewer criteria or data points also exist. In our design workshops, we carefully constructed datasets and tasks to reflect expert practices, though these choices and the inspirational material may have shaped our prototype designs. Still, the alignment between design requirements and expert feedback gives us confidence in identifying what visualizations must provide.

Our resulting prototype, informed by workshops with visualization and HCI experts, takes a tabular form. In contrast to state-of-the-art multi-dimensional visualizations, this is a simple representation. Nevertheless, we note that tables have been found in prior work to be particularly effective in decision-making³⁴ and preferred by many data experts.⁵¹

Our visualization prototype supports various combinations of trade-off metrics and visual encodings. To keep the study manageable (≈2h), we tested three variations: one based on frequent designs from our workshop (decoupled colors), one from the state of the art (coupled bars), and one that we consider a usable baseline. This resulted in a limitation of our user study: we compared a mix of metric coupling and visualizations (bars with coupled metrics and colors with decoupled). We are thus not able to study the effect of coupling in isolation. It is possible that some of our results, for example the higher perceived mental load in decoupled colors, are due more to the visual encoding than to the metric coupling. Clarifying this question remains future work, and would benefit from findings in vision sciences on the coupling of or interference between different visual channels.

We considered several properties for justification quality. While these highlight differences in participants’ reasoning (in contrastivity and confidence), as our decision tasks do not have a ground truth or an objectively “correct” choice, it is unclear if they are a reliable proxy for decision-making quality.

Finally, the summative feedback session involved only one domain expert, who had also participated in the formative stage. Their feedback may therefore have been influenced by their prior involvement, potentially introducing positive bias. This feedback aimed to highlight domain-specific value rather than to generalize results, and it complements the study with 18 non-expert participants.

Conclusion

Comparison is an important task in multi-criteria decision-making. Although prior work has considered the comparison of individual points, studying how to compare groups of points remains an open challenge. We contribute empirical results on the evaluation of group comparisons for trade-off analysis. To better identify user needs for comparisons of groups, we conducted a workshop with experts that provided valuable insights into how trade-off comparisons are conducted, and what support visualizations need to provide. Two design workshops in turn inspired the implementation of a prototype to support group comparisons. As we had to consider several design decisions about how to present trade-off metrics, we evaluated two trade-off metric visualizations against a baseline that uses a purely numerical approach.

Our results identify factors involved in such comparisons and emphasize the need to incorporate trade-off contextual facets beyond raw data, value ranges, and optimization direction, including broader user needs and user goals expressed as criteria, priorities and reference values. They highlight that visual representations of these trade-off facets are preferred, but that decoupling their visual representation increases perceived mental load for non-domain experts. These findings, along with expert feedback that on the contrary favored the decoupled representation, raise questions about how visualizations can support, or even promote, different trade-off decision strategies, and how preferences may vary with domain expertise. They also highlight the need for comparative studies that move beyond individual components to consider groups of items. Such studies can clarify how visualization design shapes not just data understanding but also reasoning, decision strategy and confidence.

Footnotes

Acknowledgements

We thank our domain experts, as well as all workshop and user study participants, for their valuable contributions to this work. We also thank the anonymous reviewers for their insightful feedback, which greatly improved this work.

ORCID iDs

Mehdi Chakhchoukh

Anastasia Bezerianos

Nadia Boukhelifa

Ethical considerations

Approved by the Ethics Committee of Université Paris-Saclay.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Ecoles doctorales (Université Paris-Saclay); Agence Nationale de la Recherche (France) ANR projects: ANR-24-CE10-4479 and ANR-24-CE33-4303.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

All supplementary material is available at: https://osf.io/xnbm2/overview?view_only=4b5b940dce0f4ced8b933d9eaceaf3f5.

Supplemental material

Supplemental material for this article is available online.

References

1. Brehmer, Munzner. A multi-level typology of abstract visualization tasks. IEEE Trans Vis Comput Graph 2013; 19(12): 2376–2385.

2. Amar, Eagan, Stasko. Low-level components of analytic activity in information visualization. In: Proceedings of the IEEE symposium on information visualization, 2005, INFOVIS 2005, pp. 111–117. IEEE.

3. Gleicher, Albers, Walker, et al. Visual comparison for information visualization. Inf Vis 2011; 10: 289–309.

4. Hakanen, Miettinen, Matković. Task-based visual analytics for interactive multiobjective optimization. J Oper Res Soc 2021; 72(9): 2073–2090.

5. Boukhelifa, Bezerianos, Cristian Trelea, et al. An exploratory study on visual exploration of model simulations by multiple types of experts. In: Proceedings of the annual ACM conference on human factors in computing systems, CHI ’19, 2019. https://doi.org/10.1145/3290605.

6. Chakhchoukh, Boukhelifa, Bezerianos. Understanding how in-visualization provenance can support trade-off analysis. IEEE Trans Vis Comput Graph 2023; 29: 3758–3774. https://doi.org/10.1109/TVCG.2022.3171074

7. Alsallakh, Micallef, Aigner, et al. The state-of-the-art of set visualization. In: Computer graphics forum. Wiley Online Library, 2016, pp. 234–260, Vol. 35.

8. Tominski, Behrisch, Bleisch, et al. Visualizing uncertainty in sets. IEEE Comput Graph Appl 2023; 43(5): 49–61.

9. Hwang, Lai, Liu. A new approach for multiple objective decision making. Comput Oper Res 1993; 20(8): 889–899.

10. Gleicher. Considerations for visualizing comparison. IEEE Trans Vis Comput Graph 2018; 24(1): 413–423. https://doi.org/10.1109/TVCG.2017.2744199

11. Daniels, Werner, Bahill. Quantitative methods for tradeoff analyses. Syst Eng 2001; 4(3): 190–212.

12. Pajer, Streit, Torsney-Weir, et al. Weightlifter: visual weight space exploration for multi-criteria decision making. IEEE Trans Vis Comput Graph 2017; 23(1): 611–620. https://doi.org/10.1109/TVCG.2016.2598589

13. Kehrer, Piringer, Berger, et al. A model for structure-based comparison of many categories in small-multiple displays. IEEE Trans Vis Comput Graph 2013; 19(12): 2287–2296.

14. Milutinović, Ahonen-Jonnarth, Seipel, et al. The impact of interactive visualization on trade-off-based geospatial decision-making. Int J Geogr Inf Sci 2019; 33(10): 2094–2123.

15. Gratzl, Lex, Gehlenborg, et al. Lineup: visual analysis of multi-attribute rankings. IEEE Trans Vis Comput Graph 2013; 19(12): 2277–2286. https://doi.org/10.1109/TVCG.2013.173

16. Gratzl, Gehlenborg, Lex, et al. Domino: extracting, comparing, and manipulating subsets across multiple tabular datasets. IEEE Trans Vis Comput Graph 2014; 20(12): 2023–2032.

17. Wall, Das, Chawla, et al. Podium: ranking data using mixed-initiative visual analytics. IEEE Trans Vis Comput Graph 2018; 24(1): 288–297.

18. Zhao, Cui, et al. Skylens: visual analysis of skyline on multi-dimensional data. IEEE Trans Vis Comput Graph 2018; 24(1): 246–255.

19. Lex, Gehlenborg, Strobelt, et al. Upset: visualization of intersecting sets. IEEE Trans Vis Comput Graph 2014; 20(12): 1983–1992.

20. Niederer, Stitz, Hourieh, et al. Taco: visualizing changes in tables over time. IEEE Trans Vis Comput Graph 2018; 24(1): 677–686.

21. Ibrahim, Poorthuis, et al. Improving visualization design for effective multi-objective decision making. IEEE Trans Vis Comput Graph 2022; 28(10): 3405–3416.

22. Oral, Chawla, Wijkstra, et al. From information to choice: a critical inquiry into visualization tools for decision making. IEEE Trans Vis Comput Graph 2023; 30(1): 1–11.

23. Keeney, Raiffa. Decisions with multiple objectives: preferences and value trade-offs. Cambridge University Press, 1993.

24. Triantaphyllou. Multi-criteria decision making methods: a comparative study. Springer Science & Business Media, 2013. Vol. 44.

25. Shih, Shyur, Lee. An extension of topsis for group decision making. Math Comput Model 2007; 45: 801–813.

26. Gregory, Keeney. Making smarter environmental management decisions. J Am Water Resour Assoc 2002; 38(6): 1601–1612.

27. Marttunen, Lienert, Belton. Structuring problems for multi-criteria decision analysis in practice: A literature review of method combinations. Eur J Oper Res 2017; 263(1): 1–17.

28. Roy. The outranking approach and the foundations of electre methods. Theory Decis 1991; 31(1): 49–73.

29. Javed, Elmqvist. Exploring the design space of composite visualization. In: Proceedings of the 2012 IEEE Pacific visualization symposium, pp. 1–8. IEEE.

30. Correll, Bailey, Sarikaya, et al. Layercake: a tool for the visual comparison of viral deep sequencing data. Bioinformatics 2015; 31(21): 3522–3528.

31. Zhang, Cheng, et al. Paretolens: a visual analytics framework for exploring solution sets of multi-objective evolutionary algorithms [application notes]. IEEE Comput Intell Mag 2025; 20(1): 78–94.

32. Blumenschein, Behrisch, Schmid, et al. Smartexplore: simplifying high-dimensional data analysis through a table-based visual analytics approach. In: Proceedings of the 2018 IEEE conference on visual analytics science and technology (VAST), pp. 36–47. IEEE.

33. Zhao, Karimzadeh, Snyder, et al. Metricsvis: a visual analytics system for evaluating employee performance in public safety agencies. IEEE Trans Vis Comput Graph 2020; 26(1): 1193–1203.

34. Dimara, Bezerianos, Dragicevic. Conceptual and methodological issues in evaluating multidimensional visualizations for decision support. IEEE Trans Vis Comput Graph 2018; 24(1): 749–759. https://doi.org/10.1109/TVCG.2017.2745138

35. Clarke, Braun. Thematic analysis: a practical guide. Sage Publications Ltd, 2021.

36. Craigslist cars and trucks data, https://www.kaggle.com/datasets/prena0808/craigslist-cars-and-trucks-data (2024, accessed 23 February 2024).

37. API Ninjas. Cars API documentation, https://api-ninjas.com/api/cars (2024, accessed 23 February 2024).

38. OApackage. Pareto example documentation, https://oapackage.readthedocs.io/en/latest/examples/example_pareto.html (2018, accessed 01 January 2024).

39. Data to Viz. Data to viz. https://www.data-to-viz.com (2023, accessed 03 October 2023).

40. Perin, Dragicevic, Fekete. Revisiting bertin matrices: new interactions for crafting tabular visualizations. IEEE Trans Vis Comput Graph 2014; 20(12): 2082–2091.

41. Perin, Nacenta. The effect of visual aids on reading numeric data tables. IEEE Trans Vis Comput Graph 2025; 31(1): 995–1005. https://doi.org/10.1109/TVCG.2024.3456403

42. Furmanova, Gratzl, Stitz, et al. Taggle: combining overview and details in tabular data visualizations. Inf Vis 2020; 19(2): 114–136. https://doi.org/10.1177/1473871619878085

43. Unal, Warn, Simpson. Quantifying tradeoffs to reduce the dimensionality of complex design optimization problems and expedite trade space exploration. Struct Multidiscipl Optim 2016; 54(2): 233–248. https://doi.org/10.1007/s00158-015-1389-7

44. Hart, Staveland. Development of NASA-TLX (task load index): results of empirical and theoretical research. In: Hancock, Meshkati (eds) Human mental workload, advances in psychology. North-Holland, 1988, pp. 139–183, Vol. 52, https://doi.org/10.1016/S0166-4115(08)62386-9

45. Nauta, Trienes, Pathak, et al. From anecdotal evidence to quantitative evaluation methods: a systematic review on evaluating explainable AI. ACM Comput Surv 2023; 55(13s): 1–42. https://doi.org/10.1145/3583558

46. Dragicevic. Fair statistical communication in HCI. In: Robertson, Kaptein (eds) Modern statistical methods for HCI. Springer, 2016, pp. 291–330, https://doi.org/10.1007/978-3-319-26633-6_13

47. Ayres, Sweller. The split-attention principle in multimedia learning. In: Mayer, Fiorella (eds) The Cambridge handbook of multimedia learning. Cambridge University Press, 2005, pp. 135–146, Vol. 2.

48. Tversky. Intransitivity of preferences. Psychol Rev 1969; 76(1): 31–48. https://doi.org/10.1037/h0026750

49. Gigerenzer, Goldstein. Reasoning the fast and frugal way: models of bounded rationality. Psychol Rev 1996; 103(4): 650–669.

50. Szafir, Haroz, Gleicher, et al. Four types of ensemble coding in data visualizations. J Vis 2016; 16(5): 11. https://doi.org/10.1167/16.5.11

51. Bartram, Correll, Tory. Untidy data: the unreasonable effectiveness of tables. IEEE Trans Vis Comput Graph 2022; 28(1): 686–696. https://doi.org/10.1109/TVCG.2021.3114830