<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description" content="nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity">
<meta name="keywords" content="nvBench, NL2VIS, Natural Language, Visualization, Ambiguity">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<link rel="icon" href="./static/images/diallab.svg">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
<style>
.NL2VIS {
font-weight: bold;
color: #2980B9;
}
.teaser img {
width: 100%;
max-width: 800px;
margin: 0 auto;
}
.hero.is-light {
background-color: #f4f8fd;
}
.hero.teaser {
background-color: #ffffff;
}
.paper-figure {
margin-bottom: 2rem;
}
.figure-container {
display: flex;
justify-content: center;
align-items: flex-start;
flex-wrap: wrap;
gap: 20px;
margin-bottom: 2rem;
}
.figure-vertical {
display: flex;
flex-direction: row;
justify-content: space-between;
align-items: flex-start;
margin-bottom: 2rem;
}
.figure-vertical img {
max-width: 48%;
}
.figure-vertical .figure-text {
max-width: 48%;
}
.figure-caption {
text-align: center;
font-weight: bold;
margin-top: 0.5rem;
}
.figure-description {
margin-top: 0.5rem;
text-align: justify;
font-style: italic;
font-size: 0.9rem;
}
table {
width: 100%;
margin-bottom: 2rem;
}
.tables-container {
display: flex;
justify-content: space-between;
flex-wrap: wrap;
margin-bottom: 2rem;
}
.table-container {
width: 48%;
}
.math-formula {
background-color: #f5f5f5;
padding: 1rem;
border-radius: 4px;
margin: 1rem 0;
overflow-x: auto;
}
.citation {
background-color: #f5f5f5;
padding: 1.5rem;
border-radius: 4px;
margin: 1.5rem 0;
font-family: 'Courier New', monospace;
white-space: pre-wrap;
overflow-x: auto;
}
</style>
</head>
<body>
<nav class="navbar" role="navigation" aria-label="main navigation">
<div class="navbar-brand">
<a role="button" class="navbar-burger" aria-label="menu" aria-expanded="false">
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
<span aria-hidden="true"></span>
</a>
</div>
<div class="navbar-menu">
<div class="navbar-start" style="flex-grow: 1; justify-content: center;">
<a class="navbar-item" href="https://github.com/HKUSTDial?q=nvbench&type=all&language=&sort=">
<img src="./static/images/diallab.svg" alt="DIAL Lab Logo" width="28" height="28">
<span style="margin-left: 5px;">DIAL Lab</span>
</a>
<div class="navbar-item has-dropdown is-hoverable">
<a class="navbar-link">
More Research
</a>
<div class="navbar-dropdown">
<a class="navbar-item" href="https://nvbench2.github.io">
nvBench 2.0
</a>
<a class="navbar-item" href="https://github.com/TsinghuaDatabaseGroup/nvBench">
nvBench 1.0
</a>
</div>
</div>
</div>
</div>
</nav>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a href="#">Tianqi Luo</a><sup>1</sup>,</span>
<span class="author-block">
<a href="#">Chuhan Huang</a><sup>1</sup>,</span>
<span class="author-block">
<a href="#">Leixian Shen</a><sup>2</sup>,
</span>
<span class="author-block">
<a href="#">Boyan Li</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="#">Shuyu Shen</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="#">Wei Zeng</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="#">Nan Tang</a><sup>1</sup>,
</span>
<span class="author-block">
<a href="#">Yuyu Luo</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>The Hong Kong University of Science and Technology (Guangzhou),</span>
<span class="author-block"><sup>2</sup>The Hong Kong University of Science and Technology</span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://raw.githubusercontent.com/nvBench2/nvBench2.github.io/refs/heads/main/nvBench2.0.pdf"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<!-- <span class="link-block">
<a href="#"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="ai ai-arxiv"></i>
</span>
<span>arXiv</span>
</a>
</span> -->
<!-- Code Link. -->
<span class="link-block">
<a href="https://github.com/HKUSTDial/nvBench2.github.io"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-github"></i>
</span>
<span>Code & Data</span>
</a>
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
Natural Language to Visualization (NL2VIS) enables users to create visualizations from natural language queries, making data insights more accessible. However, NL2VIS faces challenges in interpreting ambiguous queries, as users often express their visualization needs in imprecise language.
</p>
<p>
To address this challenge, we introduce nvBench 2.0, a new benchmark designed to evaluate NL2VIS systems in scenarios involving ambiguous queries. nvBench 2.0 includes 7,878 natural language queries and 24,076 corresponding visualizations, derived from 780 tables across 153 domains. It is built using a controlled ambiguity-injection pipeline that generates ambiguous queries through a reverse-generation workflow. By starting with unambiguous seed visualizations and selectively injecting ambiguities, the pipeline yields multiple valid interpretations for each query, with each ambiguous query traceable to its corresponding visualization through step-wise reasoning paths.
</p>
<p>
We evaluate various Large Language Models (LLMs) on their ability to perform ambiguous NL2VIS tasks using nvBench 2.0. We also propose Step-NL2VIS, an LLM-based model trained on nvBench 2.0, which enhances performance in ambiguous scenarios through step-wise preference optimization. Our results show that Step-NL2VIS outperforms all baselines, setting a new state-of-the-art for ambiguous NL2VIS tasks.
</p>
</div>
</div>
</div>
</div>
</section>
<section class="section">
<div class="container is-max-desktop">
<!-- Step-wise Disambiguation (Figure 1) -->
<div class="figure-vertical">
<img src="./static/images/fig1.svg" alt="Example of reasoning appropriate visualizations from an ambiguous natural language query">
<div class="figure-text">
<h2 class="title is-3">Step-wise Disambiguation</h2>
<div class="content">
<p>
When resolving ambiguities in natural language queries, we employ a step-wise reasoning approach that mimics human decision-making processes. This approach involves:
</p>
<ol>
<li><strong>Data Selection Reasoning:</strong> Identifying relevant data columns and filters from the query</li>
<li><strong>Chart Type Reasoning:</strong> Determining appropriate visualization types based on analytical tasks</li>
<li><strong>Channel Mapping Reasoning:</strong> Assigning data elements to visual channels</li>
<li><strong>Data Transformation Reasoning:</strong> Specifying operations like aggregation or filtering</li>
<li><strong>Visualization Synthesis:</strong> Generating complete visualizations that represent valid interpretations</li>
</ol>
<p>
This structured approach enables systematic resolution of ambiguities while preserving multiple valid interpretations of the original query; an illustrative sketch of such a reasoning path is given below the figure.
</p>
</div>
<p class="figure-caption">Figure 1: Example of reasoning appropriate visualizations from an ambiguous natural language query</p>
<p class="figure-description">
As shown in Figure 1, a seemingly straightforward query like "Show the gross trend of comedy and action movies by year" contains multiple ambiguities: "gross" could refer to either World_Gross or Local_Gross columns, "Comedy and action" implicitly requires filtering by Genre, "trend" may suggest a bar chart or line chart, and "By year" implies temporal binning that isn't explicitly defined. The figure illustrates how these ambiguities can be resolved through step-wise reasoning to produce multiple valid visualizations.
</p>
</div>
</div>
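<!-- Illustrative reasoning-path sketch -->
<div class="content">
<p>
To make the five reasoning steps concrete, the Python sketch below records one reasoning path for the Figure 1 query as data. It is purely illustrative: the class and field names are ours, not the benchmark's schema, and the listed options are only the interpretations discussed above.
</p>
<pre><code># Illustrative sketch only; not the nvBench 2.0 data format.
from dataclasses import dataclass, field

@dataclass
class ReasoningStep:
    name: str               # e.g. "data_selection", "chart_type"
    options: list[str]      # candidate interpretations found in the query
    choice: str             # interpretation taken on this particular path
    ambiguous: bool = False # True if more than one option was valid

@dataclass
class ReasoningPath:
    query: str
    steps: list[ReasoningStep] = field(default_factory=list)

path = ReasoningPath(
    query="Show the gross trend of comedy and action movies by year",
    steps=[
        ReasoningStep("data_selection", ["World_Gross", "Local_Gross"],
                      "World_Gross", ambiguous=True),
        ReasoningStep("chart_type", ["line", "bar"], "line", ambiguous=True),
        ReasoningStep("channel_mapping", ["x=Year, y=gross, color=Genre"],
                      "x=Year, y=gross, color=Genre"),
        ReasoningStep("data_transformation", ["bin by year, sum", "bin by year, mean"],
                      "bin by year, sum", ambiguous=True),
    ],
)
</code></pre>
<p>
Enumerating the ambiguous choices across such steps yields the multiple valid visualizations associated with a single query.
</p>
</div>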
<!-- Ambiguity-Injected Data Synthesizer Overview (Figure 2) -->
<div class="columns is-centered">
<div class="column">
<h2 class="title is-3">Ambiguity-Injected NL2VIS Data Synthesizer</h2>
<div class="content">
<p>
We developed a data synthesizer that systematically introduces ambiguity into seed visualizations, giving us control over the types of ambiguity injected while keeping the outputs meaningful and interpretable. A conceptual sketch of the pipeline follows the figure below.
</p>
</div>
<div class="paper-figure">
<img src="./static/images/fig2.svg" alt="An overview of ambiguity-injected NL2VIS data synthesizer">
<p class="figure-caption">Figure 2: An overview of ambiguity-injected NL2VIS data synthesizer.</p>
<p class="figure-description">
We developed an ambiguity-injected NL2VIS data synthesizer that systematically introduces controlled ambiguity into visualization specifications. As shown in Figure 2, our pipeline consists of: (a) Ambiguity-aware VIS Tree Synthesis that begins with seed visualizations and injects ambiguity nodes to create ambiguity-aware visualization trees, (b) VIS Synthesis that uses an ASP solver to resolve these trees into multiple valid visualizations, (c) NL Synthesis that generates ambiguous natural language queries corresponding to the multiple valid visualizations, and (d) Reasoning Path Synthesis that produces step-wise reasoning paths documenting how ambiguities are resolved.
</p>
</div>
</div>
</div>
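<!-- Conceptual pipeline sketch -->
<div class="content">
<p>
Conceptually, the four stages of Figure 2 compose into a single generation routine per seed visualization. The Python skeleton below is our own simplification of that flow; the function names are illustrative, and stage (b) stands in for the ASP-solver-based enumeration described above.
</p>
<pre><code># Conceptual skeleton of the Figure 2 pipeline; names are illustrative, bodies omitted.

def inject_ambiguity(seed_vis_tree):
    """(a) Add ambiguity nodes to a seed visualization tree."""
    ...

def solve_tree(ambiguous_tree):
    """(b) Enumerate the valid visualizations (nvBench 2.0 uses an ASP solver here)."""
    ...

def synthesize_nl(valid_visualizations):
    """(c) Generate an ambiguous NL query that covers all valid visualizations."""
    ...

def synthesize_reasoning_paths(query, valid_visualizations):
    """(d) Produce step-wise reasoning paths from the query to each visualization."""
    ...

def synthesize_sample(seed_vis_tree):
    tree = inject_ambiguity(seed_vis_tree)
    visualizations = solve_tree(tree)
    query = synthesize_nl(visualizations)
    paths = synthesize_reasoning_paths(query, visualizations)
    return query, visualizations, paths
</code></pre>
</div>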
<!-- Ambiguity Injection Process (Figure 3) -->
<div class="figure-vertical">
<img src="./static/images/fig3.svg" alt="Injecting ambiguities into a seed visualization">
<div class="figure-text">
<h2 class="title is-3">Ambiguity Injection Process</h2>
<div class="content">
<p>
Our ambiguity-injection process transforms seed visualizations into ambiguity-aware visualization trees. By selectively introducing ambiguity nodes, we create multiple valid interpretations of the same query.
</p>
<p>
As shown in the figure, we start with a seed chart and convert it to a visualization tree. Then, we inject ambiguities to create multiple possible interpretations. This ambiguity-aware tree can then be resolved in various ways, producing different valid visualizations for the same ambiguous query.
</p>
<p>
The process ensures traceability from query to visualization through explicit reasoning paths, enabling systematic evaluation of how NL2VIS systems handle ambiguity. A toy example of this tree resolution is sketched below the figure.
</p>
</div>
<p class="figure-caption">Figure 3: Injecting ambiguities into a seed visualization</p>
<p class="figure-description">
Figure 3 demonstrates how we inject ambiguities into a seed visualization through a systematic process: (1) Starting with a seed chart (e.g., a bar chart showing gross by year), (2) Converting it to a seed visualization tree with explicit nodes, (3) Injecting ambiguity nodes (e.g., introducing a choice between Local_Gross and World_Gross), (4) Resolving the tree into multiple valid visualization specifications, and (5) Flattening the trees into concrete visualization queries.
</p>
</div>
</div>
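<!-- Toy ambiguity-injection example -->
<div class="content">
<p>
The toy Python example below mirrors steps (1)-(5) of Figure 3 under a deliberately simplified tree format of our own (the benchmark's internal representation differs): a seed specification receives ambiguity nodes for "gross" and for the chart mark, and resolving the tree enumerates every concrete visualization the ambiguous query admits.
</p>
<pre><code># Toy illustration only; not the benchmark's internal tree representation.
from itertools import product

# (1)-(2) A seed chart expressed as a small visualization tree.
seed_tree = {
    "mark": "bar",
    "encoding": {"x": "Year", "y": "Gross", "aggregate": "sum"},
}

# (3) Inject ambiguity nodes: "gross" may mean either column, "trend" either mark.
ambiguous_tree = {
    "mark": {"ambiguous": ["bar", "line"]},
    "encoding": {
        "x": "Year",
        "y": {"ambiguous": ["Local_Gross", "World_Gross"]},
        "aggregate": "sum",
    },
}

# (4)-(5) Resolve the tree: enumerate every combination of ambiguity choices
# and flatten each combination into a concrete specification.
def resolve(node):
    """Return all concrete values a (possibly ambiguous) node can take."""
    if isinstance(node, dict) and "ambiguous" in node:
        return list(node["ambiguous"])
    if isinstance(node, dict):
        keys = list(node)
        branches = [resolve(node[key]) for key in keys]
        return [dict(zip(keys, combo)) for combo in product(*branches)]
    return [node]

for spec in resolve(ambiguous_tree):
    print(spec)  # four valid visualizations for one ambiguous query
</code></pre>
</div>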
<!-- NL2VIS Benchmarks Comparison (Table 1) -->
<h2 class="title is-3">Benchmark Comparison</h2>
<div class="content">
<p>
nvBench 2.0 introduces several key innovations compared to existing NL2VIS benchmarks, particularly its explicit handling of query ambiguity and support for one-to-many mapping between queries and visualizations.
</p>
</div>
<div class="figure-container">
<img src="./static/images/table1.png" alt="Comparison of NL2VIS benchmarks">
<p class="figure-caption">Table 1: Comparison of NL2VIS benchmarks.</p>
<p class="figure-description">
nvBench 2.0 distinguishes itself from existing benchmarks by: supporting one-to-many mapping from NL queries to visualizations, explicitly modeling query ambiguity, providing reasoning paths to explain ambiguity resolution, and using LLM-based query generation for natural, diverse queries.
</p>
</div>
<!-- Benchmark Statistics -->
<h2 class="title is-3">Benchmark Statistics</h2>
<div class="content">
<p>
nvBench 2.0 includes a diverse range of natural language query styles and chart types, ensuring comprehensive coverage for evaluating NL2VIS systems.
</p>
</div>
<div class="figure-container">
<img src="./static/images/table3.png" alt="Distribution of natural language styles across chart types and word count statistics" style="width: 60%;">
<p class="figure-caption">Table 3: Distribution of natural language styles across chart types and word count statistics.</p>
<p class="figure-description">
The dataset includes diverse query styles (commands, questions, and captions) across various chart types. The average query length is approximately 14 words, with a good balance across all visualization types.
</p>
</div>
<div class="content">
<p>
nvBench 2.0 includes detailed statistics on ambiguity types and patterns, providing insights into the distribution and frequency of different ambiguity categories.
</p>
</div>
<div class="tables-container">
<div class="table-container">
<img src="./static/images/table4.png" alt="Ambiguity count at each reasoning step">
<p class="figure-caption">Table 4: Ambiguity count at each reasoning step.</p>
<p class="figure-description">
This table shows the distribution of ambiguities across different reasoning steps in the nvBench 2.0 dataset, highlighting which steps in the visualization process are most prone to ambiguity.
</p>
</div>
<div class="table-container">
<img src="./static/images/table5.png" alt="Statistics of ambiguity patterns">
<p class="figure-caption">Table 5: Statistics of ambiguity patterns.</p>
<p class="figure-description">
Our dataset contains diverse ambiguity patterns, with Channel Encoding (CE) being the most common type of ambiguity (88.06%), followed by Data Transformation (DT) ambiguities (46.00%). Many samples contain multiple types of ambiguity, highlighting the complexity of real-world visualization requests.
</p>
</div>
</div>
<!-- Step-NL2VIS Model -->
<h2 class="title is-3">Step-NL2VIS for Ambiguous NL2VIS</h2>
<div class="content">
<p>
We propose Step-NL2VIS, an LLM-based model trained on nvBench 2.0, which addresses ambiguity by incorporating a step-wise reasoning process and leveraging preference optimization.
</p>
<h3 class="title is-4">Preference Optimization with Step-DPO</h3>
<p>
Step-DPO performs preference optimization on step-wise pairs of correct and incorrect reasoning steps, providing rich process-supervision signals that improve the model's accuracy at each step.
</p>
<div class="math-formula">
<p>L(θ) = -E<sub>(x, s<sub>1~k-1</sub>, s<sub>win</sub>, s<sub>lose</sub>) ~ D<sub>p</sub></sub>[
log σ(
β log (π<sub>θ</sub>(s<sub>win</sub> | x, s<sub>1~k-1</sub>) / π<sub>ref</sub>(s<sub>win</sub> | x, s<sub>1~k-1</sub>))
- β log (π<sub>θ</sub>(s<sub>lose</sub> | x, s<sub>1~k-1</sub>) / π<sub>ref</sub>(s<sub>lose</sub> | x, s<sub>1~k-1</sub>))
)
]</p>
</div>
<p>
Here, D<sub>p</sub> is the step-wise preference dataset, π<sub>θ</sub>(·|x, s<sub>1~k-1</sub>) is the policy model being optimized, π<sub>ref</sub>(·|x, s<sub>1~k-1</sub>) is the reference model, and β controls the divergence between the optimized policy and the reference model.
</p>
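<p>
As a concrete illustration of the objective above (not the actual Step-NL2VIS training code), the PyTorch sketch below computes the loss from per-step log-probabilities; the function and tensor names are ours, and obtaining these log-probabilities from an LLM is omitted.
</p>
<pre><code># Minimal sketch of the Step-DPO objective; illustrative only.
import torch
import torch.nn.functional as F

def step_dpo_loss(policy_logp_win, policy_logp_lose,
                  ref_logp_win, ref_logp_lose, beta=0.1):
    """Each input holds log-probabilities of a candidate step s_k given (x, s_1~k-1),
    summed over the step's tokens, with shape (batch,)."""
    win_term = beta * (policy_logp_win - ref_logp_win)     # β·log(π_θ/π_ref) for s_win
    lose_term = beta * (policy_logp_lose - ref_logp_lose)  # β·log(π_θ/π_ref) for s_lose
    return -F.logsigmoid(win_term - lose_term).mean()      # -E[log σ(·)]

# Toy call with random values standing in for real log-probabilities.
batch = 4
loss = step_dpo_loss(torch.randn(batch), torch.randn(batch),
                     torch.randn(batch), torch.randn(batch))
</code></pre>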
</div>
<!-- Experimental Results -->
<h2 class="title is-3">Experiments</h2>
<div class="content">
<p>
We evaluate the performance of various models on the ambiguous NL2VIS task using nvBench 2.0, comparing our Step-NL2VIS model against state-of-the-art approaches.
</p>
<h3 class="title is-4">Overall Performance</h3>
<p>
The table below presents the comprehensive performance evaluation of different models on nvBench 2.0. Our proposed Step-NL2VIS achieves state-of-the-art performance across most metrics.
</p>
</div>
<div class="figure-container">
<img src="./static/images/table6.png" alt="Overall performance comparison between different models on nvBench 2.0">
<p class="figure-caption">Table 6: Overall performance comparison between different models on nvBench 2.0.</p>
<p class="figure-description">
Our proposed Step-NL2VIS achieves state-of-the-art performance across most metrics, significantly outperforming both prompting-based and fine-tuning-based baselines. Step-NL2VIS obtains the highest F1@3 (81.50%) and F1@5 (80.88%), demonstrating its superior ability to handle ambiguity in NL2VIS tasks.
</p>
</div>
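<!-- Illustrative F1@k computation -->
<div class="content">
<p>
F1@k in Table 6 scores a model's top-k generated visualizations against the set of valid ground-truth visualizations for each query. The small Python sketch below is only a rough illustration of such a set-based F1@k; the benchmark's exact matching criterion between generated and reference visualizations may differ.
</p>
<pre><code># Rough illustration of set-based F1@k; the benchmark's matching criterion may differ.
def f1_at_k(predicted, ground_truth, k):
    """predicted: ranked list of visualization specs; ground_truth: set of valid specs."""
    top_k = predicted[:k]
    hits = sum(1 for vis in top_k if vis in ground_truth)
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(ground_truth)
    return 2 * precision * recall / (precision + recall)

# Example: 2 of the top-3 predictions match the 3 valid ground-truth charts.
print(f1_at_k(["v1", "v2", "v4"], {"v1", "v2", "v3"}, k=3))  # ≈ 0.667
</code></pre>
</div>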
<div class="figure-container">
<img src="./static/images/fig7.svg" alt="F1 across different models and ambiguity levels">
<p class="figure-caption">Figure 7: F1 across different models and ambiguity levels.</p>
<p class="figure-description">
The heatmap shows that Step-NL2VIS consistently outperforms other models across most chart types and ambiguity levels. Models incorporating step-wise reasoning generally show better performance than their direct prompting counterparts, confirming the effectiveness of decomposing complex visualization reasoning into explicit steps.
</p>
</div>
<div class="figure-container">
<img src="./static/images/fig8.svg" alt="Recall across different models and ambiguity levels">
<p class="figure-caption">Figure 8: Recall across different models and ambiguity levels.</p>
<p class="figure-description">
Step-NL2VIS demonstrates superior recall across all ambiguity levels examined. At ambiguity level 3, it achieves 83.3% recall, a significant improvement over competing approaches, and its advantage widens as the ambiguity level increases.
</p>
</div>
<!-- Citation Section -->
<h2 class="title is-3">Citation</h2>
<div class="content">
<p>
If you find nvBench 2.0 useful for your work, please cite:
</p>
<div class="citation">@article{luo2024nvbench2,
author = {Luo, Tianqi and Huang, Chuhan and Shen, Leixian and Li, Boyan and Shen, Shuyu and Zeng, Wei and Tang, Nan and Luo, Yuyu},
title = {nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity},
<!-- journal = {PVLDB}, -->
<!-- year = {2024}, -->
}</div>
</div>
<!-- License Section -->
<h2 class="title is-3">License</h2>
<div class="content">
<p>
This work is licensed under a <a href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
</p>
</div>
</div>
</section>
<footer class="footer">
<div class="container">
<div class="content has-text-centered">
<p>
Website template borrowed from <a href="https://github.com/nerfies/nerfies.github.io">Nerfies</a>.
</p>
</div>
</div>
</footer>
</body>
</html>