Words of Estimative Correlation: Studying Verbalizations of Scatterplots
Multimodal approaches that use interactive visualization and natural language in tandem are emerging as promising techniques for data analysis. A significant barrier to designing such techniques effectively is the lack of a systematic understanding of how people verbalize visual representations of data. Motivated by this gap, this paper devises and applies a transferable, semi-automated methodology to systematically study the relation between visualization and natural language through two crowdsourced experiments and natural language analysis. We describe these experiments, analyze the resulting corpus of utterances with natural language processing techniques, and derive an empirically supported semantic lexicon for aligning visualizations and verbalizations of data. Our results indicate that a wide range of vocabulary is used to describe visualizations, and they led to a number of high-level concepts for categorizing the space of words and related utterances. We discuss how our findings can be used for natural language generation and reflect on the design of the experiments and on the semi-automated methodology used in the analysis. Finally, we outline further research directions and argue that such multimodal experiments have a role in advancing our understanding of how people work with visualizations and with data at large.