Dialog Topics and Papers
The following papers will be the “jumping-off point” for your research and presentations. You and a partner (or partners) will find two other papers that expand in some way on the paper you are assigned. You will pull the information from all three papers together and present the material to the class.
Look through the list below and then use this form to indicate your first, second, and third choices (please just indicate a total of 3).
There are some additional papers in the final section that provide background to help you understand the material. I’ll also be covering many of the key basic topics on Tuesday.
More details to direct your choice of additional papers and the content of the presentation are forthcoming.
Referring Expressions / Coreference Resolution
Strube, Michael, and Christoph Müller. “A machine learning approach to pronoun resolution in spoken dialogue.” Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, Volume 1. Association for Computational Linguistics, 2003.
We apply a decision tree based approach to pronoun resolution in spoken dialogue. Our system deals with pronouns with NP- and non-NP-antecedents. We present a set of features designed for pronoun resolution in spoken dialogue and determine the most promising features. We evaluate the system on twenty Switchboard dialogues and show that it compares well to Byron’s (2002) manually tuned system.
Celikyilmaz, Asli, et al. “Resolving referring expressions in conversational dialogs for natural user interfaces.” Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2014.
Unlike traditional over-the-phone spoken dialog systems (SDSs), modern dialog systems tend to have visual rendering on the device screen as an additional modality to communicate the system’s response to the user. Visual display of the system’s response not only changes human behavior when interacting with devices, but also creates new research areas in SDSs. On-screen item identification and resolution in utterances is one critical problem to achieve a natural and accurate human-machine communication. We pose the problem as a classification task to correctly identify intended on-screen item(s) from user utterances. Using syntactic, semantic as well as context features from the display screen, our model can resolve different types of referring expressions with up to 90% accuracy. In the experiments we also show that the proposed model is robust to domain and screen layout changes.
Generating Referring Expressions
Dale, Robert, and Nicholas Haddock. “Generating referring expressions involving relations.” Proceedings of the Fifth Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 1991.
In this paper, we review Dale’s algorithm for determining the content of a referring expression. The algorithm, which only permits the use of one-place predicates, is revised and extended to deal with n-ary predicates. We investigate the problem of blocking ‘recursion’ in complex noun phrases and propose a solution in the context of our algorithm.
Core, Mark G., and James Allen. “Coding dialogs with the DAMSL annotation scheme.” AAAI fall symposium on communicative action in humans and machines. Vol. 56. 1997.
This paper describes the DAMSL annotation scheme for communicative acts in dialog. The scheme has three layers: Forward Communicative Functions, Backward Communicative Functions, and Utterance Features. Each layer allows multiple communication functions of an utterance to be labeled. The Forward Communicative Functions consist of a taxonomy in a similar style as the actions of traditional speech act theory. The Backward Communicative Functions indicate how the current utterance relates to the previous dialog, such as accepting a proposal, confirming understanding, or answering a question. The Utterance Features include information about an utterance’s form and content, such as whether an utterance concerns the communication process itself or deals with the subject at hand. The kappa interannotator reliability scores for the first test of DAMSL with human annotators show promise but are on average lower than the accepted kappa scores for such annotations. However, the slight revisions to DAMSL discussed should increase accuracy on the next set of tests and produce a reliable, flexible, and comprehensive utterance annotation scheme.
Stent, Amanda J., and Srinivas Bangalore. “Interaction between dialog structure and coreference resolution.” Spoken Language Technology Workshop (SLT), 2010 IEEE. IEEE, 2010.
Determining the coreference of entity mentions in a discourse is a key part of the interpretation process for advanced spoken dialog applications. In this paper, we present the most comprehensive system for statistical coreference resolution in dialog to date. We also compare the impact of two contrasting theories of dialog structure (the stack model and the cache model) on the performance of statistical coreference resolution, and show that the stack model outperforms the cache model.
Dialog State Tracking
Williams, Jason, Antoine Raux, and Matthew Henderson. “The dialog state tracking challenge series: A review.” Dialogue & Discourse 7.3 (2016): 4-33.
In a spoken dialog system, dialog state tracking refers to the task of correctly inferring the state of the conversation – such as the user’s goal – given all of the dialog history up to that turn. Dialog state tracking is crucial to the success of a dialog system, yet until recently there were no common resources, hampering progress. The Dialog State Tracking Challenge series of 3 tasks introduced the first shared testbed and evaluation metrics for dialog state tracking, and has underpinned three key advances in dialog state tracking: the move from generative to discriminative models; the adoption of discriminative sequential techniques; and the incorporation of the speech recognition results directly into the dialog state tracker. This paper reviews this research area, covering both the challenge tasks themselves and summarizing the work they have enabled.
Lee, Sungjin, and Amanda Stent. “Task lineages: Dialog state tracking for flexible interaction.” Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue. 2016.
We consider the gap between user demands for seamless handling of complex interactions, and recent advances in dialog state tracking technologies. We propose a new statistical approach, Task Lineage-based Dialog State Tracking (TL-DST), aimed at seamlessly orchestrating multiple tasks with complex goals across multiple domains in continuous interaction. TL-DST consists of three components: (1) task frame parsing, (2) context fetching and (3) task state update (for which TL-DST takes advantage of previous work in dialog state tracking). There is at present very little publicly available multi-task, complex goal dialog data; however, as a proof of concept, we applied TL-DST to the Dialog State Tracking Challenge (DSTC) 2 data, resulting in state-of-the-art performance. TL-DST also outperforms the DSTC baseline tracker on a set of pseudo-real datasets involving multiple tasks with complex goals which were synthesized using DSTC3 data.
Johnston, Michael, et al. “MATCH: An architecture for multimodal dialogue systems.” Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2002.
Mobile interfaces need to allow the user and system to adapt their choice of communication modes according to user preferences, the task at hand, and the physical and social environment. We describe a multimodal application architecture which combines finite-state multimodal language processing, a speech-act based multimodal dialogue manager, dynamic multimodal output generation, and user-tailored text planning to enable rapid prototyping of multimodal interfaces with flexible input and adaptive output. Our testbed application MATCH (Multimodal Access To City Help) provides a mobile multimodal speech-pen interface to restaurant and subway information for New York City.
Error Correction & Clarification
Stoyanchev, Svetlana, Alex Liu, and Julia Hirschberg. “Towards natural clarification questions in dialogue systems.” AISB symposium on questions, discourse and dialogue. Vol. 20. 2014.
Clarifications are often necessary for maintaining human-human as well as human-machine dialogue. However, clarification questions asked by Spoken Dialogue Systems (SDS) are very different from clarification questions asked in natural human interaction. While in human-human dialogues, speakers ask targeted questions using contextual information, SDS ask generic clarifications such as “please repeat” or “please rephrase”. We propose and evaluate a new strategy for creating more natural clarification questions. We model natural language clarification question generation rules based on human-generated behavior. We describe results of a user study to evaluate our automatically generated questions and show that subjective scores of the automatically generated questions are comparable to scores for human-generated questions — with some of the automatic rules even outperforming human-generated questions.
Rieser, Verena, and Johanna D. Moore. “Implications for generating clarification requests in task-oriented dialogues.” Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2005.
Clarification requests (CRs) in conversation ensure and maintain mutual understanding and thus play a crucial role in robust dialogue interaction. In this paper, we describe a corpus study of CRs in task-oriented dialogue and compare our findings to those reported in two prior studies. We find that CR behavior in task-oriented dialogue differs significantly from that in everyday conversation in a number of ways. Moreover, the dialogue type, the modality and the channel quality all influence the decision of when to clarify and at which level of the grounding process. Finally we identify form-function correlations which can inform the generation of CRs.
Dialog System Architectures: POMDPs
Young, Steve, et al. “POMDP-based statistical spoken dialog systems: A review.” Proceedings of the IEEE 101.5 (2013): 1160-1179.
Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems.
Dialog System Architectures: Ravenclaw
Bohus, Dan, and Alexander I. Rudnicky. “RavenClaw: Dialog management using hierarchical task decomposition and an expectation agenda.” Eighth European Conference on Speech Communication and Technology. 2003.
We describe RavenClaw, a new dialog management framework developed as a successor to the Agenda architecture used in the CMU Communicator. RavenClaw introduces a clear separation between task and discourse behavior specification, and allows rapid development of dialog management components for spoken dialog systems operating in complex, goal-oriented domains. The system development effort is focused entirely on the specification of the dialog task, while a rich set of domain-independent conversational behaviors are transparently generated by the dialog engine. To date, RavenClaw has been applied to five different domains allowing us to draw some preliminary conclusions as to the generality of the approach. We briefly describe our experience in developing these systems.
Stent, Amanda, Rashmi Prasad, and Marilyn Walker. “Trainable sentence planning for complex information presentation in spoken dialog systems.” Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2004.
A challenging problem for spoken dialog systems is the design of utterance generation modules that are fast, flexible and general, yet produce high quality output in particular domains. A promising approach is trainable generation, which uses general-purpose linguistic knowledge automatically adapted to the application domain. This paper presents a trainable sentence planner for the MATCH dialog system. We show that trainable sentence planning can produce output comparable to that of MATCH’s template-based generator even for quite complex information presentations.
Additional Background Papers
Lee, Cheongjae, et al. “Recent Approaches to Dialog Management for Spoken Dialog Systems.” Journal of Computing Science and Engineering 4.1 (2010): 1-22.
Webber, Bonnie, and Aravind Joshi. “Discourse structure and computation: past, present and future.” Proceedings of the ACL-2012 Special Workshop on Rediscovering 50 Years of Discoveries. Association for Computational Linguistics, 2012.
Grosz, Barbara J., Scott Weinstein, and Aravind K. Joshi. “Centering: A framework for modeling the local coherence of discourse.” Computational linguistics 21.2 (1995): 203-225.
Sukthanker, Rhea, et al. “Anaphora and Coreference Resolution: A Review.” arXiv preprint arXiv:1805.11824 (2018).
Stolcke, Andreas, et al. “Dialogue act modeling for automatic tagging and recognition of conversational speech.” Computational linguistics 26.3 (2000): 339-373.
Turk, Matthew. “Multimodal interaction: A review.” Pattern Recognition Letters 36 (2014): 189-195.