Performing the Wizards of Oz

The Wizard of Oz experiment (WOz) is a research approach in which an intelligent system is presented to users, typically as part of a research study. Unbeknownst to the user, the presented intelligence is a mirage, with the gubbins of the supposedly intelligent system run by a human operator pulling metaphorical levers. In other words, the intelligence is a fiction. In an article presented at ACM CSCW 2020, and due to be published in Proceedings of the ACM on Human-Computer Interaction, we take a look at our use of the method and unpack the interactional work that goes into pulling of the method. In other words, we pull back the curtain on the method. This blog post is a bit of a teaser, focusing solely on some of the elements of collaboration that we identified in the article. A PDF is available at the end of this post if you’d rather just jump into the article.

Alternatively, instead of (or in addition to) reading this blog post, you can also watch the presentation on YouTube (it was a virtual conference for 2020 for obvious reasons). This presentation includes a short video clip from the data we collected if you want to get a feel for how the study unfolded.

Watch my presentation of the work at ACM CSCW 2020

As you can probably guess, the method’s name comes from the L Frank Baum novel The Wonderful Wizard of Oz. Early use of the method in HCI took less exciting names like ‘experimenter in the loop’¹. A WOz approach offers the ability to prototype and potentially validate—or not—design concepts through experimentation without the costly development time that a full system may require². Approaches have included simulating things such as a ‘Listening Typewriter’³ and public service information lookup for a telephone line⁴. In WOz, different elements may be simulated, ranging from database lookup through to mobile geolocation tracking⁵. Due to the recent commericalisation of voice recognition technologies, there is a plethora of literature using the approach for studies in voice interface design, with natural language processing being the simulated component. I’d guess that’s because building natural language interfaces is a costly endeavor (monetarily and timewise).

In our paper, we look at the use of a voice-controlled mobile robot for cleaning, where we simulated the natural language processing of the voice instruction, and conversion of this into an instruction to a robot (i.e. the Wizard listened to requests and controlled the robot). We were running this study as part of a language ellicitation study, although that’s really the focus of the paper. Cruically our study required two researchers to operate the proceedings: one scaffolded the participant interaction and the other performed the work of the ‘Wizard’, responding to participants’ requests and controlling the vacuum.

Collaboration was key

In the paper we go into much more detail, focusing on the various aspects needed to pull off such a study, starting with the how the ‘fiction’ of the voice-controlled robot is established and presented to users, through to how the researchers running the study attend to a technical breakdown while running the study. We progressively establish the fiction as an interactional accomplishment between all three interactants (i.e. the two researchers and the participant).

A video still showing the laboratory where the elicitation study took place. The Wizard is shown in the top left of the room (A) and in the cut-out image from the opposite angle. itr{P11} (B) is standing with the researcher (C) looking towards the NEATO robots (D) and the three piles of debris (red ovals for clarity). — The ‘Wizard’ (A), a participant (B) with ‘the researcher’ (C), the NEATO robots (D), and three piles of debris to be cleaned (red ovals).

The researcher, who in our study stands with the participant, introduces the scenario, shows the robot to the participant, and guides them into instructing it (i.e. they scaffold the participant’s involvement in the study). The participant ostensibly talks to and responds to the vacuum. The Wizard—who is listening—responds to the request, in accordance with the fiction presented by the researcher and the notions of what a voice-controlled vacuum robot might reasonably respond to. It’s the Wizard whom the participant is really instructing in such a study (as the voice-controlled robot is but a fiction). The researcher standing with the participant then must performatively account for the actions taken by the Wizard according to that fiction. In other words, whatever ‘the robot does’, the researcher must attribute its actions to the robot to conceal the machinations of the Wizard.

There are other challenges, of course, that make this harder: the Wizard must respond to the participants’ requests in a way consistent with the fiction quickly and consistently in order to ensure the methodological validity of the study. We also discuss a situation in the article where there is a technical glitch with the robots, requiring both researchers to work together in an improvised manner to uphold the secrecy of the Wizard, while trying to collaboratively resolve the issues face.

A triangular flow diagram surrounded by a box labelled front stage. The flow diagram shows that the researcher instructs a participant. The participant has a greyed out arrow suggesting they instruct a vacuum. The researcher responds to the vacuum, who has a greyed out arrow suggesting it responds to and is responded to by the participant, who responds to and is responded to by the researcher. The researcher never instructs the vacuum nor is does the vacuum respond to the researcher. There are three additional arrows, and a fourth interactant, the Wizard, whom is outside the front stage. The participant is shown to instruct the Wizard, who instructs the vacuum (which responds to the Wizard). — The ‘triad of fiction’ from the paper.

Given the dramatic naming of the approach, we describe this accomplishment as a triad of fiction, taking place on the ‘front stage’ (with the Wizard working ‘backstage’). Around the same time, others also referred to this as ‘front channel’ and ‘back channel’ communication⁶. See the figure for how we pictorially represent the communication between the various interactants in our study.

Practical takeaways

Above I’ve focused on the collaboration required to pull of the study, we also devote a fair chunk of the article to detailing the practical steps we took in implementing the study design and running the study. With this, we discuss how we used various technologies, piecing them together to present a believable ‘voice-controlled robot’. We had a shared protocol document that both the researcher and the Wizard used to maintain awareness of each other’s actions and an outline script that detailed the sorts of requests that the robot would respond positively (or not) to, and this was progressively updated throughout the studies. While we frame running a WOz study as a performance, we were keen to stress the methodological obligations involved too: the performance must be undertaken according to methodologically valid research practice. We argue this requires meticulous care and attention, and that this is driven by the collaboration of the researchers throughout.

Paul Green and Lisa Wei-Haas. 1985. The Wizard of Oz: a tool for rapid development of user interfaces. Final report. Technical Report. University of Michigan ↩
Kevin F White and Wayne G Lutters. 2003. Behind the Curtain: Lessons Learned from a Wizard of Oz Field Experiment. SIGGROUP Bull. 24, 3 (Dec. 2003), 129–135. https://doi.org/10.1145/1052829.105285 ↩
John D Gould, John Conti, and Todd Hovanyecz. 1983. Composing Letters with a Simulated Listening Typewriter. Commun. ACM 26, 4 (April 1983), 295–308. https://doi.org/10.1145/2163.358100 ↩
Robin Wooffitt. 1994. Applying Sociology: Conversation Analysis in the Study of Human-(Simulated) Computer Interaction. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 43, 1 (1994), 7–33. https://doi.org/10.1177/075910639404300103 ↩
Stephan Schlögl, Gavin Doherty, and Saturnino Luz. 2014. Wizard of Oz Experimentation for Language Technology Applications: Challenges and Tools. Interacting with Computers 27, 6 (may 2014), 592–615. https://doi.org/10.1093/iwc/iwu016 ↩
Nikolas Martelaro, Sarah Mennicken, Jennifer Thom, Henriette Cramer, and Wendy Ju. 2020. Using Remote Controlled Speech Agents to Explore Music Experience in Context. In Proceedings of the 2020 ACM Designing Interactive Systems Conference (DIS ’20). ACM, New York, NY, USA, 2065–2076. https://doi.org/10.1145/3357236.3395440 ↩