This guest post is by Dr Ben Kraal, who is a Research Fellow in the School of Design at Queensland University of Technology. At the moment he mostly works on the Human Systems part of the Airports of the Future project.
I came to my PhD, back in 2002, with the idea that it would be in human-computer interaction (HCI) and that it would involve developing some sort of prototype software. By circumstance I had a supervisor who was a speech recognition engineer, so my software had to be speech recognition software.
I built a very basic piece of software that seemed to do more speech recognition than it actually did, because I’d used a few tricks. Because I had built the software to recognise the summation speech of a magistrate, and because I’d been given a few transcripts, I’d figured out the basic pattern of what was said. So rather than build the software to recognise every word, I built it around a grammar that knew which words mattered in what was said.
But, I’d come to realise that the tricks I’d used to make the software seem like it worked made it incredibly brittle. In my cleverness, what I’d intended to be a feature of my prototype software, its “naturalness”, was actually a bug. That is, I’d got myself into a fine mess.
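The grammar trick Ben describes, and the brittleness that came with it, can be sketched roughly like this. This is a toy illustration, not his actual software: the slot names, patterns, and example sentence are all invented.

```python
import re

# A toy "grammar" for a magistrate's summation: instead of recognising
# every word, we only look for the phrases that carry the decision.
# All slot names and patterns here are invented for illustration.
GRAMMAR = {
    "plea":     re.compile(r"\bplead(?:s|ed)? (guilty|not guilty)\b"),
    "offence":  re.compile(r"\bcharge of ([a-z ]+?)(?:,|\.|$)"),
    "fine":     re.compile(r"\bfined? \$?(\d+)\b"),
}

def parse_summation(transcript: str) -> dict:
    """Extract only the slots the grammar knows about; ignore everything else."""
    result = {}
    for slot, pattern in GRAMMAR.items():
        match = pattern.search(transcript.lower())
        if match:
            result[slot] = match.group(1).strip()
    return result

text = "The defendant pleaded guilty to the charge of speeding, and is fined $200."
print(parse_summation(text))
# → {'plea': 'guilty', 'offence': 'speeding', 'fine': '200'}
```

The brittleness falls out immediately: any phrasing the grammar didn’t anticipate ("the defendant admitted the offence") matches nothing at all, which is exactly the trade-off between seeming natural and being robust that the post goes on to describe.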
Fixing this wasn’t simply a matter of building the software differently. I wasn’t certain that I knew enough about how people used speech recognition software to design and build it effectively. Testing the software with people who didn’t use speech recognition software regularly showed that they’d massively over-estimate its very meagre capabilities. But to make it more robust, the software would have to be much more rigid in how it “listened” to what was said. This went against all of what I knew about making software for people. To reconcile robust-but-rigid speech recognition software with usable speech recognition software, I needed to find people who did use it regularly and try to find out how they understood it and used it.
But how? My tradition of HCI was one of lab tests and statistics. I didn’t have the tools to answer the question I’d asked. It would be necessary to find real live users and understand their work. This was very unusual in the speech recognition literature.
Finding people who used speech recognition software every day proved quite difficult. An email to the whole university community didn’t turn up anyone. A great deal of searching led me to the Hansard section of the Parliamentary reporting service, which used speech recognition software to help transcribe what was said in the House, Senate and Committees. The Hansard editors actually re-speak what is said to get the parliamentarians’ speech into text, before editing it with keyboard and mouse. A messy process, but apparently cheaper than training people to be transcription-speed typists.
A little later I found someone who had a job as a speech recognition trainer. She put my call for participants in her newsletter and I got several more participants that way. All of the interviewees had some sort of overuse injury, like RSI, that prevented them from using a keyboard and mouse but they were otherwise able-bodied. I was able to talk myself into most of their workplaces and interview them and observe the great diversity of how they worked.
By this stage I had picked up a third PhD supervisor, a sociologist, in addition to my speech recognition engineer and my HCI specialist. I tutored a third-year sociology course under her close supervision so I could better understand the various theories and approaches I’d need to make sense of my data.
By the time I’d finished my PhD I had extensive data from two different populations of speech recognition users, plus an ethnography of potential speech recognition users, to deal with. I tied the different cases together with some HCI theory to explain the way individuals used their computers, and some social theory to explain how those individuals-and-computers fit into wider spheres of work.
Looking back, I’m pretty happy with the messy way it turned out. My main point was that using speech recognition software productively at work is actually a highly complex process which involves dealing with software, hardware and office politics besides.
So, rather than being a bug, the messiness in my PhD turned out to be a feature.
If there is a moral to Ben’s story it is that what might seem like a mess can actually be of benefit if you don’t panic and systematically sort out what is going on and why. Getting the right supervisory help is also important. And this kind of mess is perhaps exactly what a PhD ought to be about – not so much focused on a predetermined plan to follow, but rather an exploration of a topic which may go to plan – or to somewhere entirely surprising.
Ben has a blog too – it’s called not easily obvious – why not check out what else he has to say?