Back in November, I broke my wrist snowboarding.
I can use a couple of the fingers on my right hand now, so typing is getting a bit easier. Immediately after the surgery my fingers were essentially paralyzed, and the pain was so bad I was frantically mixing my blood with Percocet, to little effect. That got better after a few days, but two-handed typing was impossible (and is still very difficult).
I hired a typist: a massively overqualified MIT student who can take dictation on ordinary emails, technical text, and code. I didn’t like the idea of letting him drive the machine while I hovered over his shoulder barking obscure commands to click this or type that in software I know better than my own bone structure, so I set up a shared desktop over VNC.
He sits at the other end of my desk on a separate computer while I conduct the machine with my left hand, jumping from mail to mail, opening buffers, reading web pages, and generally doing the interactive low-latency low-volume typing tasks myself. He can see everything I’m doing because my desktop is shared over the network. And when I need to enter a large block of text, well, I just start talking, he types, and the words appear on the screen.
If I don’t look up from the screen, I can pretend he’s not there and that I have the world’s most powerful speech recognition engine. So I have a sneak peek into what computers will be like when speech recognition works really well. And I humbly submit to you my comments.
First, speech recognition is really awesome. It’s not the speed. I can talk 2-3 times faster than I can type, but fully functional, I am the fastest typist you have ever encountered: faster than the one I hired by a fair bit, in any case. So the limiting factor is his ability to type, and I’m not getting any bump in my ability to put words on the screen quickly.
No, the best thing I’ve noticed so far is that it’s really pleasant to dictate email. I can consult notes or documents or books while I compose my message without a context switch between real world and computer world. It gives me a chance to think through what I’m saying more than I am inclined to do when I’m rapidly typing the message into the computer. It slows down the pace of my thought from the otherwise frantic 130 wpm speed of my typing to a more human level.
And, of course, it’s just plain neat.
My next observation is that voice recognition is much harder than any of us think. Yes, it’s context. The classic example of the difficulty of speech recognition:
It’s hard to recognize speech.
It’s hard to wreck a nice beach.
But this is just a corner case; context is a constant guide for our natural speech parsers for words like “in” and “on” spoken quickly. My voice recognition engine is attentive enough to read my emails and to learn words he’s never seen before, so that when I say them, he already knows how to spell and capitalize them. But he still makes mistakes, all the time. The other day I said, “I’ll ask around and see what people think,” and he typed “I’ll ask Rodney what people think.” But it’s not his fault. It’s hard to wreckanicebeach.
And then, when he does make a mistake, I can say, “no, I meant in,” or “that’s supposed to be a new sentence,” and he immediately understands what I’m talking about, because he understands the meaning of what he’s typing. With contemporary software, I’d have to say “BACK BACK BACK BACK PERIOD SPACE CAPITALIZE FORWARD FORWARD FORWARD FORWARD.” Or something to that effect.
And finally, I’ve noticed that my emails have gotten a lot nicer since there’s another person helping me type. I’m less abrupt with people, friendlier and more understanding. Having someone else watching over your shoulder while you go about your business makes you want to be a nicer person.
Anyway, these are some of my thoughts on using extremely high quality speech recognition, based on a simulated automation environment. It is fun to try technology years before it exists. I wonder if there are other things we can simulate like this?
Posted on 9 December 2004