The previous post, The Rise of the Touchscreen Interface, covered how radically new the touchscreen slab interface is. Then it went into some detail on the business implications of this new era. For this post, let’s return to defining eras of computing by basic interfaces. But instead of analyzing the current change, let’s speculate on what might come beyond the touchscreen.

Recall this computing interface progression:

  1. Batch interface – punch cards in, punch cards out
  2. Command-line user interface – type text in, get text out
  3. Graphical user interface – add graphical windows/mouse to #2
  4. Tablet interface – fingers on glass: swipe, pinch, tap

Note that computing interface era #4 above uses the word tablet in the broad sense to mean any slab with a touchscreen interface. So this includes iPhones, Android phones and tablets, iPads, Kindle Fires, etc. So with that framework, what’s next? Candidates include:

  1. Gesture control – as in the Kinect video game, or the movie Minority Report
  2. Augmented reality – overlay the world with annotation (Google glasses)
  3. Voice interaction – talk to your computer and it talks back (Apple Siri, Google voice)

Of these three, I think the one making the most disruptive immediate impact will be voice interaction. We can go back to Joseph Weizenbaum’s 1966 program ELIZA to understand why. ELIZA was one of the first chatbots. A sample dialog with a person might go as below, with the real person typing in lower case and the ELIZA chatbot responding in upper case.

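Person: Men are all alike
ELIZA: IN WHAT WAY
Person: They’re always bugging us about something or other
ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE
Person: Well, my boyfriend made me come here
ELIZA: YOUR BOYFRIEND MADE YOU COME HERE
Person: He says I’m depressed much of the time
ELIZA: I AM SORRY TO HEAR YOU ARE DEPRESSED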

What’s fascinating about Weizenbaum is that he found one of his assistants pouring her heart out to ELIZA late one night. It disturbed him so much that a real human would confide in a machine that he began a crusade against AI that lasted the rest of his life. He went overboard, of course, but Weizenbaum was on to something. Humans have a hardwired capacity to interact with others as social beings, and that capacity is so automatic and innate that it leaps into action even with a machine, whether we want it to or not.

So my prediction is that the next real leap in human/computer user interaction is that tablets and phones will learn to talk. And just as the graphical interface and mouse neatly layered on top of the older command line text interface without much disruption, so will voice interaction perfectly overlay the existing tablet-phone user interface. A perfect fit. And if this analogy holds, the companies leading the tablet market today are well positioned to jump across this transition, just as Microsoft was able to jump across from command line DOS to the newer graphical interface market. This means not only are Google Android and Apple iOS going to continue to dominate in the phone-tablet space, they are also likely to lead in the voice interaction era to come, which will be bigger than ever.

I would need another full post to justify why we’ll see such rapid progress in voice interaction. So instead I’ll use an analogy with computer chess to show it’s at least plausible. With chess, progress seemed overblown and halting until computers got close to human level, at which point it was suddenly miraculous. Today’s chess computers are not just a tad better than humans, as Deep Blue was when it beat Kasparov in 1997, but way better. That’s true even on cheap hardware. This progression is characteristic of exponential progress. From any fixed vantage point, in this case a human chess master, the progress leading up to that point seems pitifully slow even though it’s growing exponentially. So going from 1/8 human level to 1/4 human level might take 10 years, and then we wait another 10 years to go from 1/4 to 1/2. But once you hit human capability, the next 10 years takes you 2x above, and another decade takes you 4x above. From a fixed human vantage point, the gap becomes explosive once it hits your level.
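To make the arithmetic concrete, here is a small Python sketch of that doubling pattern. It is only an illustration: the ten-year doubling time is the round number used in the paragraph above, not a measured figure, and the function names `relative_level` and `decade_gain` are made up for this example.

```python
# Illustrative sketch only: assumes capability doubles every 10 years,
# the round number used in the text, not a measured rate.
DOUBLING_YEARS = 10.0


def relative_level(years_from_parity, doubling_years=DOUBLING_YEARS):
    """Capability as a multiple of human level (1.0 means parity).

    Negative years are before parity, positive years are after it.
    """
    return 2.0 ** (years_from_parity / doubling_years)


def decade_gain(years_from_parity):
    """How far capability moves, in units of human level, over the next 10 years."""
    return relative_level(years_from_parity + 10) - relative_level(years_from_parity)


if __name__ == "__main__":
    for years in range(-30, 21, 10):
        level = relative_level(years)
        gain = decade_gain(years)
        print(f"{years:+4d} years: {level:g}x human level, next decade adds {gain:g}x")
```

Thirty years before parity, a full decade of progress adds only 1/8 of human level. Ten years after parity, the same decade adds two whole human levels. The growth rate never changed; only our vantage point did.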

If this analogy to chess is correct, and if you accept the premise that computer chess itself showed this exponential pattern, then progress will become explosive from our fixed vantage point once we get close to talking to computers. And we are close now. Siri and Google voice are actually useful already for limited applications. For example, I find it’s faster to speak some queries on my phone than to type them, though that is partially due to the limitations of a touchscreen interface. But that’s secondary. The primary point is that commercial and useful speech interaction is already in the market now, which suggests we’re getting close to human level. So we should expect that over the next decade or two the progress will suddenly go from (apparently) halting to spectacular. And just as people today stare at their phones as they wander down the street, soon we’ll see people with earbuds talking to their phones as they wander the street, more oblivious than ever. And as with ELIZA in the 1960s, we’ll find ourselves unable to resist talking to our phones as if they were human, or else we’ll be completely freaked out. Probably both.

The next leap in human/computer interface will be a doozy.

