Betting Against Convergence

By Arnold Kling - December 23, 2002 12:00 AM

For the past decade, pundits have been predicting media convergence. As Nicholas Negroponte wrote in Being Digital, we can represent audio, video, and text as bits. Therefore, the convergence hypothesis goes, we do not need devices that specialize in one type of media. Instead, we will have "converged," multipurpose devices.

I continue to bet against this vision of convergence. My thinking is that while technology certainly might evolve in that direction, people cannot. In my crude assessment of human evolution, we still prefer to specialize in three modes of interaction with media devices:

Mode of interaction    Examples
Eye-Hand               personal computer; reading; writing
Speak-Listen           telephone
Background-Noise       TV; radio

People tend to reject new modes of interaction with devices. A new mode of interaction infringes on the user experience. In contrast, improvements to existing modes of interaction enhance the user experience.


There are a number of capabilities that are not popular because they represent infringements. A classic example is the screen phone, which has been available in one form or another for nearly forty years without achieving market acceptance. The reason for this, in my opinion, is that it infringes on the eye-hand mode, which ordinarily is left free when you are on the phone. When people are on the phone, they want their hands free to cook dinner, or drive, or scan their email. Watching someone on a screen would preclude doing those things.

Intelligent television enhancements, such as ReplayTV or TiVo, are popular with "serious" television viewers. However, in most households, the television is background noise. Paying close attention to a television and/or interacting with it constantly would infringe on the conversations that people might be having in speak-listen mode or on any number of eye-hand activities, such as school work, household chores, or snacking.

A computer with speech recognition represents an attempt to introduce a speak-listen mode to an eye-hand device. Advocates of the technology can point out that computers are getting better at it. However, no matter how good the computer software becomes, my sense is that humans are not wired for a voice-screen interface. We prefer an interface that is based on hand-eye coordination. For most of us, voice recognition is an infringement, not an enhancement.


A more plausible use for speech recognition would be a "translating telephone," which would allow me to speak in English and be heard in another language. This use for speech recognition would be an extension of the speak-listen mode. (Incidentally, do not credit me with the concept of a translating telephone. For example, I have seen it described in Ray Kurzweil's The Age of Spiritual Machines.)

In general, I believe that there are many ways to design intelligent headsets to enhance the speak-listen mode. However, the challenge is to implement a way to control the headset that does not infringe as heavily on the eye-hand mode as the controls I once suggested. Voice-activated controls would be best.

For the eye-hand mode, there is a conflict between portability and usability. Our hands need space to operate, and our eyes need large displays. However, we would prefer not to have to carry large screens and keyboards around.

Special glasses have been suggested as a solution for providing a usable display that is also portable. The "modes of interaction" framework implies that these intelligent glasses need as a companion a set of intelligent gloves. Imagine a set of gloves that respond to the motions and tapping of your fingers. Using those gloves to control what is displayed on your glasses would stay within the eye-hand mode of interaction.

For background noise devices, the ultimate enhancement would be a device that senses the mood that you are looking for, and chooses a song or TV program based on that sense. Right now, commercial establishments program background music to encourage us to buy, or to relax (in a dentist's office), or to chew faster in a fast-food restaurant (according to Douglas Rushkoff in his book Coercion). Devices aimed at consumers would allow them to determine the mood of their personal background music.

As inventors, new ventures, and established companies develop innovations, you might want to keep in mind that although the technology can handle convergence, humans may not be ready. You might want to bet on the innovations that stick to one mode of interaction, and bet against those that try to combine modes.

